[P] Inferencing Llama3.2-1B-Instruct on 3xMac Minis M4 with Data Parallelism using allToall architecture! | smolcluster
Here's another sneak peek at inference with the Llama3.2-1B-Instruct model on 3x Mac Mini M4 (16 GB each) with smolcluster! Today's demo shows my Data Parallelism implementation using an allToall architecture, all written from scratch using only socket libraries for communication. Data parallelism splits the data across many GPUs, while each GPU keeps a full copy of the model. It's used when your data doesn't fit on a single GPU. I went for an allToall architecture where each worke...
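To make the idea concrete, here's a rough sketch of data parallelism with an all-to-all exchange over plain sockets. This is not smolcluster's code: the post is cut off before it says what the workers actually exchange, so I'm assuming each worker runs its own shard of the prompts and then sends its outputs to every other worker. All names (`shard_batch`, `run_all_to_all`, `fake_generate`) are made up for illustration, `fake_generate` is a stand-in for the real Llama forward pass, and the three "workers" are threads on one machine joined by `socket.socketpair()` instead of three Mac Minis on a network.

```python
import json
import socket
import struct
import threading


def shard_batch(items, num_workers):
    """Round-robin split: worker r gets items r, r+N, r+2N, ..."""
    return [items[r::num_workers] for r in range(num_workers)]


def send_msg(sock, obj):
    """Send a JSON object with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def fake_generate(prompt):
    # Hypothetical stand-in for the full model replica on each worker.
    return prompt.upper()


def worker(rank, shard, peers, gathered):
    # 1) Every worker processes only its own shard of the data.
    local = {"rank": rank, "outputs": [fake_generate(p) for p in shard]}
    # 2) All-to-all (assumed): send local outputs to every peer...
    for sock in peers.values():
        send_msg(sock, local)
    # 3) ...then receive every peer's outputs. Messages are tiny, so
    #    sending before receiving does not fill the socket buffers.
    results = {rank: local["outputs"]}
    for sock in peers.values():
        msg = recv_msg(sock)
        results[msg["rank"]] = msg["outputs"]
    gathered[rank] = results


def run_all_to_all(prompts, num_workers=3):
    shards = shard_batch(prompts, num_workers)
    # One direct socket pair per pair of workers (a full mesh).
    links = {r: {} for r in range(num_workers)}
    for a in range(num_workers):
        for b in range(a + 1, num_workers):
            links[a][b], links[b][a] = socket.socketpair()
    gathered = {}
    threads = [
        threading.Thread(target=worker, args=(r, shards[r], links[r], gathered))
        for r in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for peers in links.values():
        for sock in peers.values():
            sock.close()
    return gathered
```

Calling `run_all_to_all(["a", "b", "c", "d"])` gives every worker the same complete view of the outputs, keyed by the rank that produced them. On real hardware you'd swap the socket pairs for TCP connections between machines, and the length prefix matters even more there because `recv()` can return partial messages.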