Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training

Hugging Face Blog

Published August 8, 2025

By Salman Mohammadi (smohammadi, axolotl-ai-co), Matej Sirovatka (siro1), Wing Lian (winglian, axolotl-ai-co), Marc Sun (marcsun13), Dan Saunders (djsaunde, axolotl-ai-co)

Training large models across multiple GPUs can be challenging because of the complexity of the different parallelism strategies involved. In Accelerate, together with Axolotl, we have integrated a quick and easy way to use any combination of parallelism strategies in your training script. Here is how to add it:

```python
from transformers import AutoModelForCausalLM
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import FullyShardedDataParallelPlugin

# configure your desired parallelisms here; this particular configuration requires
# at least 2 nodes with 8 GPUs each (2 x 2 x 2 x 2 = 16 GPUs).
# setting any parallelism degree to 1 disables it, e.g. dp_replicate_size=1 disables DP.
pc = ParallelismConfig(
    dp_shard_size=2,      # Fully Sharded Data Parallel degree
    dp_replicate_size=2,  # Data Parallel degree
    cp_size=2,            # Context Parallel degree
    tp_size=2,            # Tensor Parallel degree
)

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    auto_wrap_policy="transformer_based_wrap",
    transformer_cls_names_to_wrap=["LlamaDecoderLayer"],
    state_dict_type="SHARDED_STATE_DICT",
)

accelerator = Accelerator(
    parallelism_config=pc...
```
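The comment in the snippet says this configuration needs at least 2 nodes with 8 GPUs each. That number follows from the degrees: each one can be viewed as an axis of a device mesh, so the GPU count is their product, 2 × 2 × 2 × 2 = 16, and a degree of 1 collapses its axis. The sketch below is plain Python, not Accelerate code. `required_world_size`, `active_dims`, and `rank_to_coords` are hypothetical helpers, and the row-major rank layout (last axis varies fastest, in the order dp_replicate, dp_shard, cp, tp) is our assumption for illustration, not a description of how Accelerate builds its internal mesh.

```python
from math import prod

# Degrees copied from the ParallelismConfig example above. Each entry is one
# axis of a device mesh; a degree of 1 means that parallelism is disabled.
DEGREES = {
    "dp_replicate": 2,  # Data Parallel (replicated) degree
    "dp_shard": 2,      # Fully Sharded Data Parallel degree
    "cp": 2,            # Context Parallel degree
    "tp": 2,            # Tensor Parallel degree
}


def required_world_size(degrees: dict[str, int]) -> int:
    """Hypothetical helper: total GPUs the combined mesh spans."""
    return prod(degrees.values())


def active_dims(degrees: dict[str, int]) -> list[str]:
    """Hypothetical helper: axes that actually split work (degree > 1)."""
    return [name for name, size in degrees.items() if size > 1]


def rank_to_coords(rank: int, degrees: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: map a global rank to mesh coordinates.

    ASSUMPTION: row-major layout in the dict's key order, so the last
    axis ("tp") changes fastest as the rank increases.
    """
    if not 0 <= rank < required_world_size(degrees):
        raise ValueError(f"rank {rank} is outside the mesh")
    coords = {}
    for name, size in reversed(list(degrees.items())):
        coords[name] = rank % size
        rank //= size
    # Return coordinates in the same key order as `degrees`.
    return {name: coords[name] for name in degrees}


if __name__ == "__main__":
    print(required_world_size(DEGREES))                         # 16
    print(required_world_size({**DEGREES, "dp_replicate": 1}))  # 8
    print(active_dims({**DEGREES, "cp": 1}))
    for r in (0, 1, 8, 15):
        print(r, rank_to_coords(r, DEGREES))
```

Under this assumed layout, and further assuming ranks 0 to 7 sit on the first node, rank 8 differs from rank 0 only in its dp_replicate coordinate. Only the replicated data-parallel axis would then cross the node boundary, while the TP, CP, and FSDP groups, which typically communicate more often, stay inside a node. How a real launcher places ranks depends on your setup.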

Originally published on February 15, 2026. Curated by AI News.
