Dynamic batching for Encoder-Decoder MT training or generation when long sequences cap the batch size [P]
About this article
I built a small PyTorch sampler called dynabatch after running into this batching issue while fine-tuning an NLLB-200 600M model. Training on an RTX 5090, the largest fixed batch size I could use was 8; anything bigger led to OOM. Monitoring training with nvidia-smi, it looked like only a few batches were actually stressing the GPU, and much of the time utilization was far lower. My guess was that the fixed batch size was being dictated by the longest source/target examples, while the shorter ones mostly wasted capacity on padding.
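The usual fix for this (and the idea behind dynamic batching here) is to batch by a token budget instead of a fixed example count: sort examples by length, then pack each batch until its padded size (number of examples × longest sequence in the batch) would exceed the budget. Below is a minimal sketch of that idea as a PyTorch batch sampler; the class name, the `max_tokens` parameter, and the packing heuristic are my illustration of the approach, not the actual dynabatch API.

```python
import random
from torch.utils.data import Sampler


class TokenBudgetBatchSampler(Sampler):
    """Packs batches under a padded-token budget instead of a fixed batch size."""

    def __init__(self, lengths, max_tokens, shuffle=True, seed=0):
        self.lengths = lengths        # per-example length, e.g. max(src_len, tgt_len)
        self.max_tokens = max_tokens  # budget for padded tokens per batch
        self.shuffle = shuffle
        self.seed = seed

    def _make_batches(self):
        # Sort by length so examples of similar length share a batch,
        # which keeps the amount of padding (and wasted memory) low.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches, batch, longest = [], [], 0
        for i in order:
            new_longest = max(longest, self.lengths[i])
            # Cost of the batch if every example is padded to the longest one in it.
            if batch and new_longest * (len(batch) + 1) > self.max_tokens:
                batches.append(batch)
                batch, new_longest = [], self.lengths[i]
            batch.append(i)
            longest = new_longest
        if batch:
            batches.append(batch)
        return batches

    def __iter__(self):
        batches = self._make_batches()
        if self.shuffle:
            # Shuffle the order of batches, not their contents,
            # so length-homogeneous batches survive the shuffle.
            random.Random(self.seed).shuffle(batches)
        return iter(batches)

    def __len__(self):
        return len(self._make_batches())
```

You'd plug this in as `DataLoader(dataset, batch_sampler=TokenBudgetBatchSampler(lengths, max_tokens=8192), collate_fn=pad_collate)`, where `pad_collate` pads each batch only to its own longest example; `max_tokens=8192` and `pad_collate` are placeholder values/names, and the right budget depends on your GPU memory.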