[D] Make. Big. Batch. Size.

Reddit - Machine Learning 1 min read

This is somewhere between a vent and a lesson learned. I tried training an RWKV v6 model with my own code on my RTX 4050. I trained for over 50k steps with batch_size=2 and gradient_accumulation=4 (effective_batch = 2*4 = 8). Perplexity plateaued at about 50 PPL (RWKV v6, ~192.8M model) and just wouldn't go any lower. I changed the lr, the time_decay lr (time_decay is part of RWKV's attention replacement), etc., but it only got worse or didn't change anything at all... and then... I just tried setting gradient_accumulation to 32. After one "epoch" (it's pseudo-epochs in my...
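For readers unfamiliar with the effective-batch arithmetic above, here is a minimal sketch of gradient accumulation on a toy one-parameter least-squares problem in pure Python. It is not the post's training code: the function names (`grad_mse`, `accumulated_grad`), the toy data, and the starting weight are all hypothetical. It only illustrates that averaging the gradients of 4 micro-batches of size 2 gives the same gradient as one batch of 8.

```python
# Sketch only: gradient accumulation on a toy 1-D least-squares problem.
# Names and data are illustrative assumptions, not the post's code.

def grad_mse(w, batch):
    """Gradient w.r.t. w of 0.5 * (w*x - y)^2, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size, accum_steps):
    """Average the gradients of `accum_steps` micro-batches (one optimizer step's worth)."""
    total = 0.0
    for step in range(accum_steps):
        start = step * micro_batch_size
        micro = data[start:start + micro_batch_size]
        # Dividing by accum_steps turns the running sum into a mean.
        total += grad_mse(w, micro) / accum_steps
    return total

micro_batch_size = 2   # batch_size in the post
accum_steps = 4        # gradient_accumulation in the post
effective_batch = micro_batch_size * accum_steps  # 2 * 4 = 8

# Toy data y = 3x + 1 for x = 1..8 (exactly one effective batch).
data = [(float(i), 3.0 * i + 1.0) for i in range(1, effective_batch + 1)]
w = 0.5  # arbitrary starting weight

g_accum = accumulated_grad(w, data, micro_batch_size, accum_steps)
g_full = grad_mse(w, data)

print(effective_batch)                 # 8
print(abs(g_accum - g_full) < 1e-9)    # True: same gradient either way
```

If the micro-batch size stayed at 2 (an assumption, since the excerpt is cut off), raising gradient_accumulation to 32 would give an effective batch of 2*32 = 64. Averaging over more samples per optimizer step reduces the variance of the gradient estimate, at the cost of fewer weight updates per pass over the data.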

Originally published on April 02, 2026. Curated by AI News.

