Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO written from scratch in PyTorch - updates! [P]

Reddit - Machine Learning 1 min read

About this article

So, yesterday's run was a success: I got an average rollout length of about 64 tokens, as shown in the attached image! This was with quality_reward + length_penalty (more info below).

Next, I'll use the length penalty alone as the reward, with the mistake of counting characters as tokens fixed, and see whether the model starts gaming the reward or producing degraded outputs.

I used 2 rewards:

length_penalty: basically -abs(response_length - MAX_LENGTH)

quality_reward: ROUGE-L, which is basically LC...
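As a rough illustration of the two rewards named above, here is a minimal pure-Python sketch. It is not the code from the training run. The function names, the whitespace split standing in for the model's tokenizer, and MAX_LENGTH = 64 are all assumptions made for the example. ROUGE-L is computed here as an F1 score over the longest common subsequence of tokens, and the exact variant and any scaling used in the run are not known.

```python
from typing import Sequence

MAX_LENGTH = 64  # assumed target length in tokens; the post does not state the value


def length_penalty(num_tokens: int, max_length: int = MAX_LENGTH) -> float:
    """-abs(response_length - MAX_LENGTH), with length measured in tokens."""
    return float(-abs(num_tokens - max_length))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    summary = "the cat sat"
    reference = "the cat sat down"

    # Whitespace split is only a stand-in for the model tokenizer (assumption).
    summary_tokens = summary.split()
    reference_tokens = reference.split()

    # Counting characters instead of tokens inflates the length,
    # which is the mistake mentioned in the post.
    print("chars:", len(summary), "tokens:", len(summary_tokens))

    quality = rouge_l_f1(summary_tokens, reference_tokens)
    penalty = length_penalty(len(summary_tokens))
    # Plain sum, as in "quality_reward + length_penalty"; any weighting is unknown.
    print("reward:", quality + penalty)
```

Note that the two terms live on very different scales: ROUGE-L stays in [0, 1] while the length penalty can reach tens of tokens in magnitude, so in an unweighted sum the length term will tend to dominate.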


Originally published on April 15, 2026. Curated by AI News.

