Trained a Qwen2.5-0.5B-Instruct bf16 model on a Reddit post summarization task with GRPO written from scratch in PyTorch - updates! [P]
So, yesterday's run was a success, and I got an avg rollout length of about 64 tokens, as shown in the attached image! This was with quality_re...