The loss curve said tie. The judges said otherwise. Seeking replication for an early LLM training result [R]
TL;DR - I've written two novel functions that shape the training signal for LLMs. In early tests, people preferred responses from models trained with my functions ~59.9% of the time, but I'm just one guy with one GPU, and I'm hoping someone with more resources can prove me right or wrong.

The functions:

- Per-token gain: each token's loss gets scaled by how surprising it is. Confident, correct tokens coast, surprising ones get amplified, and the average comes out unchanged, so the total gradient budget is preserved.
- Pe...
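To make the per-token gain idea concrete, here is a minimal sketch of how that kind of reweighting could look in PyTorch. This is my own illustration of the description above, not the author's code: I'm assuming the weight is the token's surprisal (its own NLL), detached from the graph, and renormalized so the weights average to 1, which is one way to keep the total gradient budget unchanged.

```python
import torch
import torch.nn.functional as F

def per_token_gain_loss(logits, targets, ignore_index=-100):
    """Cross-entropy where each token's loss is scaled by its own surprisal,
    with weights renormalized to mean 1 so the overall loss scale is preserved.
    (Hypothetical sketch of the 'per-token gain' described in the post.)"""
    # Per-token negative log-likelihood (surprisal), no reduction yet.
    nll = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        targets.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    mask = (targets.view(-1) != ignore_index).float()
    num_valid = mask.sum().clamp(min=1)

    # Weight = detached surprisal, so the weighting itself gets no gradient,
    # normalized so the mean weight over valid tokens is 1.
    weights = nll.detach() * mask
    weights = weights / (weights.sum() / num_valid + 1e-8)

    # Confident (low-surprisal) tokens get weight < 1, surprising tokens > 1,
    # while the average loss magnitude stays roughly unchanged.
    return (weights * nll * mask).sum() / num_valid
```

Whether the original scheme detaches the weight, clips it, or uses a different normalizer is unclear from the truncated TL;DR; this is just the simplest reading of "scaled by how surprising it is, average unchanged."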