What should happen when you feed impossible moves into a chess-playing language model? [D]
I'd appreciate some input on an experiment I've been mulling over. You can treat it as a straight interpretability experiment, but it would also have theoretical implications. Karvonen (2024) trained a 50M-parameter transformer on chess game transcripts. The training was purely next-character prediction: no rules and no board representation were given to the model. It learned to play at ~1500 Elo and developed internal board-state representations that linear probes can read. He published the model, the probes, and the intervention tools (https://github.com/...