[D] I had an idea, would love your thoughts
What if, during pre-training, we set things up so that whenever the AI shows "misaligned behaviour" we reset something like 5% or 10% of its weights, and we tell the AI that this is the rule? To find the misaligned behaviour, a panel of around 20 top human experts would chat with the model simultaneously, maybe alongside a second group of experts using a different method to look for misalignment, and they would do this periodically. Could this discourage misaligned behaviour? Just a thought...
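To make the "reset 5% or 10% of its weights" part concrete, here is a rough sketch of what I have in mind, using NumPy on a single weight matrix. Everything here is my own assumption for illustration: the name `reset_fraction`, picking the weights to reset uniformly at random, and re-drawing them from a small normal distribution (`init_scale`). A real model would have many tensors and its own initialization scheme, and the expert panel's verdict is just a boolean here.

```python
import numpy as np

def reset_fraction(weights, fraction, rng, init_scale=0.02):
    """Return a copy of `weights` with a random `fraction` of entries re-drawn.

    Hypothetical helper: random selection and normal re-initialization
    are assumptions, not an established method.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    flat = weights.ravel().copy()              # work on a copy, leave input untouched
    n_reset = int(round(fraction * flat.size)) # how many entries to reset
    idx = rng.choice(flat.size, size=n_reset, replace=False)
    flat[idx] = rng.normal(0.0, init_scale, size=n_reset)
    return flat.reshape(weights.shape), idx

rng = np.random.default_rng(0)
weights = np.ones((10, 10))                    # stand-in for one layer of a model

panel_flagged_misalignment = True              # stand-in for the expert panel's verdict
if panel_flagged_misalignment:
    new_weights, reset_idx = reset_fraction(weights, 0.05, rng)
    print(f"reset {reset_idx.size} of {weights.size} weights")
```

The periodic review would just mean running the check-and-reset step every so often during training, with the fraction (0.05 or 0.10) as a knob.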