[D] Data curation and targeted replacement as a pre-training alignment and controllability method
Hi r/MachineLearning, has much research been done on large-scale training where undesirable data (for example, instances of violence, lying, or deception) is replaced in the dataset before training? Most controllability work, such as RLHF or constitutional AI, seems to happen post-training. What I'm considering is deliberately training models on more carefully chosen data and not letting them train on undesirable data at all. This is a literal application of Mo Gawdat's proposal ...
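To make the question concrete, here is a minimal sketch of the kind of filter-or-replace pass I have in mind, run over the corpus before any training happens. Everything in it is hypothetical and made up for illustration: `FLAGGED_TERMS`, `is_undesirable`, `curate`, and the `rewrite` callback are not from any existing library, and the keyword match is only a stand-in for whatever classifier a real pipeline would use.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical keyword list standing in for a real content classifier.
FLAGGED_TERMS = ("violence", "lying", "deception")


def is_undesirable(text: str) -> bool:
    """Return True if the text contains any flagged term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in FLAGGED_TERMS)


def curate(
    docs: Iterable[str],
    rewrite: Optional[Callable[[str], str]] = None,
) -> Iterator[str]:
    """Yield clean documents; replace flagged ones if a rewriter is given.

    A flagged document is dropped when no rewriter is supplied, or when
    the rewritten text is still flagged.
    """
    for doc in docs:
        if not is_undesirable(doc):
            yield doc
        elif rewrite is not None:
            replacement = rewrite(doc)
            # Keep the replacement only if it passes the same check.
            if not is_undesirable(replacement):
                yield replacement
        # Otherwise the document is dropped entirely.


corpus = [
    "A recipe for bread.",
    "A story full of deception.",
    "A report on violence.",
]

# Toy rewriter (an assumption): only knows how to replace "story" documents.
def toy_rewrite(doc: str) -> str:
    return "A story about honesty." if "story" in doc else doc

filtered_only = list(curate(corpus))
filtered_and_replaced = list(curate(corpus, rewrite=toy_rewrite))
```

With no rewriter this is plain filtering. With one, flagged documents get a chance to be replaced rather than removed, which is closer to what I'm asking about. The open question is whether anyone has studied this at scale.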