[D] Data curation and targeted replacement as a pre-training alignment and controllability method
Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...
Alignment, bias, regulation, and responsible AI
I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...
AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...
Abstract page for arXiv paper 2603.05375: Robust Node Affinities via Jaccard-Biased Random Walks and Rank Aggregation
Abstract page for arXiv paper 2603.05327: FairFinGAN: Fairness-aware Synthetic Financial Data Generation
Abstract page for arXiv paper 2603.05293: Knowledge Divergence and the Value of Debate for Scalable Oversight
Abstract page for arXiv paper 2603.05175: Incentive Aware AI Regulations: A Credal Characterisation
Abstract page for arXiv paper 2603.04851: Why Is RLHF Alignment Shallow? A Gradient Analysis
Abstract page for arXiv paper 2603.04831: Missingness Bias Calibration in Feature Attribution Explanations
Abstract page for arXiv paper 2603.04703: Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
Abstract page for arXiv paper 2603.04595: A Late-Fusion Multimodal AI Framework for Privacy-Preserving Deduplication in National Healthca...
Abstract page for arXiv paper 2602.09980: Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Info...
Abstract page for arXiv paper 2510.00177: PrefDisco: Benchmarking Proactive Personalized Reasoning
Abstract page for arXiv paper 2509.23886: Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Abstract page for arXiv paper 2508.06249: In-Training Defenses against Emergent Misalignment in Language Models
Abstract page for arXiv paper 2505.05589: ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance...
Abstract page for arXiv paper 2602.00485: Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
Abstract page for arXiv paper 2511.21033: Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning
Abstract page for arXiv paper 2511.04439: CoRPO: Adding a Correctness Bias to GRPO Improves Generalization
Abstract page for arXiv paper 2603.05228: The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology
Abstract page for arXiv paper 2603.05149: Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding
Abstract page for arXiv paper 2603.04986: Debiasing Sequential Recommendation with Time-aware Inverse Propensity Scoring
Abstract page for arXiv paper 2603.04976: 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding