AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[R] I trained a 3k parameter model on XOR sequences of length 20. It extrapolates perfectly to length 1,000,000. Here's why I think that's architecturally significant.

I've been working on an alternative to attention-based sequence modeling that I'm calling Geometric Flow Networks (GFN). The core idea: i...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min ·
AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2603.04340] Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study

Abstract page for arXiv paper 2603.04340: Balancing Fidelity, Utility, and Privacy in Synthetic Cardiac MRI Generation: A Comparative Study

arXiv - Machine Learning · 3 min ·
AI Safety

[2603.04113] Understanding Sources of Demographic Predictability in Brain MRI via Disentangling Anatomy and Contrast

Abstract page for arXiv paper 2603.04113: Understanding Sources of Demographic Predictability in Brain MRI via Disentangling Anatomy and ...

arXiv - AI · 4 min ·
LLMs

[2603.04069] Monitoring Emergent Reward Hacking During Generation via Internal Activations

Abstract page for arXiv paper 2603.04069: Monitoring Emergent Reward Hacking During Generation via Internal Activations

arXiv - AI · 4 min ·
Machine Learning

[2603.03785] Observationally Informed Adaptive Causal Experimental Design

Abstract page for arXiv paper 2603.03785: Observationally Informed Adaptive Causal Experimental Design

arXiv - Machine Learning · 4 min ·
Machine Learning

[2603.03989] When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models

Abstract page for arXiv paper 2603.03989: When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models

arXiv - AI · 4 min ·
Generative AI

[2603.03971] Upholding Epistemic Agency: A Brouwerian Assertibility Constraint for Responsible AI

Abstract page for arXiv paper 2603.03971: Upholding Epistemic Agency: A Brouwerian Assertibility Constraint for Responsible AI

arXiv - AI · 4 min ·
Machine Learning

[2603.03405] Surprisal-Rényi Free Energy

Abstract page for arXiv paper 2603.03405: Surprisal-Rényi Free Energy

arXiv - Machine Learning · 3 min ·
AI Safety

[2603.03401] Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

Abstract page for arXiv paper 2603.03401: Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

arXiv - Machine Learning · 3 min ·
LLMs

[2603.03915] Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects

Abstract page for arXiv paper 2603.03915: Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personalit...

arXiv - AI · 3 min ·
AI Safety

[2603.03769] DMD-augmented Unpaired Neural Schrödinger Bridge for Ultra-Low Field MRI Enhancement

Abstract page for arXiv paper 2603.03769: DMD-augmented Unpaired Neural Schrödinger Bridge for Ultra-Low Field MRI Enhancement

arXiv - Machine Learning · 3 min ·
Machine Learning

[2603.04359] Dissecting Quantization Error: A Concentration-Alignment Perspective

Abstract page for arXiv paper 2603.04359: Dissecting Quantization Error: A Concentration-Alignment Perspective

arXiv - AI · 3 min ·
Machine Learning

[2603.03714] Order Is Not Layout: Order-to-Space Bias in Image Generation

Abstract page for arXiv paper 2603.03714: Order Is Not Layout: Order-to-Space Bias in Image Generation

arXiv - AI · 3 min ·
LLMs

[2603.04135] Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization

Abstract page for arXiv paper 2603.04135: Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization

arXiv - AI · 4 min ·
LLMs

[2603.03536] SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

Abstract page for arXiv paper 2603.03536: SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

arXiv - AI · 3 min ·
Machine Learning

[2603.03920] BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning

Abstract page for arXiv paper 2603.03920: BD-Merging: Bias-Aware Dynamic Model Merging with Evidence-Guided Contrastive Learning

arXiv - AI · 4 min ·
Machine Learning

[2603.03515] The Controllability Trap: A Governance Framework for Military AI Agents

Abstract page for arXiv paper 2603.03515: The Controllability Trap: A Governance Framework for Military AI Agents

arXiv - AI · 3 min ·
AI Safety

[2603.03867] k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods

Abstract page for arXiv paper 2603.03867: k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods

arXiv - Machine Learning · 4 min ·
AI Safety

[2603.03820] Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation

Abstract page for arXiv paper 2603.03820: Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learnin...

arXiv - AI · 4 min ·
Machine Learning

[2603.03662] Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling

Abstract page for arXiv paper 2603.03662: Graph Negative Feedback Bias Correction Framework for Adaptive Heterophily Modeling

arXiv - AI · 4 min ·
Machine Learning

[2603.03341] Ethical and Explainable AI in Reusable MLOps Pipelines

Abstract page for arXiv paper 2603.03341: Ethical and Explainable AI in Reusable MLOps Pipelines

arXiv - AI · 4 min ·
