AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min · 8 minutes ago

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 8 hours ago

LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min · about 14 hours ago

All Content

AI Safety

[2603.20833] Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

Abstract page for arXiv paper 2603.20833: Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

arXiv - AI · 3 min · 6 days ago

LLMs

[2603.20578] Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems

Abstract page for arXiv paper 2603.20578: Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model S...

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.20425] Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making

Abstract page for arXiv paper 2603.20425: Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Po...

arXiv - AI · 4 min · 6 days ago

Machine Learning

LightRest Ltd's 'LAGK' Initiative - Leverage-Aware Governance Kernel

Most discussions around AI safety focus on what models know or whether outputs are correct. But since 2019, I’ve been working on somethin...

Reddit - Artificial Intelligence · 1 min · 6 days ago

AI Safety

I had an AI psychosis episode, got a Bipolar diagnosis, used AI to beat 20-year OCD, then built an AI governance platform. The actual story.

May 2025. I went too deep into AI, too fast. What happened was a 2-week psychiatric hospitalization and a Bipolar diagnosis. AI psychosis...

Reddit - Artificial Intelligence · 1 min · 6 days ago

LLMs

[R] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails (arXiv 2603.18280)

Paper: https://arxiv.org/abs/2603.18280 TL;DR: Current alignment evaluation measures concept detection (probing) and refusal (benchmarkin...

Reddit - Machine Learning · 1 min · 6 days ago

AI Safety

UK cops suspend live facial recog as study finds racial bias

Submitted by /u/ateam1984

Reddit - Artificial Intelligence · 1 min · 6 days ago

Machine Learning

[2510.15520] Discovering Intersectional Bias via Directional Alignment in Face Recognition Embeddings

Abstract page for arXiv paper 2510.15520: Discovering Intersectional Bias via Directional Alignment in Face Recognition Embeddings

arXiv - Machine Learning · 4 min · 6 days ago

AI Safety

[2507.17343] Principled Multimodal Representation Learning

Abstract page for arXiv paper 2507.17343: Principled Multimodal Representation Learning

arXiv - Machine Learning · 4 min · 6 days ago

LLMs

[2503.03773] A Phylogenetic Approach to Genomic Language Modeling

Abstract page for arXiv paper 2503.03773: A Phylogenetic Approach to Genomic Language Modeling

arXiv - Machine Learning · 3 min · 6 days ago

AI Safety

[2603.20024] Layered Quantum Architecture Search for 3D Point Cloud Classification

Abstract page for arXiv paper 2603.20024: Layered Quantum Architecture Search for 3D Point Cloud Classification

arXiv - Machine Learning · 3 min · 6 days ago

AI Safety

[2603.19907] Infinite-dimensional spherical-radial decomposition for probabilistic functions, with application to constrained optimal control and Gaussian process regression

Abstract page for arXiv paper 2603.19907: Infinite-dimensional spherical-radial decomposition for probabilistic functions, with applicati...

arXiv - Machine Learning · 3 min · 6 days ago

LLMs

[2603.19862] IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

Abstract page for arXiv paper 2603.19862: IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

arXiv - Machine Learning · 4 min · 6 days ago

AI Safety

[2603.19386] TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis

Abstract page for arXiv paper 2603.19386: TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis

arXiv - Machine Learning · 4 min · 6 days ago

Machine Learning

[2603.20115] Conditioning Protein Generation via Hopfield Pattern Multiplicity

Abstract page for arXiv paper 2603.20115: Conditioning Protein Generation via Hopfield Pattern Multiplicity

arXiv - Machine Learning · 4 min · 6 days ago

LLMs

[2603.19741] FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

Abstract page for arXiv paper 2603.19741: FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

arXiv - Machine Learning · 4 min · 6 days ago

LLMs

[2603.19486] Any-Subgroup Equivariant Networks via Symmetry Breaking

Abstract page for arXiv paper 2603.19486: Any-Subgroup Equivariant Networks via Symmetry Breaking

arXiv - Machine Learning · 4 min · 6 days ago

Machine Learning

[2603.19299] PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling

Abstract page for arXiv paper 2603.19299: PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovas...

arXiv - Machine Learning · 4 min · 6 days ago

Robotics

[2603.18829] Agent Control Protocol: Admission Control for Agent Actions

Abstract page for arXiv paper 2603.18829: Agent Control Protocol: Admission Control for Agent Actions

arXiv - AI · 4 min · 6 days ago

LLMs

[2601.03273] A Multi-Perspective Benchmark and Moderation Model for Evaluating Safety and Adversarial Robustness

Abstract page for arXiv paper 2601.03273: A Multi-Perspective Benchmark and Moderation Model for Evaluating Safety and Adversarial Robust...

arXiv - Machine Learning · 4 min · 6 days ago

