AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min ·
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min ·

All Content

AI Safety

[2603.20833] Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

Abstract page for arXiv paper 2603.20833: Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

arXiv - AI · 3 min ·
LLMs

[2603.20578] Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems

Abstract page for arXiv paper 2603.20578: Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model S...

arXiv - AI · 4 min ·
Machine Learning

[2603.20425] Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Policy Decision-Making in Data-Scarce Making

Abstract page for arXiv paper 2603.20425: Leveraging Natural Language Processing and Machine Learning for Evidence-Based Food Security Po...

arXiv - AI · 4 min ·
Machine Learning

LightRest Ltd's 'LAGK' Initiative - Leverage-Aware Governance Kernel

Most discussions around AI safety focus on what models know or whether outputs are correct. But since 2019, I’ve been working on somethin...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

I had an AI psychosis episode, got a Bipolar diagnosis, used AI to beat 20-year OCD, then built an AI governance platform. The actual story.

May 2025. I went too deep into AI, too fast. What happened was a 2-week psychiatric hospitalization and a Bipolar diagnosis. AI psychosis...

Reddit - Artificial Intelligence · 1 min ·
LLMs

[R] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails (arXiv 2603.18280)

Paper: https://arxiv.org/abs/2603.18280 TL;DR: Current alignment evaluation measures concept detection (probing) and refusal (benchmarkin...

Reddit - Machine Learning · 1 min ·
AI Safety

UK cops suspend live facial recog as study finds racial bias

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[2510.15520] Discovering Intersectional Bias via Directional Alignment in Face Recognition Embeddings

Abstract page for arXiv paper 2510.15520: Discovering Intersectional Bias via Directional Alignment in Face Recognition Embeddings

arXiv - Machine Learning · 4 min ·
AI Safety

[2507.17343] Principled Multimodal Representation Learning

Abstract page for arXiv paper 2507.17343: Principled Multimodal Representation Learning

arXiv - Machine Learning · 4 min ·
LLMs

[2503.03773] A Phylogenetic Approach to Genomic Language Modeling

Abstract page for arXiv paper 2503.03773: A Phylogenetic Approach to Genomic Language Modeling

arXiv - Machine Learning · 3 min ·
AI Safety

[2603.20024] Layered Quantum Architecture Search for 3D Point Cloud Classification

Abstract page for arXiv paper 2603.20024: Layered Quantum Architecture Search for 3D Point Cloud Classification

arXiv - Machine Learning · 3 min ·
AI Safety

[2603.19907] Infinite-dimensional spherical-radial decomposition for probabilistic functions, with application to constrained optimal control and Gaussian process regression

Abstract page for arXiv paper 2603.19907: Infinite-dimensional spherical-radial decomposition for probabilistic functions, with applicati...

arXiv - Machine Learning · 3 min ·
LLMs

[2603.19862] IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

Abstract page for arXiv paper 2603.19862: IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment

arXiv - Machine Learning · 4 min ·
AI Safety

[2603.19386] TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis

Abstract page for arXiv paper 2603.19386: TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis

arXiv - Machine Learning · 4 min ·
Machine Learning

[2603.20115] Conditioning Protein Generation via Hopfield Pattern Multiplicity

Abstract page for arXiv paper 2603.20115: Conditioning Protein Generation via Hopfield Pattern Multiplicity

arXiv - Machine Learning · 4 min ·
LLMs

[2603.19741] FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

Abstract page for arXiv paper 2603.19741: FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

arXiv - Machine Learning · 4 min ·
LLMs

[2603.19486] Any-Subgroup Equivariant Networks via Symmetry Breaking

Abstract page for arXiv paper 2603.19486: Any-Subgroup Equivariant Networks via Symmetry Breaking

arXiv - Machine Learning · 4 min ·
Machine Learning

[2603.19299] PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling

Abstract page for arXiv paper 2603.19299: PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovas...

arXiv - Machine Learning · 4 min ·
Robotics

[2603.18829] Agent Control Protocol: Admission Control for Agent Actions

Abstract page for arXiv paper 2603.18829: Agent Control Protocol: Admission Control for Agent Actions

arXiv - AI · 4 min ·
LLMs

[2601.03273] A Multi-Perspective Benchmark and Moderation Model for Evaluating Safety and Adversarial Robustness

Abstract page for arXiv paper 2601.03273: A Multi-Perspective Benchmark and Moderation Model for Evaluating Safety and Adversarial Robust...

arXiv - Machine Learning · 4 min ·