AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 9 hours ago

LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min · about 16 hours ago

All Content

AI Safety

[2509.20057] Responsible AI Technical Report

Abstract page for arXiv paper 2509.20057: Responsible AI Technical Report

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.18987] Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis

Abstract page for arXiv paper 2603.18987: Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-...

arXiv - AI · 4 min · 6 days ago

AI Safety

[2509.19464] Evaluation-Aware Reinforcement Learning

Abstract page for arXiv paper 2509.19464: Evaluation-Aware Reinforcement Learning

arXiv - Machine Learning · 3 min · 6 days ago

LLMs

[2502.14400] HPS: Hard Preference Sampling for Human Preference Alignment

Abstract page for arXiv paper 2502.14400: HPS: Hard Preference Sampling for Human Preference Alignment

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.20192] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Abstract page for arXiv paper 2603.20192: LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

arXiv - AI · 4 min · 6 days ago

LLMs

[2603.20122] Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

Abstract page for arXiv paper 2603.20122: Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.20116] Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning

Abstract page for arXiv paper 2603.20116: Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning

arXiv - AI · 3 min · 6 days ago

LLMs

[2603.20094] LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain

Abstract page for arXiv paper 2603.20094: LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace ...

arXiv - AI · 4 min · 6 days ago

AI Safety

[2603.20103] Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

Abstract page for arXiv paper 2603.20103: Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

arXiv - Machine Learning · 3 min · 6 days ago

Machine Learning

[2603.19979] X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

Abstract page for arXiv paper 2603.19979: X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

arXiv - AI · 4 min · 6 days ago

LLMs

[2603.19957] HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

Abstract page for arXiv paper 2603.19957: HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

arXiv - Machine Learning · 3 min · 6 days ago

Machine Learning

[2603.19807] Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision

Abstract page for arXiv paper 2603.19807: Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision

arXiv - AI · 3 min · 6 days ago

AI Safety

[2603.19667] Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding

Abstract page for arXiv paper 2603.19667: Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Moda...

arXiv - AI · 3 min · 6 days ago

LLMs

[2603.19615] CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation

Abstract page for arXiv paper 2603.19615: CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation

arXiv - AI · 3 min · 6 days ago

Machine Learning

[2603.19609] LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment

Abstract page for arXiv paper 2603.19609: LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment

arXiv - AI · 3 min · 6 days ago

Machine Learning

[2603.19563] Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search

Abstract page for arXiv paper 2603.19563: Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture...

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.19510] Linear Social Choice with Few Queries: A Moment-Based Approach

Abstract page for arXiv paper 2603.19510: Linear Social Choice with Few Queries: A Moment-Based Approach

arXiv - AI · 4 min · 6 days ago

LLMs

[2603.19423] The Autonomy Tax: Defense Training Breaks LLM Agents

Abstract page for arXiv paper 2603.19423: The Autonomy Tax: Defense Training Breaks LLM Agents

arXiv - Machine Learning · 4 min · 6 days ago

Machine Learning

[2603.19335] Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions

Abstract page for arXiv paper 2603.19335: Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Sc...

arXiv - AI · 4 min · 6 days ago

Machine Learning

[2603.19308] GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

Abstract page for arXiv paper 2603.19308: GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

arXiv - AI · 3 min · 6 days ago

