AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min ·
LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min ·

All Content

[2509.20057] Responsible AI Technical Report
AI Safety

Abstract page for arXiv paper 2509.20057: Responsible AI Technical Report

arXiv - AI · 4 min ·
[2603.18987] Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis
Machine Learning

Abstract page for arXiv paper 2603.18987: Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-...

arXiv - AI · 4 min ·
[2509.19464] Evaluation-Aware Reinforcement Learning
AI Safety

Abstract page for arXiv paper 2509.19464: Evaluation-Aware Reinforcement Learning

arXiv - Machine Learning · 3 min ·
[2502.14400] HPS: Hard Preference Sampling for Human Preference Alignment
LLMs

Abstract page for arXiv paper 2502.14400: HPS: Hard Preference Sampling for Human Preference Alignment

arXiv - AI · 4 min ·
[2603.20192] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Machine Learning

Abstract page for arXiv paper 2603.20192: LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

arXiv - AI · 4 min ·
[2603.20122] Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
LLMs

Abstract page for arXiv paper 2603.20122: Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

arXiv - AI · 4 min ·
[2603.20116] Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
Machine Learning

Abstract page for arXiv paper 2603.20116: Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning

arXiv - AI · 3 min ·
[2603.20094] LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain
LLMs

Abstract page for arXiv paper 2603.20094: LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace ...

arXiv - AI · 4 min ·
[2603.20103] Spectral Alignment in Forward-Backward Representations via Temporal Abstraction
AI Safety

Abstract page for arXiv paper 2603.20103: Spectral Alignment in Forward-Backward Representations via Temporal Abstraction

arXiv - Machine Learning · 3 min ·
[2603.19979] X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving
Machine Learning

Abstract page for arXiv paper 2603.19979: X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

arXiv - AI · 4 min ·
[2603.19957] HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction
LLMs

Abstract page for arXiv paper 2603.19957: HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

arXiv - Machine Learning · 3 min ·
[2603.19807] Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision
Machine Learning

Abstract page for arXiv paper 2603.19807: Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision

arXiv - AI · 3 min ·
[2603.19667] Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding
AI Safety

Abstract page for arXiv paper 2603.19667: Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Moda...

arXiv - AI · 3 min ·
[2603.19615] CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation
LLMs

Abstract page for arXiv paper 2603.19615: CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation

arXiv - AI · 3 min ·
[2603.19609] LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment
Machine Learning

Abstract page for arXiv paper 2603.19609: LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment

arXiv - AI · 3 min ·
[2603.19563] Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture Search
Machine Learning

Abstract page for arXiv paper 2603.19563: Dual-Domain Representation Alignment: Bridging 2D and 3D Vision via Geometry-Aware Architecture...

arXiv - AI · 4 min ·
[2603.19510] Linear Social Choice with Few Queries: A Moment-Based Approach
Machine Learning

Abstract page for arXiv paper 2603.19510: Linear Social Choice with Few Queries: A Moment-Based Approach

arXiv - AI · 4 min ·
[2603.19423] The Autonomy Tax: Defense Training Breaks LLM Agents
LLMs

Abstract page for arXiv paper 2603.19423: The Autonomy Tax: Defense Training Breaks LLM Agents

arXiv - Machine Learning · 4 min ·
[2603.19335] Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions
Machine Learning

Abstract page for arXiv paper 2603.19335: Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Sc...

arXiv - AI · 4 min ·
[2603.19308] GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
Machine Learning

Abstract page for arXiv paper 2603.19308: GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

arXiv - AI · 3 min ·