AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Washington needs AI guardrails — now | Opinion
AI Safety

We need legislation that draws clear lines on what AI systems may and may not do on behalf of the United States government

AI Tools & Products · 3 min ·
[2601.12910] SciCoQA: Quality Assurance for Scientific Paper–Code Alignment
AI Safety

Abstract page for arXiv paper 2601.12910: SciCoQA: Quality Assurance for Scientific Paper–Code Alignment

arXiv - AI · 3 min ·
[2509.21385] Debugging Concept Bottleneck Models through Removal and Retraining
Machine Learning

Abstract page for arXiv paper 2509.21385: Debugging Concept Bottleneck Models through Removal and Retraining

arXiv - Machine Learning · 4 min ·

All Content

[2603.24849] Gaze patterns predict preference and confidence in pairwise AI image evaluation
AI Safety

Abstract page for arXiv paper 2603.24849: Gaze patterns predict preference and confidence in pairwise AI image evaluation

arXiv - AI · 3 min ·
[2603.24651] When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews
LLMs

Abstract page for arXiv paper 2603.24651: When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews

arXiv - AI · 3 min ·
[2603.24634] Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization
AI Safety

Abstract page for arXiv paper 2603.24634: Dual-Graph Multi-Agent Reinforcement Learning for Handover Optimization

arXiv - Machine Learning · 4 min ·
[2603.24618] Causal AI For AMS Circuit Design: Interpretable Parameter Effects Analysis
Machine Learning

Abstract page for arXiv paper 2603.24618: Causal AI For AMS Circuit Design: Interpretable Parameter Effects Analysis

arXiv - Machine Learning · 3 min ·
[2603.25062] SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
LLMs

Abstract page for arXiv paper 2603.25062: SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Auto...

arXiv - Machine Learning · 3 min ·
[2603.24596] X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
LLMs

Abstract page for arXiv paper 2603.24596: X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

arXiv - AI · 3 min ·
[2603.24934] CVA: Context-aware Video-text Alignment for Video Temporal Grounding
AI Safety

Abstract page for arXiv paper 2603.24934: CVA: Context-aware Video-text Alignment for Video Temporal Grounding

arXiv - Machine Learning · 4 min ·
[2603.25720] R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
Machine Learning

Abstract page for arXiv paper 2603.25720: R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

arXiv - AI · 3 min ·
[2603.25412] Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
LLMs

Abstract page for arXiv paper 2603.25412: Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

arXiv - AI · 4 min ·
[2603.24714] Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?
AI Safety

Abstract page for arXiv paper 2603.24714: Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?

arXiv - Machine Learning · 4 min ·
[2603.25046] MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting
Machine Learning

Abstract page for arXiv paper 2603.25046: MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting

arXiv - Machine Learning · 4 min ·
[2603.25031] From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support
LLMs

Abstract page for arXiv paper 2603.25031: From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support

arXiv - AI · 4 min ·
[2603.25022] A Public Theory of Distillation Resistance via Constraint-Coupled Reasoning Architectures
Machine Learning

Abstract page for arXiv paper 2603.25022: A Public Theory of Distillation Resistance via Constraint-Coupled Reasoning Architectures

arXiv - Machine Learning · 3 min ·
[2603.24853] Resisting Humanization: Ethical Front-End Design Choices in AI for Sensitive Contexts
Machine Learning

Abstract page for arXiv paper 2603.24853: Resisting Humanization: Ethical Front-End Design Choices in AI for Sensitive Contexts

arXiv - AI · 4 min ·
[2603.24768] Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design
LLMs

Abstract page for arXiv paper 2603.24768: Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineeri...

arXiv - AI · 4 min ·
[2603.24742] Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour
Machine Learning

Abstract page for arXiv paper 2603.24742: Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

arXiv - Machine Learning · 4 min ·
[2603.24676] When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs
LLMs

Abstract page for arXiv paper 2603.24676: When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs

arXiv - AI · 4 min ·
AI Safety

Need some AI agents

Hello Agenters, I need a few folks who have their AI agent running with some users to test my build. I've built an observability + monito...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Claude Code hits $2.5B in revenue and ships auto mode, an AI classifier that decides what's safe to run on your machine

Anthropic dropped three features for Claude Code on Monday, but the interesting one is auto mode. Until now you had two choices: approve ...

Reddit - Artificial Intelligence · 1 min ·
[2603.18865] RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelity Radio Map Construction
Machine Learning

Abstract page for arXiv paper 2603.18865: RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelit...

arXiv - Machine Learning · 4 min ·