AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What happens if, while pre-training an AI, we make it such that whenever it exhibits "misaligned behaviour" we just reduce ...

Reddit - Machine Learning · 1 min ·
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.22638] MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
LLMs

MobilityBench introduces a benchmark for evaluating LLM-based route-planning agents, addressing challenges in real-world mobility scenari...

arXiv - AI · 4 min ·
[2602.22554] Multilingual Safety Alignment Via Sparse Weight Editing
LLMs

This paper presents a novel framework for aligning safety measures in multilingual large language models (LLMs) through Sparse Weight Edi...

arXiv - Machine Learning · 3 min ·
[2602.22585] Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
Machine Learning

This paper explores the integration of psychometric rater models into AI evaluation, aiming to correct human label biases and improve the...

arXiv - Machine Learning · 3 min ·
[2602.22557] CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
LLMs

CourtGuard introduces a model-agnostic framework for zero-shot policy adaptation in LLM safety, enhancing adaptability and performance wi...

arXiv - Machine Learning · 3 min ·
[2602.22546] Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention
LLMs

This article presents a framework called AHCE for enhancing Large Language Model (LLM) agents through effective human collaboration, sign...

arXiv - AI · 3 min ·
[2602.22520] TEFL: Prediction-Residual-Guided Rolling Forecasting for Multi-Horizon Time Series
Machine Learning

The paper presents TEFL, a novel framework for multi-horizon time series forecasting that utilizes prediction residuals to enhance accura...

arXiv - Machine Learning · 4 min ·
[2602.22519] A Mathematical Theory of Agency and Intelligence
AI Agents

This paper presents a mathematical framework for understanding agency and intelligence in AI systems, introducing the concept of bipredic...

arXiv - AI · 4 min ·
[2602.22500] Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models
LLMs

This article reviews the integration of AI into life cycle assessment (LCA), highlighting trends, themes, and future directions using lar...

arXiv - AI · 4 min ·
[2602.22480] VeRO: An Evaluation Harness for Agents to Optimize Agents
LLMs

The paper introduces VeRO, an evaluation harness designed for optimizing coding agents through structured evaluation and benchmarking, ad...

arXiv - Machine Learning · 3 min ·
[2602.22470] Beyond performance-wise Contribution Evaluation in Federated Learning
Machine Learning

This paper explores the limitations of current evaluation methods in federated learning, emphasizing the need for a multidimensional appr...

arXiv - Machine Learning · 3 min ·
[2602.22442] A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
LLMs

This article presents a framework for evaluating AI agent decisions in AutoML pipelines, emphasizing decision-centric metrics over tradit...

arXiv - AI · 4 min ·
[2602.22438] From Bias to Balance: Fairness-Aware Paper Recommendation for Equitable Peer Review
Machine Learning

This paper introduces Fair-PaperRec, a fairness-aware paper recommendation system designed to mitigate biases in peer review, enhancing e...

arXiv - AI · 4 min ·
[2602.22413] Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
AI Safety

This paper explores a probabilistic framework for collective decision-making among agents that can assess their own reliability and selec...

arXiv - AI · 3 min ·
[2602.22302] Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents
NLP

The paper presents Agent Behavioral Contracts (ABC), a framework for specifying and enforcing the behavior of autonomous AI agents, addre...

arXiv - AI · 4 min ·
[2602.22345] Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory
LLMs

This paper explores the reliability and efficiency of large language models (LLMs) using Random Matrix Theory. It introduces EigenTrack f...

arXiv - AI · 4 min ·
[2602.22303] Training Agents to Self-Report Misbehavior
LLMs

The paper discusses a novel approach to training AI agents to self-report misbehavior, enhancing alignment and safety in AI systems by re...

arXiv - AI · 3 min ·
[2602.22298] AviaSafe: A Physics-Informed Data-Driven Model for Aviation Safety-Critical Cloud Forecasts
Machine Learning

AviaSafe introduces a physics-informed, data-driven model for aviation cloud forecasts, enhancing prediction accuracy for critical hydrom...

arXiv - AI · 3 min ·
[2602.22291] Manifold of Failure: Behavioral Attraction Basins in Language Models
LLMs

This paper introduces a framework for mapping the 'Manifold of Failure' in language models, identifying vulnerability regions and their t...

arXiv - Machine Learning · 4 min ·
[2602.22288] Reliable XAI Explanations in Sudden Cardiac Death Prediction for Chagas Cardiomyopathy
Machine Learning

This article discusses a novel explainable AI (XAI) method for predicting sudden cardiac death in Chagas cardiomyopathy, emphasizing its ...

arXiv - Machine Learning · 4 min ·
[2602.22271] Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
LLMs

This article presents a novel probabilistic framework for understanding causal self-attention in LLMs, introducing concepts like support ...

arXiv - Machine Learning · 4 min ·