AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents and am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis
AI Safety

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·

[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge
LLMs

Abstract page for arXiv paper 2504.05995: NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

arXiv - AI · 4 min ·

All Content

[2602.13241] Real-World Design and Deployment of an Embedded GenAI-powered 9-1-1 Calltaking Training System: Experiences and Lessons Learned
Machine Learning

This article discusses the design and deployment of a GenAI-powered training system for 9-1-1 call-takers, highlighting the challenges fa...

arXiv - AI · 4 min ·

[2602.14275] Reverse N-Wise Output-Oriented Testing for AI/ML and Quantum Computing Systems
Machine Learning

The paper introduces Reverse N-Wise Output-Oriented Testing, a novel approach for testing AI/ML and quantum computing systems, addressing...

arXiv - AI · 4 min ·

[2602.13231] An Explainable Failure Prediction Framework for Neural Networks in Radio Access Networks
Machine Learning

This paper presents a framework for explainable failure prediction in neural networks used in radio access networks, enhancing model tran...

arXiv - Machine Learning · 4 min ·

[2602.13222] Computability of Agentic Systems
Machine Learning

This paper presents the Quest Graph framework for analyzing agentic systems' capabilities, establishing a computational hierarchy and eff...

arXiv - AI · 3 min ·

[2602.13207] A Safety-Constrained Reinforcement Learning Framework for Reliable Wireless Autonomy
AI Infrastructure

This article presents a safety-constrained reinforcement learning framework aimed at enhancing the reliability of wireless autonomy, part...

arXiv - AI · 4 min ·

[2602.14233] Evaluating LLMs in Finance Requires Explicit Bias Consideration
LLMs

This paper discusses the need for explicit bias consideration in evaluating Large Language Models (LLMs) used in finance, identifying fiv...

arXiv - AI · 3 min ·

[2602.10833] Training-Induced Bias Toward LLM-Generated Content in Dense Retrieval
LLMs

This study investigates the training-induced bias towards LLM-generated content in dense retrieval systems, revealing how dataset and tra...

arXiv - Machine Learning · 4 min ·

[2602.14994] On the Semantics of Primary Cause in Hybrid Dynamic Domains
AI Agents

This paper presents two definitions of primary cause within a hybrid action-theoretic framework, addressing the complexities of causation...

arXiv - AI · 3 min ·

[2602.14161] When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift
LLMs

This paper evaluates the effectiveness of malicious prompt classifiers under true distribution shifts, revealing significant performance ...

arXiv - Machine Learning · 4 min ·

[2602.14869] Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution
LLMs

The paper introduces Concept Influence, a method to enhance training data attribution by leveraging interpretability, improving performan...

arXiv - AI · 4 min ·

[2602.14857] World Models for Policy Refinement in StarCraft II
LLMs

The paper presents StarWM, a novel world model for refining decision-making policies in StarCraft II using large language models, demonst...

arXiv - AI · 4 min ·

[2602.14740] AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises
Machine Learning

The paper explores how advanced AI models exhibit complex reasoning in simulated nuclear crises, revealing insights into strategic decisi...

arXiv - AI · 4 min ·

[2602.14691] Removing Planner Bias in Goal Recognition Through Multi-Plan Dataset Generation
Robotics

This paper presents a method to eliminate planner bias in goal recognition using multi-plan dataset generation, enhancing the evaluation ...

arXiv - AI · 3 min ·

[2602.14676] GREAT-EER: Graph Edge Attention Network for Emergency Evacuation Responses
Machine Learning

The paper presents GREAT-EER, a Graph Edge Attention Network designed to optimize emergency evacuation responses by solving the Bus Evacu...

arXiv - Machine Learning · 4 min ·

[2602.14674] From User Preferences to Base Score Extraction Functions in Gradual Argumentation
AI Agents

This paper introduces Base Score Extraction Functions in gradual argumentation, enhancing decision-making and AI transparency by mapping ...

arXiv - AI · 4 min ·

[2602.14643] Arbor: A Framework for Reliable Navigation of Critical Conversation Flows
LLMs

The paper presents Arbor, a framework designed to enhance the navigation of critical conversation flows in high-stakes environments like ...

arXiv - AI · 4 min ·

[2602.14529] Disentangling Deception and Hallucination Failures in LLMs
LLMs

This paper explores the distinction between deception and hallucination failures in large language models (LLMs), proposing a mechanism-o...

arXiv - AI · 3 min ·

[2602.14518] Diagnosing Knowledge Conflict in Multimodal Long-Chain Reasoning
LLMs

This paper explores knowledge conflicts in multimodal large language models (MLLMs) during long chain-of-thought reasoning, proposing a f...

arXiv - AI · 3 min ·

[2602.14505] Formally Verifying and Explaining Sepsis Treatment Policies with COOL-MC
Machine Learning

This paper presents COOL-MC, a tool for verifying and explaining sepsis treatment policies using reinforcement learning, enhancing decisi...

arXiv - Machine Learning · 4 min ·

[2602.13934] Why Code, Why Now: Learnability, Computability, and the Real Limits of Machine Learning
Machine Learning

The paper discusses the learnability and computability limits of machine learning, emphasizing the structured feedback of code generation...

arXiv - Machine Learning · 3 min ·