AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis
Ai Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·
[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge
Llms

[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

Abstract page for arXiv paper 2504.05995: NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

arXiv - AI · 4 min ·
[2502.19463] Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights
Llms

[2502.19463] Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

Abstract page for arXiv paper 2502.19463: Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

arXiv - AI · 4 min ·

All Content

[2602.14760] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers
Llms

[2602.14760] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers

This article explores a structural misalignment in Transformers, particularly regarding residual connections and their impact on next-tok...

arXiv - AI · 3 min ·
[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks
Machine Learning

[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks

The paper presents Gaussian Process Activations (GAPA), a novel method for uncertainty quantification in pretrained networks, enhancing e...

arXiv - Machine Learning · 3 min ·
[2602.14606] Towards Selection as Power: Bounding Decision Authority in Autonomous Agents
Robotics

[2602.14606] Towards Selection as Power: Bounding Decision Authority in Autonomous Agents

The paper discusses a governance architecture for autonomous agents, focusing on bounding decision authority to ensure safety in high-sta...

arXiv - AI · 4 min ·
[2602.14862] The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling
Llms

[2602.14862] The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

The paper explores the properties of temperature scaling in probabilistic models, particularly its impact on classifier calibration and l...

arXiv - AI · 4 min ·
[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment
Llms

[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-...

arXiv - Machine Learning · 3 min ·
[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR
Llms

[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR

This article presents the BETA-labeling framework for constructing a Bangla IR dataset, addressing challenges in low-resource languages a...

arXiv - AI · 4 min ·
[2602.14689] Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
Llms

[2602.14689] Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

This article presents a comprehensive study on the vulnerability of open-weight models to prefill attacks, revealing significant security...

arXiv - AI · 3 min ·
[2602.14477] When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community
Ai Agents

[2602.14477] When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community

This paper explores peer learning among AI agents in the Moltbook community, analyzing over 28,000 posts to identify teaching patterns an...

arXiv - AI · 4 min ·
[2602.14374] Differentially Private Retrieval-Augmented Generation
Llms

[2602.14374] Differentially Private Retrieval-Augmented Generation

The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...

arXiv - AI · 4 min ·
[2602.14471] Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems
Llms

[2602.14471] Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems

The paper presents a game-theoretic framework called Socially-Weighted Alignment (SWA) for managing multi-agent large language model (LLM...

arXiv - AI · 3 min ·
[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Ai Agents

[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions...

arXiv - AI · 3 min ·
[2602.14357] Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study
Llms

[2602.14357] Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study

This ethnographic study explores the role of domain experts in the design and evaluation of Large Language Models (LLMs), highlighting ke...

arXiv - AI · 3 min ·
[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition
Machine Learning

[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition

The paper presents LRD-MPC, a method that enhances the efficiency of secure multi-party computation (MPC) in machine learning by utilizin...

arXiv - Machine Learning · 4 min ·
[2602.14345] AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports
Nlp

[2602.14345] AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports

The paper presents AXE, an innovative framework for validating zero-day vulnerabilities using minimal metadata, achieving a significant i...

arXiv - AI · 4 min ·
[2602.14299] Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
Llms

[2602.14299] Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook

This article explores whether socialization occurs in AI agent societies, using Moltbook as a case study. It presents a framework for ana...

arXiv - AI · 4 min ·
[2602.14285] FMMD: A multimodal open peer review dataset based on F1000Research
Data Science

[2602.14285] FMMD: A multimodal open peer review dataset based on F1000Research

The paper introduces FMMD, a multimodal open peer review dataset from F1000Research, addressing limitations in current datasets by integr...

arXiv - AI · 4 min ·
[2602.14270] A Rational Analysis of the Effects of Sycophantic AI
Llms

[2602.14270] A Rational Analysis of the Effects of Sycophantic AI

This article analyzes the impact of sycophantic AI on human belief systems, revealing how overly agreeable AI can distort reality and inf...

arXiv - AI · 3 min ·
[2602.14216] Reasoning Language Models for complex assessments tasks: Evaluating parental cooperation from child protection case reports
Llms

[2602.14216] Reasoning Language Models for complex assessments tasks: Evaluating parental cooperation from child protection case reports

This article explores the effectiveness of reasoning language models (RLMs) in assessing parental cooperation during child protection int...

arXiv - AI · 4 min ·
[2602.14211] SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
Ai Agents

[2602.14211] SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

The paper presents SkillJect, an automated framework for stealthy skill-based prompt injection in coding agents, addressing security vuln...

arXiv - AI · 4 min ·
[2602.14189] Knowing When Not to Answer: Abstention-Aware Scientific Reasoning
Llms

[2602.14189] Knowing When Not to Answer: Abstention-Aware Scientific Reasoning

The paper discusses an abstention-aware framework for scientific reasoning, emphasizing the importance of knowing when to abstain from an...

arXiv - AI · 4 min ·
Previous Page 108 Next

Related Topics

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime