AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis
AI Safety

arXiv - AI · 4 min

[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge
LLMs

arXiv - AI · 4 min

[2502.19463] Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights
LLMs

arXiv - AI · 4 min

All Content

[2602.14030] MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
LLMs

MC$^2$Mark introduces a novel watermarking framework that ensures reliable embedding of long messages in generated text while maintaining...

arXiv - Machine Learning · 3 min

[2602.14158] A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing
LLMs

This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...

arXiv - AI · 4 min

[2602.14106] Anticipating Adversary Behavior in DevSecOps Scenarios through Large Language Models
LLMs

This paper explores the integration of Large Language Models (LLMs) in anticipating adversary behavior within DevSecOps environments, pro...

arXiv - AI · 4 min

[2602.14080] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
LLMs

The paper explores the limitations of factuality evaluations in large language models (LLMs), identifying recall as a key bottleneck in a...

arXiv - AI · 4 min

[2602.13864] Evolving Multi-Channel Confidence-Aware Activation Functions for Missing Data with Channel Propagation
Machine Learning

This paper presents a novel approach to activation functions in neural networks that incorporates missing data and confidence scores, enh...

arXiv - Machine Learning · 4 min

[2602.14012] From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection
LLMs

This article explores the post-training pipeline for LLM-based vulnerability detection, detailing methods from supervised fine-tuning (SF...

arXiv - AI · 4 min

[2602.13672] LEAD-Drift: Real-time and Explainable Intent Drift Detection by Learning a Data-Driven Risk Score
Machine Learning

The LEAD-Drift framework offers a real-time solution for detecting intent drift in Intent-Based Networking (IBN), enhancing proactive net...

arXiv - Machine Learning · 4 min

[2602.13619] Locally Private Parametric Methods for Change-Point Detection
AI Startups

This paper presents novel locally private parametric methods for change-point detection, focusing on maintaining privacy while identifyin...

arXiv - Machine Learning · 3 min

[2602.13914] Common Knowledge Always, Forever
Machine Learning

The paper discusses a polytopological PDL framework for expressing common knowledge and its implications in epistemic logic, highlighting...

arXiv - AI · 3 min

[2602.13891] GSRM: Generative Speech Reward Model for Speech RLHF
LLMs

The paper introduces the Generative Speech Reward Model (GSRM), a novel approach to evaluating speech naturalness in AI-generated audio, ...

arXiv - AI · 4 min

[2602.13784] Comparables XAI: Faithful Example-based AI Explanations with Counterfactual Trace Adjustments
AI Startups

The paper introduces Comparables XAI, a method for providing faithful, example-based AI explanations using counterfactual trace adjustmen...

arXiv - AI · 3 min

[2602.13675] Transferable XAI: Relating Understanding Across Domains with Explanation Transfer
AI Safety

The paper presents Transferable XAI, a framework that enables users to apply understanding from one AI domain to another, enhancing decis...

arXiv - AI · 4 min

[2602.13268] Expected Moral Shortfall for Ethical Competence in Decision-making Models
Machine Learning

This paper explores the integration of moral cognition into AI decision-making models, introducing the concept of Expected Moral Shortfal...

arXiv - Machine Learning · 3 min

[2602.13238] Securing SIM-Assisted Wireless Networks via Quantum Reinforcement Learning
Robotics

This paper presents a novel hybrid quantum reinforcement learning framework, Q-PPO, designed to enhance the security of SIM-assisted wire...

arXiv - Machine Learning · 4 min

[2602.13625] Anthropomorphism on Risk Perception: The Role of Trust and Domain Knowledge in Decision-Support AI
Machine Learning

This article explores how anthropomorphism in AI influences risk perception through trust and domain knowledge, based on a large-scale on...

arXiv - AI · 3 min

[2602.13576] Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges
LLMs

The paper identifies a vulnerability in large language model (LLM) evaluation processes, termed Rubric-Induced Preference Drift (RIPD), w...

arXiv - AI · 4 min

[2602.13575] Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
LLMs

The paper introduces Elo-Evolve, a co-evolutionary framework for aligning large language models (LLMs) through dynamic multi-agent compet...

arXiv - AI · 3 min

[2602.15028] Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization
LLMs

The paper examines how increasing context length in large language models (LLMs) affects personalization quality and privacy risks, revea...

arXiv - AI · 4 min

[2602.13562] Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning
LLMs

The paper presents the Adaptive Safe Context Learning (ASCL) framework to address the safety-utility trade-off in large language model (L...

arXiv - AI · 3 min

[2602.13555] Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
Computer Vision

The paper presents a Privacy-Concealing Cooperation (PCC) framework for Bird's Eye View (BEV) semantic segmentation, enhancing autonomous...

arXiv - AI · 4 min