AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents and am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·
[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis
AI Safety

arXiv - AI · 4 min ·
[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge
LLMs

arXiv - AI · 4 min ·

All Content

[2602.13555] Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
Computer Vision

The paper presents a Privacy-Concealing Cooperation (PCC) framework for Bird's Eye View (BEV) semantic segmentation, enhancing autonomous...

arXiv - AI · 4 min ·
[2602.13547] AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks
LLMs

The paper presents AISA, a novel defense mechanism for large language models (LLMs) that enhances safety against jailbreak attacks by act...

arXiv - AI · 4 min ·
[2602.15001] Boundary Point Jailbreaking of Black-Box LLMs
LLMs

The paper introduces Boundary Point Jailbreaking (BPJ), a novel automated attack method that circumvents advanced safeguards in black-box...

arXiv - Machine Learning · 4 min ·
[2602.13540] On Calibration of Large Language Models: From Response To Capability
LLMs

This paper introduces the concept of capability calibration for large language models (LLMs), emphasizing the importance of accurate conf...

arXiv - Machine Learning · 4 min ·
[2602.13504] From Perceptions to Evidence: Detecting AI-Generated Content in Turkish News Media with a Fine-Tuned BERT Classifier
LLMs

This study presents a fine-tuned BERT classifier for detecting AI-generated content in Turkish news media, achieving a high F1 score and ...

arXiv - AI · 4 min ·
[2602.14889] Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment
Machine Learning

The paper presents a framework for web-scale multimodal summarization that integrates text and image data using CLIP-based semantic align...

arXiv - Machine Learning · 3 min ·
[2602.13455] Using Machine Learning to Enhance the Detection of Obfuscated Abusive Words in Swahili: A Focus on Child Safety
Machine Learning

This article explores the use of machine learning to detect obfuscated abusive language in Swahili, focusing on child safety and the chal...

arXiv - AI · 4 min ·
[2602.13458] MoltNet: Understanding Social Behavior of AI Agents in the Agent-Native MoltBook
AI Agents

MoltNet explores the social behavior of AI agents on the MoltBook platform, revealing insights into their interactions and similarities t...

arXiv - AI · 4 min ·
[2602.14849] Atomix: Timely, Transactional Tool Use for Reliable Agentic Workflows
LLMs

The paper presents Atomix, a runtime system designed to enhance the reliability of agentic workflows by implementing progress-aware trans...

arXiv - AI · 3 min ·
[2602.13427] Backdooring Bias in Large Language Models
LLMs

The paper explores backdoor attacks in large language models (LLMs), focusing on how biases can be induced through syntactically and sema...

arXiv - AI · 4 min ·
[2602.14844] Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment
AI Safety

This paper introduces Interactionless Inverse Reinforcement Learning, a framework aimed at improving AI alignment by decoupling safety ob...

arXiv - Machine Learning · 3 min ·
[2602.13421] Metabolic cost of information processing in Poisson variational autoencoders
Machine Learning

This article explores the metabolic cost of information processing in Poisson variational autoencoders, emphasizing the energy constraint...

arXiv - AI · 4 min ·
[2602.13379] Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
LLMs

This article presents a new benchmark, MT-AgentRisk, for evaluating safety risks in multi-turn interactions of tool-using agents, reveali...

arXiv - Machine Learning · 4 min ·
[2602.13370] G2CP: A Graph-Grounded Communication Protocol for Verifiable and Efficient Multi-Agent Reasoning
LLMs

The paper presents G2CP, a novel communication protocol for multi-agent systems that enhances efficiency and verifiability by using graph...

arXiv - AI · 3 min ·
[2602.13363] Assessing Spear-Phishing Website Generation in Large Language Model Coding Agents
LLMs

This article evaluates the capabilities of large language models (LLMs) in generating spear-phishing websites, highlighting the potential...

arXiv - AI · 4 min ·
[2602.13357] AdaCorrection: Adaptive Offset Cache Correction for Accurate Diffusion Transformers
Machine Learning

The paper introduces AdaCorrection, a framework that enhances the efficiency of Diffusion Transformers by correcting cache misalignment, ...

arXiv - AI · 3 min ·
[2602.14729] Scale redundancy and soft gauge fixing in positively homogeneous neural networks
Machine Learning

This paper explores the concept of scale redundancy in positively homogeneous neural networks, introducing gauge-adapted coordinates and ...

arXiv - AI · 3 min ·
[2602.14701] Unbiased Approximate Vector-Jacobian Products for Efficient Backpropagation
Machine Learning

This paper presents methods to enhance the efficiency of backpropagation in deep learning by using unbiased approximate vector-Jacobian p...

arXiv - Machine Learning · 3 min ·
[2602.14682] Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
Machine Learning

This paper investigates the diversity bias in deep generative models, revealing that these models often underestimate the diversity of th...

arXiv - AI · 4 min ·
[2602.13339] An Integrated Causal Inference Framework for Traffic Safety Modeling with Semantic Street-View Visual Features
Machine Learning

This article presents a novel causal inference framework for traffic safety modeling, utilizing semantic features from street-view images...

arXiv - AI · 4 min ·