AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents. I am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min · about 4 hours ago

AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min · about 8 hours ago

LLMs

[2504.05995] NativQA Framework: Enabling LLMs and VLMs with Native, Local, and Everyday Knowledge

arXiv - AI · 4 min · about 8 hours ago

All Content

Machine Learning

[2602.14635] Alignment Adapter to Improve the Performance of Compressed Deep Learning Models

The paper introduces the Alignment Adapter (AlAd), a method to enhance the performance of compressed deep learning models by aligning the...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.14602] OPBench: A Graph Benchmark to Combat the Opioid Crisis

OPBench introduces a comprehensive benchmark for evaluating graph learning methods aimed at addressing the opioid crisis, featuring five ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13324] Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge

This paper presents a zero-shot framework for target verification and tactical reasoning in autonomous edge robotics, addressing challeng...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14553] Governing AI Forgetting: Auditing for Machine Unlearning Compliance

The paper discusses the challenges of ensuring compliance with data deletion requests in AI systems, proposing a novel economic framework...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13308] Learning to Select Like Humans: Explainable Active Learning for Medical Imaging

This paper presents an explainable active learning framework for medical imaging that enhances data efficiency and interpretability by in...

arXiv - AI · 4 min · about 2 months ago

Computer Vision

[2602.13305] WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

WildfireVLM introduces an AI framework for early wildfire detection and risk assessment using satellite imagery, enhancing disaster manag...

arXiv - AI · 4 min · about 2 months ago

AI Safety

[2602.13304] Progressive Contrast Registration for High-Fidelity Bidirectional Photoacoustic Microscopy Alignment

This article presents PCReg-Net, a novel framework for high-fidelity alignment in bidirectional photoacoustic microscopy, significantly i...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.14462] Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment

This paper explores 'silent inconsistency' in data-parallel fine-tuning of large language models, identifying optimization misalignments ...

arXiv - Machine Learning · 4 min · about 2 months ago

Generative AI

[2602.13303] Spectral Collapse in Diffusion Inversion

The paper discusses 'spectral collapse' in diffusion inversion, highlighting failures in standard deterministic methods for image transla...

arXiv - Machine Learning · 3 min · about 2 months ago

Robotics

[2602.13291] Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement

Agent Mars presents a multi-agent simulation framework designed for efficient coordination in Mars base operations, addressing challenges...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14444] Broken Chains: The Cost of Incomplete Reasoning in LLMs

The paper explores the impact of incomplete reasoning in large language models (LLMs), revealing how different reasoning modalities affec...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14430] A unified framework for evaluating the robustness of machine-learning interpretability for prospect risking

This article presents a unified framework for evaluating the robustness of machine-learning interpretability, specifically in the context...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.13286] Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification

This article explores Explanatory Interactive Machine Learning (XIL) as a method to mitigate bias in visual gender classification, demons...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Safety

[2602.13284] Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook

This article presents a large-scale study of Moltbook, an AI-only social platform, revealing how AI agents create complex social structur...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.14351] WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control

The paper presents WIMLE, a model-based reinforcement learning method that enhances sample efficiency by addressing model errors and unce...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14322] Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study

This article explores the integration of Conformal Signal Temporal Logic (CSTL) in reinforcement learning (RL) for enhancing safety and r...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.13253] Implicit Bias in LLMs for Transgender Populations

This article examines implicit biases in large language models (LLMs) against transgender populations, highlighting disparities in health...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14318] In Transformer We Trust? A Perspective on Transformer Architecture Failure Modes

The paper examines the trustworthiness of transformer architectures in high-stakes applications, analyzing their reliability, interpretab...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.13246] Global AI Bias Audit for Technical Governance

This article discusses a global audit of Large Language Models (LLMs) focusing on geographic and socioeconomic biases in AI governance, h...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13244] Responsible AI in Business

The paper discusses the concept of Responsible AI in business, focusing on its implementation in small and medium-sized enterprises. It c...

arXiv - AI · 4 min · about 2 months ago
