AI Safety & Ethics Guide

A comprehensive guide to the best AI safety & ethics resources, organized by type. Curated by AI News.

Research Papers

[2510.26722] Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off

This paper explores the challenges of heterogeneous federated learning in wireless networks, focusing on the bias-variance trade-off in non-convex scenarios.

arXiv - Machine Learning

[2602.12426] Interference-Robust Non-Coherent Over-the-Air Computation for Decentralized Optimization

This paper presents an interference-robust non-coherent over-the-air computation (IR-NCOTA) method for decentralized optimization, enhancing consensus estimation in wireless networks.

arXiv - Machine Learning

[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment

This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-assessment after realignment training.

arXiv - Machine Learning

[2602.15568] Scenario Approach with Post-Design Certification of User-Specified Properties

This paper introduces a scenario approach for post-design certification of user-specified properties, enhancing reliability without additional test datasets.

arXiv - Machine Learning

[2602.20021] Agents of Chaos

The paper 'Agents of Chaos' presents findings from a red-teaming study on autonomous language-model-powered agents, highlighting security vulnerabilities and ethical concerns.

arXiv - AI

[2602.16942] SourceBench: Can AI Answers Reference Quality Web Sources?

The paper introduces SourceBench, a benchmark designed to evaluate the quality of web sources cited by AI models across various query types.

arXiv - AI

[2602.18029] Towards More Standardized AI Evaluation: From Models to Agents

This paper discusses the evolution of AI evaluation from static models to dynamic agents, emphasizing the need for standardized evaluation practices that foster trust and governance.

arXiv - AI

[2509.18949] Towards Privacy-Aware Bayesian Networks: A Credal Approach

This paper presents a novel approach to privacy-aware Bayesian networks using credal networks, addressing the trade-off between privacy and model utility in probabilistic graphical models.

arXiv - AI

[2602.18536] Triggering hallucinations in model-based MRI reconstruction via adversarial perturbations

This paper investigates how adversarial perturbations can induce hallucinations in generative models used for MRI reconstruction, highlighting potential risks in medical imaging.

arXiv - Machine Learning

[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts

This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision-making tasks.

arXiv - AI

Articles

Anthropic AI safety researcher quits with 'world in peril'

An Anthropic AI safety researcher has resigned, citing concerns over the potential dangers of AI technologies, emphasizing the urgent need for safety measures.

Reddit - Artificial Intelligence

[2602.15438] Logit Distance Bounds Representational Similarity

This paper explores the relationship between logit distance and representational similarity in discriminative models.

arXiv - AI

[2602.15852] Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints

This article discusses the development of clinical NLP models that mitigate risks associated with temporal leakage, emphasizing the importance of safety and calibration.

arXiv - AI

[2602.16444] RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation

RoboGene introduces a framework for automating the generation of diverse, physically plausible robotic manipulation tasks, addressing the challenges of data scarcity in robotics.

arXiv - AI

Show HN: 3LC – Illuminate the ML Black Box

3LC is an open-source tool designed to enhance the interpretability of machine learning models, addressing the 'black box' issue by providing insights into model decisions.

Hacker News - AI

[2410.03952] Pixel-Based Similarities as an Alternative to Neural Data for Improving Convolutional Neural Network Adversarial Robustness

This paper presents a novel approach to enhancing the adversarial robustness of Convolutional Neural Networks (CNNs) by utilizing pixel-based similarities instead of neural data.

arXiv - Machine Learning

The left is missing out on AI | Transformer News

The article discusses how the political left has largely overlooked the implications of artificial intelligence, despite its societal significance, with few exceptions.

Reddit - Artificial Intelligence

AI?

The Reddit discussion explores concerns about AI potentially replacing jobs in the future, prompting varied opinions on the impact of AI on employment.

Reddit - Artificial Intelligence

[2509.19852] Eliminating Stability Hallucinations in LLM-Based TTS Models via Attention Guidance

This paper addresses stability hallucinations in LLM-based TTS models by enhancing attention mechanisms and proposing a new alignment metric.

arXiv - AI

Beyond the bot: learning how to learn with AI

Professor Brandi Row Lazzarini's courses at Willamette University teach students to effectively and responsibly use AI, enhancing their self-awareness and critical thinking.

AI News - General
