AI Safety & Ethics Guide
A comprehensive guide to the best AI safety and ethics resources, organized by type. Curated by AI News.
Research Papers
[2510.26722] Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
This paper explores the challenges of heterogeneous federated learning in wireless networks, focusing on the bias-variance trade-off in non-convex scenarios. It presents a novel...
[2602.12426] Interference-Robust Non-Coherent Over-the-Air Computation for Decentralized Optimization
This paper presents an interference-robust non-coherent over-the-air computation (IR-NCOTA) method for decentralized optimization, enhancing consensus estimation in wireless net...
[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment
This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-assessment after realignment training.
[2602.15568] Scenario Approach with Post-Design Certification of User-Specified Properties
This paper introduces a scenario approach for post-design certification of user-specified properties, enhancing reliability without additional test datasets.
[2602.20021] Agents of Chaos
The paper 'Agents of Chaos' presents findings from a red-teaming study on autonomous language-model-powered agents, highlighting security vulnerabilities and ethical concerns in...
[2602.16942] SourceBench: Can AI Answers Reference Quality Web Sources?
The paper introduces SourceBench, a benchmark designed to evaluate the quality of web sources cited by AI models across various query types, revealing insights for future AI and...
[2602.18029] Towards More Standardized AI Evaluation: From Models to Agents
This paper discusses the evolution of AI evaluation from static models to dynamic agents, emphasizing the need for standardized evaluation practices that foster trust and govern...
[2509.18949] Towards Privacy-Aware Bayesian Networks: A Credal Approach
This paper presents a novel approach to privacy-aware Bayesian networks using credal networks, addressing the trade-off between privacy and model utility in probabilistic graphi...
[2602.18536] Triggering hallucinations in model-based MRI reconstruction via adversarial perturbations
This paper investigates how adversarial perturbations can induce hallucinations in generative models used for MRI reconstruction, highlighting potential risks in medical imaging.
[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision-making tasks, revealing significant imp...
Articles
Anthropic AI safety researcher quits, warning of a 'world in peril'
An Anthropic AI safety researcher has resigned, citing concerns over the potential dangers of AI technologies, emphasizing the urgent need for safety measures.
[2602.15438] Logit Distance Bounds Representational Similarity
This paper explores the relationship between logit distance and representational similarity in discriminative models, demonstrating that closeness in logit distance ensures line...
[2602.15852] Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
This article discusses the development of clinical NLP models that mitigate risks associated with temporal leakage, emphasizing the importance of safety and calibration in predi...
[2602.16444] RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation
RoboGene introduces a framework for automating the generation of diverse, physically plausible robotic manipulation tasks, addressing the challenges of data scarcity in robotics.
Show HN: 3LC – Illuminate the ML Black Box
3LC is an open-source tool designed to enhance the interpretability of machine learning models, addressing the 'black box' issue by providing insights into model decisions.
[2410.03952] Pixel-Based Similarities as an Alternative to Neural Data for Improving Convolutional Neural Network Adversarial Robustness
This paper presents a novel approach to enhancing the adversarial robustness of Convolutional Neural Networks (CNNs) by utilizing pixel-based similarities instead of neural data...
The left is missing out on AI | Transformer News
The article discusses how the political left has largely overlooked the implications of artificial intelligence, despite its societal significance, with few exceptions.
AI?
The Reddit discussion explores concerns about AI potentially replacing jobs in the future, prompting varied opinions on the impact of AI on employment.
[2509.19852] Eliminating Stability Hallucinations in LLM-Based TTS Models via Attention Guidance
This paper addresses stability hallucinations in LLM-based TTS models by enhancing attention mechanisms, proposing a new alignment metric, and demonstrating effective results in...
Beyond the bot: Learning how to learn with AI
Professor Brandi Row Lazzarini's courses at Willamette University teach students to effectively and responsibly use AI, enhancing their self-awareness and critical thinking thro...