AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns and privacy worries, and doubt that the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
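As background on the mechanism the post describes, here is a minimal sketch of the pairwise (Bradley-Terry) objective commonly used to fit RLHF reward models to rater choices. The toy feature vectors and the "raters favour feature 0" signal are illustrative assumptions, not data from the post or any real model.

```python
# Minimal sketch of the pairwise preference loss used to fit RLHF reward
# models to human ratings (Bradley-Terry form). Toy numpy example only.
import numpy as np

rng = np.random.default_rng(0)
chosen = rng.normal(size=(64, 8))     # features of rater-preferred responses
rejected = rng.normal(size=(64, 8))   # features of the responses passed over
chosen[:, 0] += 1.0                   # toy signal: raters tend to favour feature 0
w = np.zeros(8)                       # linear reward model r(x) = w @ x

def pairwise_loss_grad(w, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)), averaged over comparison pairs."""
    margin = chosen @ w - rejected @ w
    p = 1.0 / (1.0 + np.exp(-margin))                  # P(chosen preferred)
    loss = -np.log(p + 1e-12).mean()
    grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    return loss, grad

for step in range(200):                                # plain gradient descent
    loss, grad = pairwise_loss_grad(w, chosen, rejected)
    w -= 0.5 * grad

print(f"final preference loss: {loss:.3f}")
```

Whatever correlates with being chosen gets baked into the reward weights; nothing in this objective distinguishes a response that seemed helpful from one that was helpful, which is the post's point.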
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

AI Safety

[2602.18277] PRISM: Parallel Reward Integration with Symmetry for MORL

The paper presents PRISM, a novel algorithm for Multi-Objective Reinforcement Learning (MORL) that addresses the challenges of heterogene...

arXiv - Machine Learning · 4 min ·
AI Agents

[2602.17753] The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

The 2025 AI Agent Index presents a comprehensive overview of 30 deployed agentic AI systems, detailing their technical and safety feature...

arXiv - AI · 3 min ·
AI Safety

[2602.17729] Stop Saying "AI"

The paper argues for a more precise use of the term 'AI' in discussions, particularly in military contexts, to enhance clarity and unders...

arXiv - AI · 4 min ·
[2602.17720] "Everyone's using it, but no one is allowed to talk about it": College Students' Experiences Navigating the Higher Education Environment in a Generative AI World
Generative AI

[2602.17720] "Everyone's using it, but no one is allowed to talk about it": College Students' Experiences Navigating the Higher Education Environment in a Generative AI World

This article explores college students' experiences with generative AI in higher education, highlighting the pressures and social dynamic...

arXiv - AI · 4 min ·
Machine Learning

[2602.18182] Capabilities Ain't All You Need: Measuring Propensities in AI

The paper introduces a framework for measuring AI propensities, emphasizing the importance of behavioral tendencies alongside capabilitie...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18160] Unifying Formal Explanations: A Complexity-Theoretic Perspective

This paper presents a unified framework for understanding formal explanations in machine learning, focusing on the computational complexi...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18131] Learning Long-Range Dependencies with Temporal Predictive Coding

The paper presents a novel method combining Temporal Predictive Coding with Real-Time Recurrent Learning to effectively learn long-range ...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.17672] Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse

This article evaluates the effectiveness of large language models (LLMs) in providing support for survivors of technology-facilitated abu...

arXiv - AI · 4 min ·
LLMs

[2602.18037] Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards

This paper presents a novel approach to prevent reward hacking in reinforcement learning by using gradient regularization, enhancing the ...

arXiv - Machine Learning · 4 min ·
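The summary names gradient regularization as the mechanism for limiting reward hacking. The sketch below is only a generic illustration of that idea, an input-gradient penalty on a toy reward model so small input changes cannot produce outsized reward jumps; it is not necessarily the paper's formulation, and the model and data are made up.

```python
# Generic gradient-regularization sketch: fit a toy reward model while
# penalizing the norm of d(reward)/d(input). Illustrative only.
import torch

torch.manual_seed(0)
x = torch.randn(256, 16)                       # toy "response features"
y = x[:, 0] + 0.1 * torch.randn(256)           # toy target reward
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(300):
    xb = x.clone().requires_grad_(True)
    pred = model(xb).squeeze(-1)
    fit = ((pred - y) ** 2).mean()
    # Penalty on the input gradient of the predicted reward.
    g, = torch.autograd.grad(pred.sum(), xb, create_graph=True)
    penalty = (g ** 2).sum(dim=1).mean()
    loss = fit + 0.1 * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"fit={fit.item():.3f}  grad_penalty={penalty.item():.3f}")
```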
LLMs

[2602.17671] AI Hallucination from Students' Perspective: A Thematic Analysis

This study analyzes university students' experiences with AI hallucinations, revealing detection strategies and misconceptions about thei...

arXiv - AI · 4 min ·
Machine Learning

[2602.07152] Trojans in Artificial Intelligence (TrojAI) Final Report

The Trojans in Artificial Intelligence (TrojAI) Final Report outlines the findings of a multi-year initiative aimed at addressing vulnera...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18201] SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps

The paper explores the limitations of unsupervised learning methods, specifically Self-Organizing Maps (SOMs), in maintaining fairness by...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17910] Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems

This paper presents APEMO, a novel runtime scheduling layer designed to enhance the reliability of long-horizon agentic systems by optimi...

arXiv - AI · 3 min ·
Machine Learning

[2602.17975] Generating adversarial inputs for a graph neural network model of AC power flow

This paper presents a method for generating adversarial inputs for a graph neural network model used in AC power flow analysis, demonstra...

arXiv - Machine Learning · 3 min ·
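The summary describes gradient-based adversarial inputs for a power-flow GNN. The sketch below shows only the generic attack pattern, a single FGSM-style gradient-sign step against a toy feed-forward surrogate; it is not the paper's model or method.

```python
# Generic FGSM-style sketch: nudge the input in the direction that most
# increases the model's error. Toy surrogate model, illustrative only.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
x = torch.randn(1, 8, requires_grad=True)   # toy operating point
target = torch.zeros(1, 1)                  # toy "correct" output

loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
x_adv = x + 0.05 * x.grad.sign()            # epsilon-bounded perturbation

print("clean error:", loss.item())
print("adversarial error:",
      torch.nn.functional.mse_loss(model(x_adv), target).item())
```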
LLMs

[2602.17676] Epistemic Traps: Rational Misalignment Driven by Model Misspecification

This paper explores how model misspecification leads to rational misalignments in AI behavior, presenting a new framework for understandi...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17948] A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion

This paper explores the accuracy-robustness trade-off in deep learning through a geometric lens, utilizing Symmetry-Breaking Dimensional ...

arXiv - Machine Learning · 4 min ·
NLP

[2602.17947] Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition

This article explores the generalization of bilevel programming in hyperparameter optimization, focusing on bias-variance decomposition t...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17918] Distribution-Free Sequential Prediction with Abstentions

This paper explores a distribution-free approach to sequential prediction with abstentions, proposing an algorithm called AbstainBoost th...

arXiv - Machine Learning · 4 min ·
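The teaser mentions an algorithm called AbstainBoost but gives no details, so the sketch below is only a plain confidence-threshold baseline for sequential prediction with an abstain option, not the paper's method; the data stream and threshold are illustrative.

```python
# Baseline sketch of sequential prediction with abstentions: an online
# logistic learner predicts only when confident, otherwise abstains.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(5)
threshold = 0.75          # abstain unless the predicted probability is decisive
mistakes = abstentions = 0

for t in range(2000):
    x = rng.normal(size=5)
    y = 1 if x[0] + 0.3 * rng.normal() > 0 else 0   # noisy true label
    p = 1.0 / (1.0 + np.exp(-(w @ x)))              # predicted P(y = 1)
    if max(p, 1 - p) < threshold:
        abstentions += 1                            # not confident: abstain
    else:
        mistakes += int((p >= 0.5) != y)
    w += 0.1 * (y - p) * x                          # online logistic update

print(f"mistakes={mistakes}  abstentions={abstentions}")
```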
Machine Learning

[2602.17861] JAX-Privacy: A library for differentially private machine learning

JAX-Privacy is a new library aimed at simplifying the implementation of differentially private machine learning, offering both customizat...

arXiv - Machine Learning · 3 min ·
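JAX-Privacy is a real library with its own API, which is not reproduced here. The sketch below shows only the clip-and-noise core of DP-SGD that such libraries package up, written in plain numpy on a toy linear model; all names and hyperparameters are illustrative.

```python
# DP-SGD-style sketch: per-example gradient clipping plus Gaussian noise.
# Plain numpy on a toy linear regression; NOT JAX-Privacy's API.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(512, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=512)
w = np.zeros(10)
clip_norm, noise_mult, lr, batch = 1.0, 1.1, 0.05, 64

for step in range(200):
    idx = rng.choice(len(X), batch, replace=False)
    # Per-example gradients of squared error for the linear model.
    residual = X[idx] @ w - y[idx]
    grads = 2.0 * residual[:, None] * X[idx]               # (batch, 10)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)     # clip each example
    noisy = grads.sum(0) + rng.normal(scale=noise_mult * clip_norm, size=10)
    w -= lr * noisy / batch                                # noisy mean step

print("weight estimate:", np.round(w, 2))
```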
Machine Learning

[2602.17853] Neural Prior Estimation: Learning Class Priors from Latent Representations

The paper introduces Neural Prior Estimator (NPE), a framework for learning class priors from latent representations, addressing class im...

arXiv - Machine Learning · 3 min ·