AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns and privacy worries, and doubt that the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
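As background on the mechanism the post describes, here is a minimal sketch of the pairwise (Bradley-Terry) objective commonly used to fit RLHF reward models to rater choices. The toy feature vectors and the "raters favour feature 0" signal are illustrative assumptions, not data from the post or any real model.

```python
# Minimal sketch of the pairwise preference loss used to fit RLHF reward
# models to human ratings (Bradley-Terry form). Toy numpy example only.
import numpy as np

rng = np.random.default_rng(0)
chosen = rng.normal(size=(64, 8))     # features of rater-preferred responses
rejected = rng.normal(size=(64, 8))   # features of the responses passed over
chosen[:, 0] += 1.0                   # toy signal: raters tend to favour feature 0
w = np.zeros(8)                       # linear reward model r(x) = w @ x

def pairwise_loss_grad(w, chosen, rejected):
    """-log sigmoid(r(chosen) - r(rejected)), averaged over comparison pairs."""
    margin = chosen @ w - rejected @ w
    p = 1.0 / (1.0 + np.exp(-margin))                  # P(chosen preferred)
    loss = -np.log(p + 1e-12).mean()
    grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    return loss, grad

for step in range(200):                                # plain gradient descent
    loss, grad = pairwise_loss_grad(w, chosen, rejected)
    w -= 0.5 * grad

print(f"final preference loss: {loss:.3f}")
```

Whatever correlates with being chosen gets baked into the reward weights; nothing in this objective distinguishes a response that seemed helpful from one that was helpful, which is the post's point.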
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

AI Safety

[2602.18277] PRISM: Parallel Reward Integration with Symmetry for MORL

The paper presents PRISM, a novel algorithm for Multi-Objective Reinforcement Learning (MORL) that addresses the challenges of heterogene...

arXiv - Machine Learning · 4 min ·
AI Agents

[2602.17753] The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

The 2025 AI Agent Index presents a comprehensive overview of 30 deployed agentic AI systems, detailing their technical and safety feature...

arXiv - AI · 3 min ·
AI Safety

[2602.17729] Stop Saying "AI"

The paper argues for a more precise use of the term 'AI' in discussions, particularly in military contexts, to enhance clarity and unders...

arXiv - AI · 4 min ·
[2602.17720] "Everyone's using it, but no one is allowed to talk about it": College Students' Experiences Navigating the Higher Education Environment in a Generative AI World
Generative AI

[2602.17720] "Everyone's using it, but no one is allowed to talk about it": College Students' Experiences Navigating the Higher Education Environment in a Generative AI World

This article explores college students' experiences with generative AI in higher education, highlighting the pressures and social dynamic...

arXiv - AI · 4 min ·
Machine Learning

[2602.18182] Capabilities Ain't All You Need: Measuring Propensities in AI

The paper introduces a framework for measuring AI propensities, emphasizing the importance of behavioral tendencies alongside capabilitie...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18160] Unifying Formal Explanations: A Complexity-Theoretic Perspective

This paper presents a unified framework for understanding formal explanations in machine learning, focusing on the computational complexi...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18131] Learning Long-Range Dependencies with Temporal Predictive Coding

The paper presents a novel method combining Temporal Predictive Coding with Real-Time Recurrent Learning to effectively learn long-range ...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.17672] Assessing LLM Response Quality in the Context of Technology-Facilitated Abuse

This article evaluates the effectiveness of large language models (LLMs) in providing support for survivors of technology-facilitated abu...

arXiv - AI · 4 min ·
LLMs

[2602.18037] Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards

This paper presents a novel approach to prevent reward hacking in reinforcement learning by using gradient regularization, enhancing the ...

arXiv - Machine Learning · 4 min ·
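The summary names gradient regularization as the mechanism for limiting reward hacking. The sketch below is only a generic illustration of that idea, an input-gradient penalty on a toy reward model so small input changes cannot produce outsized reward jumps; it is not necessarily the paper's formulation, and the model and data are made up.

```python
# Generic gradient-regularization sketch: fit a toy reward model while
# penalizing the norm of d(reward)/d(input). Illustrative only.
import torch

torch.manual_seed(0)
x = torch.randn(256, 16)                       # toy "response features"
y = x[:, 0] + 0.1 * torch.randn(256)           # toy target reward
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(300):
    xb = x.clone().requires_grad_(True)
    pred = model(xb).squeeze(-1)
    fit = ((pred - y) ** 2).mean()
    # Penalty on the input gradient of the predicted reward.
    g, = torch.autograd.grad(pred.sum(), xb, create_graph=True)
    penalty = (g ** 2).sum(dim=1).mean()
    loss = fit + 0.1 * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"fit={fit.item():.3f}  grad_penalty={penalty.item():.3f}")
```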
LLMs

[2602.17671] AI Hallucination from Students' Perspective: A Thematic Analysis

This study analyzes university students' experiences with AI hallucinations, revealing detection strategies and misconceptions about thei...

arXiv - AI · 4 min ·
Machine Learning

[2602.07152] Trojans in Artificial Intelligence (TrojAI) Final Report

The Trojans in Artificial Intelligence (TrojAI) Final Report outlines the findings of a multi-year initiative aimed at addressing vulnera...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.18201] SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps

The paper explores the limitations of unsupervised learning methods, specifically Self-Organizing Maps (SOMs), in maintaining fairness by...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17910] Alignment in Time: Peak-Aware Orchestration for Long-Horizon Agentic Systems

This paper presents APEMO, a novel runtime scheduling layer designed to enhance the reliability of long-horizon agentic systems by optimi...

arXiv - AI · 3 min ·
Machine Learning

[2602.17975] Generating adversarial inputs for a graph neural network model of AC power flow

This paper presents a method for generating adversarial inputs for a graph neural network model used in AC power flow analysis, demonstra...

arXiv - Machine Learning · 3 min ·
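The summary describes gradient-based adversarial inputs for a power-flow GNN. The sketch below shows only the generic attack pattern, a single FGSM-style gradient-sign step against a toy feed-forward surrogate; it is not the paper's model or method.

```python
# Generic FGSM-style sketch: nudge the input in the direction that most
# increases the model's error. Toy surrogate model, illustrative only.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 1))
x = torch.randn(1, 8, requires_grad=True)   # toy operating point
target = torch.zeros(1, 1)                  # toy "correct" output

loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
x_adv = x + 0.05 * x.grad.sign()            # epsilon-bounded perturbation

print("clean error:", loss.item())
print("adversarial error:",
      torch.nn.functional.mse_loss(model(x_adv), target).item())
```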
LLMs

[2602.17676] Epistemic Traps: Rational Misalignment Driven by Model Misspecification

This paper explores how model misspecification leads to rational misalignments in AI behavior, presenting a new framework for understandi...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17948] A Geometric Probe of the Accuracy-Robustness Trade-off: Sharp Boundaries in Symmetry-Breaking Dimensional Expansion

This paper explores the accuracy-robustness trade-off in deep learning through a geometric lens, utilizing Symmetry-Breaking Dimensional ...

arXiv - Machine Learning · 4 min ·
NLP

[2602.17947] Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition

This article explores the generalization of bilevel programming in hyperparameter optimization, focusing on bias-variance decomposition t...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.17918] Distribution-Free Sequential Prediction with Abstentions

This paper explores a distribution-free approach to sequential prediction with abstentions, proposing an algorithm called AbstainBoost th...

arXiv - Machine Learning · 4 min ·
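The teaser mentions an algorithm called AbstainBoost but gives no details, so the sketch below is only a plain confidence-threshold baseline for sequential prediction with an abstain option, not the paper's method; the data stream and threshold are illustrative.

```python
# Baseline sketch of sequential prediction with abstentions: an online
# logistic learner predicts only when confident, otherwise abstains.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(5)
threshold = 0.75          # abstain unless the predicted probability is decisive
mistakes = abstentions = 0

for t in range(2000):
    x = rng.normal(size=5)
    y = 1 if x[0] + 0.3 * rng.normal() > 0 else 0   # noisy true label
    p = 1.0 / (1.0 + np.exp(-(w @ x)))              # predicted P(y = 1)
    if max(p, 1 - p) < threshold:
        abstentions += 1                            # not confident: abstain
    else:
        mistakes += int((p >= 0.5) != y)
    w += 0.1 * (y - p) * x                          # online logistic update

print(f"mistakes={mistakes}  abstentions={abstentions}")
```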
Machine Learning

[2602.17861] JAX-Privacy: A library for differentially private machine learning

JAX-Privacy is a new library aimed at simplifying the implementation of differentially private machine learning, offering both customizat...

arXiv - Machine Learning · 3 min ·
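JAX-Privacy is a real library with its own API, which is not reproduced here. The sketch below shows only the clip-and-noise core of DP-SGD that such libraries package up, written in plain numpy on a toy linear model; all names and hyperparameters are illustrative.

```python
# DP-SGD-style sketch: per-example gradient clipping plus Gaussian noise.
# Plain numpy on a toy linear regression; NOT JAX-Privacy's API.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(512, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=512)
w = np.zeros(10)
clip_norm, noise_mult, lr, batch = 1.0, 1.1, 0.05, 64

for step in range(200):
    idx = rng.choice(len(X), batch, replace=False)
    # Per-example gradients of squared error for the linear model.
    residual = X[idx] @ w - y[idx]
    grads = 2.0 * residual[:, None] * X[idx]               # (batch, 10)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)     # clip each example
    noisy = grads.sum(0) + rng.normal(scale=noise_mult * clip_norm, size=10)
    w -= lr * noisy / batch                                # noisy mean step

print("weight estimate:", np.round(w, 2))
```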
Machine Learning

[2602.17853] Neural Prior Estimation: Learning Class Priors from Latent Representations

The paper introduces Neural Prior Estimator (NPE), a framework for learning class priors from latent representations, addressing class im...

arXiv - Machine Learning · 3 min ·