AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much

Submitted by /u/esporx

Reddit - Artificial Intelligence · 1 min · 1 day ago

Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min · 1 day ago

Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min · 1 day ago

All Content

LLMs

[2504.17311] FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation

FLUKE introduces a novel framework for evaluating the robustness of NLP models through controlled linguistic variations, revealing task-d...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2508.01916] Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning

This paper explores the decomposition of representation spaces in neural networks into interpretable subspaces using an unsupervised lear...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2507.09650] Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset

This paper presents the Community Alignment Dataset, which aims to address the challenge of aligning large language models (LLMs) with di...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2503.16021] Imitating AI agents increase diversity in homogeneous information environments but can reduce it in heterogeneous ones

This article explores how AI agents imitating human content affect information diversity, revealing context-dependent outcomes in homogen...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2412.11471] TrapFlow: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning

The paper presents TrapFlow, a novel defense mechanism against website fingerprinting attacks using dynamic backdoor learning to enhance ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2503.07199] How Well Can Differential Privacy Be Audited in One Run?

This article explores the efficacy of one-run auditing in differential privacy, highlighting its potential to improve the auditing proces...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2410.06816] Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification

This paper explores the limitations and potential of multi-neuron convex relaxations in neural network certification, revealing a univers...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2512.15783] AI Epidemiology: achieving explainable AI through expert oversight patterns

The paper presents 'AI Epidemiology', a framework for enhancing explainability in AI systems through expert oversight, using population-l...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Safety

[2511.11924] A Neuromorphic Architecture for Scalable Event-Based Control

This paper presents a neuromorphic architecture for scalable event-based control, leveraging the rebound Winner-Take-All motif to integra...

arXiv - AI · 3 min · about 1 month ago

AI Infrastructure

[2511.02605] Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning

This paper presents an adaptive shielding framework for reinforcement learning that utilizes GR(1) specifications to ensure safety and li...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2510.26752] The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy

The paper explores a framework for balancing AI agent autonomy and human oversight through a cooperative game model, ensuring safety with...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2510.25860] Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters

This article discusses a framework that enhances the reliability of large language model (LLM) raters by inferring thinking traces from l...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2411.08875] Causal Explanations for Image Classifiers

This paper presents a novel approach to generating causal explanations for image classifiers, introducing a black-box algorithm grounded ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2403.08802] Governance of Generative Artificial Intelligence for Companies

This article reviews governance frameworks for Generative AI, focusing on how companies can effectively manage the integration of large l...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.18307] VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

The paper introduces VeriSoftBench, a benchmark for formal verification in Lean, highlighting its limitations and performance insights fr...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18097] Interacting safely with cyclists using Hamilton-Jacobi reachability and reinforcement learning

This paper presents a framework for autonomous vehicles to safely interact with cyclists by integrating Hamilton-Jacobi reachability anal...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.18252] On the Adversarial Robustness of Discrete Image Tokenizers

This paper investigates the adversarial robustness of discrete image tokenizers, highlighting their vulnerabilities and proposing a novel...

arXiv - AI · 3 min · about 1 month ago

NLP

[2602.18053] On the Generalization and Robustness in Conditional Value-at-Risk

This paper explores the generalization and robustness of Conditional Value-at-Risk (CVaR) in the context of heavy-tailed data, providing ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18047] CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras

CityGuard introduces a novel framework for privacy-preserving identity retrieval across urban surveillance cameras, addressing challenges...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.17929] ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging

ZACH-ViT introduces a novel Vision Transformer architecture tailored for medical imaging, enhancing performance by removing fixed spatial...

arXiv - Machine Learning · 4 min · about 1 month ago
