AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much.

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2504.17311] FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation
LLMs

FLUKE introduces a novel framework for evaluating the robustness of NLP models through controlled linguistic variations, revealing task-d...

arXiv - AI · 4 min ·
[2508.01916] Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Machine Learning

This paper explores the decomposition of representation spaces in neural networks into interpretable subspaces using an unsupervised lear...

arXiv - Machine Learning · 4 min ·
[2507.09650] Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
LLMs

This paper presents the Community Alignment Dataset, which aims to address the challenge of aligning large language models (LLMs) with di...

arXiv - Machine Learning · 4 min ·
[2503.16021] Imitating AI agents increase diversity in homogeneous information environments but can reduce it in heterogeneous ones
LLMs

This article explores how AI agents imitating human content affect information diversity, revealing context-dependent outcomes in homogen...

arXiv - AI · 4 min ·
[2412.11471] TrapFlow: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning
Machine Learning

The paper presents TrapFlow, a novel defense mechanism against website fingerprinting attacks using dynamic backdoor learning to enhance ...

arXiv - AI · 4 min ·
[2503.07199] How Well Can Differential Privacy Be Audited in One Run?
Machine Learning

This article explores the efficacy of one-run auditing in differential privacy, highlighting its potential to improve the auditing proces...

arXiv - Machine Learning · 3 min ·
[2410.06816] Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
Machine Learning

This paper explores the limitations and potential of multi-neuron convex relaxations in neural network certification, revealing a univers...

arXiv - Machine Learning · 4 min ·
[2512.15783] AI Epidemiology: achieving explainable AI through expert oversight patterns
Machine Learning

The paper presents 'AI Epidemiology', a framework for enhancing explainability in AI systems through expert oversight, using population-l...

arXiv - Machine Learning · 4 min ·
[2511.11924] A Neuromorphic Architecture for Scalable Event-Based Control
AI Safety

This paper presents a neuromorphic architecture for scalable event-based control, leveraging the rebound Winner-Take-All motif to integra...

arXiv - AI · 3 min ·
[2511.02605] Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning
AI Infrastructure

This paper presents an adaptive shielding framework for reinforcement learning that utilizes GR(1) specifications to ensure safety and li...

arXiv - AI · 4 min ·
[2510.26752] The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
Machine Learning

The paper explores a framework for balancing AI agent autonomy and human oversight through a cooperative game model, ensuring safety with...

arXiv - Machine Learning · 4 min ·
[2510.25860] Through the Judge's Eyes: Inferred Thinking Traces Improve Reliability of LLM Raters
LLMs

This article discusses a framework that enhances the reliability of large language model (LLM) raters by inferring thinking traces from l...

arXiv - AI · 4 min ·
[2411.08875] Causal Explanations for Image Classifiers
AI Infrastructure

This paper presents a novel approach to generating causal explanations for image classifiers, introducing a black-box algorithm grounded ...

arXiv - AI · 3 min ·
[2403.08802] Governance of Generative Artificial Intelligence for Companies
LLMs

This article reviews governance frameworks for Generative AI, focusing on how companies can effectively manage the integration of large l...

arXiv - Machine Learning · 4 min ·
[2602.18307] VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean
LLMs

The paper introduces VeriSoftBench, a benchmark for formal verification in Lean, highlighting its limitations and performance insights fr...

arXiv - Machine Learning · 3 min ·
[2602.18097] Interacting safely with cyclists using Hamilton-Jacobi reachability and reinforcement learning
LLMs

This paper presents a framework for autonomous vehicles to safely interact with cyclists by integrating Hamilton-Jacobi reachability anal...

arXiv - Machine Learning · 3 min ·
[2602.18252] On the Adversarial Robustness of Discrete Image Tokenizers
Machine Learning

This paper investigates the adversarial robustness of discrete image tokenizers, highlighting their vulnerabilities and proposing a novel...

arXiv - AI · 3 min ·
[2602.18053] On the Generalization and Robustness in Conditional Value-at-Risk
NLP

This paper explores the generalization and robustness of Conditional Value-at-Risk (CVaR) in the context of heavy-tailed data, providing ...

arXiv - Machine Learning · 4 min ·
[2602.18047] CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras
Machine Learning

CityGuard introduces a novel framework for privacy-preserving identity retrieval across urban surveillance cameras, addressing challenges...

arXiv - Machine Learning · 4 min ·
[2602.17929] ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging
Machine Learning

ZACH-ViT introduces a novel Vision Transformer architecture tailored for medical imaging, enhancing performance by removing fixed spatial...

arXiv - Machine Learning · 4 min ·