AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
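The post above is describing the preference-ranking step at the core of RLHF reward modeling. The sketch below is a minimal, illustrative version of that step (a Bradley-Terry style pairwise loss), not code from the post; the class names, dimensions, and random embeddings are all placeholders.

```python
# Minimal sketch of the pairwise preference objective behind RLHF reward models.
# All names and shapes are illustrative, not taken from the post above.
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a pooled response embedding to a scalar reward."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the rater-preferred response to
    # score higher than the rejected one. The model only sees which answer
    # the rater liked, not whether it was actually correct, which is the
    # gap the post is pointing at.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random tensors standing in for encoded responses.
head = RewardHead()
chosen = head(torch.randn(4, 768))    # embeddings of rater-preferred answers
rejected = head(torch.randn(4, 768))  # embeddings of dispreferred answers
loss = preference_loss(chosen, rejected)
loss.backward()
```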
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2509.18949] Towards Privacy-Aware Bayesian Networks: A Credal Approach
Machine Learning

This paper presents a novel approach to privacy-aware Bayesian networks using credal networks, addressing the trade-off between privacy a...

arXiv - AI · 4 min ·
[2512.17898] Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally
AI Agents

This study explores how humanlike AI design influences user engagement and trust across different cultures, revealing that anthropomorphi...

arXiv - AI · 4 min ·
[2510.27623] BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
LLMs

The paper presents BEAT, a novel framework for executing visual backdoor attacks on Vision-Language Model (VLM)-based embodied agents, hi...

arXiv - AI · 4 min ·
[2509.03738] Mechanistic Interpretability with Sparse Autoencoder Neural Operators
Machine Learning

This article introduces Sparse Autoencoder Neural Operators (SAE-NOs), a novel approach in machine learning that enhances interpretabilit...

arXiv - AI · 4 min ·
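For context on the entry above: mechanistic-interpretability work of this kind typically starts from a plain sparse autoencoder trained on model activations. The sketch below shows only that standard baseline with an L1 sparsity penalty; it does not reproduce the neural-operator variant the paper proposes, and the dimensions are placeholders.

```python
# Minimal sketch of a standard sparse autoencoder used for interpretability.
# This is the common baseline, not the paper's neural-operator method.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, activations: torch.Tensor):
        # Non-negative features so each dictionary unit can be read as a
        # candidate human-interpretable direction in activation space.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction term keeps the dictionary faithful to the activations;
    # the L1 term pushes most features to zero for any given input.
    mse = (reconstruction - x).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

sae = SparseAutoencoder()
acts = torch.randn(8, 512)   # stand-in for residual-stream activations
recon, feats = sae(acts)
loss = sae_loss(acts, recon, feats)
loss.backward()
```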
[2510.16658] Foundation and Large-Scale AI Models in Neuroscience: A Comprehensive Review
Machine Learning

This comprehensive review explores the impact of large-scale AI models on neuroscience, detailing their applications in neuroimaging, bra...

arXiv - AI · 4 min ·
[2508.08540] Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems
Machine Learning

This article presents a novel approach to local Stochastic Gradient Descent (SGD) for deep learning on heterogeneous systems, demonstrati...

arXiv - Machine Learning · 3 min ·
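As background for the entry above: local SGD lets each worker take several gradient steps before parameters are averaged, which cuts communication on heterogeneous hardware. The serial simulation below illustrates only that generic scheme with arbitrary per-worker step counts and averaging weights; the specific bias scheme the paper proposes is not reproduced, and the toy quadratic objectives are invented for illustration.

```python
# Serial toy simulation of local SGD with periodic parameter averaging.
# Per-worker step counts and weights are illustrative, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers = 10, 4
targets = [rng.normal(size=dim) for _ in range(n_workers)]  # toy per-worker data

def grad(w, target):
    # Gradient of the toy local objective 0.5 * ||w - target||^2.
    return w - target

w_global = np.zeros(dim)
local_steps = [1, 2, 4, 8]              # heterogeneous: faster workers do more local steps
weights = np.array(local_steps, float)
weights /= weights.sum()                # illustrative non-uniform averaging weights

for _ in range(50):
    local_models = []
    for k in range(n_workers):
        w = w_global.copy()
        for _ in range(local_steps[k]):
            w -= 0.1 * grad(w, targets[k])
        local_models.append(w)
    # Communication round: aggregate local models into the global model.
    w_global = sum(wt * m for wt, m in zip(weights, local_models))

obj = np.mean([0.5 * np.sum((w_global - t) ** 2) for t in targets])
print(f"average local objective after training: {obj:.4f}")
```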
[2510.15862] Rethinking the Design of Reinforcement Learning-Based Deep Research Agents
LLMs

This paper explores the design of reinforcement learning-based deep research agents, emphasizing key design choices that enhance performa...

arXiv - AI · 4 min ·
[2507.04446] Sampling-aware Adversarial Attacks Against Large Language Models
LLMs

This article presents a novel approach to adversarial attacks on large language models (LLMs) by incorporating sampling strategies, signi...

arXiv - Machine Learning · 4 min ·
[2507.01781] Symbolic Branch Networks: Tree-Inherited Neural Models for Interpretable Multiclass Classification
Machine Learning

This article presents Symbolic Branch Networks (SBNs), a novel neural model that integrates decision tree structures for enhanced interpr...

arXiv - AI · 4 min ·
[2506.22740] Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Machine Learning

The paper presents a decision-theoretic framework for evaluating explanations in AI, emphasizing their role as information signals that i...

arXiv - AI · 3 min ·
[2505.19193] SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data
NLP

The paper presents SuperMAN, a framework designed for learning from temporally sparse and heterogeneous data, enhancing interpretability ...

arXiv - Machine Learning · 4 min ·
[2505.11111] FairSHAP: Preprocessing for Fairness Through Attribution-Based Data Augmentation
Machine Learning

FairSHAP introduces a novel preprocessing framework that utilizes Shapley value attribution to enhance fairness in machine learning model...

arXiv - AI · 4 min ·
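For the entry above: the attribution primitive FairSHAP relies on is the Shapley value, which averages a feature's marginal contribution over all coalitions of the other features. The toy sketch below computes exact Shapley values for a tiny invented scoring function with a zero baseline; the fairness-driven data-augmentation step from the paper is not reproduced.

```python
# Toy exact Shapley attribution: average each feature's marginal
# contribution over all coalitions, with absent features set to a baseline.
# The scoring function and baseline are purely illustrative.
from itertools import combinations
from math import factorial
import numpy as np

def model(x):
    # Hypothetical scoring function over 3 features.
    return 2.0 * x[0] + 0.5 * x[1] * x[2]

def shapley_values(f, x, baseline):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = baseline.copy()
                without_i = baseline.copy()
                for j in coalition:
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
print(shapley_values(model, x, baseline))  # attributions sum to f(x) - f(baseline)
```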
[2404.09877] Synergising Human-like Responses and Machine Intelligence for Planning in Disaster Response
Machine Learning

This paper presents a novel cognitive architecture that combines human-like responses with machine intelligence for effective disaster re...

arXiv - AI · 4 min ·
[2505.03646] GRILL: Restoring Gradient Signal in Ill-Conditioned Layers for More Effective Adversarial Attacks on Autoencoders
Machine Learning

The paper presents GRILL, a method to enhance adversarial attacks on autoencoders by restoring gradient signals in ill-conditioned layers...

arXiv - AI · 4 min ·
[2402.08646] Inference of Abstraction for a Unified Account of Symbolic Reasoning from Data
Machine Learning

This paper presents a unified probabilistic framework for symbolic reasoning, drawing inspiration from neuroscience, and aims to enhance ...

arXiv - AI · 3 min ·
[2206.13174] Towards Unifying Perceptual Reasoning and Logical Reasoning
Machine Learning

The paper presents a probabilistic model that unifies perceptual reasoning and logical reasoning, highlighting their shared processes of ...

arXiv - AI · 3 min ·
[2504.02996] Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Machine Learning

This article presents a novel approach to Noise-Aware Generalization (NAG) in machine learning, addressing the challenges posed by label ...

arXiv - Machine Learning · 4 min ·
[2602.20134] Modeling Epidemiological Dynamics Under Adversarial Data and User Deception
Machine Learning

This paper presents a game-theoretic model to analyze how adversarial data and user deception affect epidemiological dynamics, particular...

arXiv - AI · 4 min ·
[2602.20114] Benchmarking Unlearning for Vision Transformers
Machine Learning

This article presents a benchmarking study on unlearning algorithms for Vision Transformers (VTs), highlighting their performance compare...

arXiv - AI · 4 min ·
[2602.20089] StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
LLMs

The paper presents StructXLIP, a novel approach that enhances vision-language models by integrating multimodal structural cues, improving...

arXiv - AI · 4 min ·