AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
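The post above is describing the preference-ranking step at the core of RLHF reward modeling. The sketch below is a minimal, illustrative version of that step (a Bradley-Terry style pairwise loss), not code from the post; the class names, dimensions, and random embeddings are all placeholders.

```python
# Minimal sketch of the pairwise preference objective behind RLHF reward models.
# All names and shapes are illustrative, not taken from the post above.
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a pooled response embedding to a scalar reward."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the rater-preferred response to
    # score higher than the rejected one. The model only sees which answer
    # the rater liked, not whether it was actually correct, which is the
    # gap the post is pointing at.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random tensors standing in for encoded responses.
head = RewardHead()
chosen = head(torch.randn(4, 768))    # embeddings of rater-preferred answers
rejected = head(torch.randn(4, 768))  # embeddings of dispreferred answers
loss = preference_loss(chosen, rejected)
loss.backward()
```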
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2509.18949] Towards Privacy-Aware Bayesian Networks: A Credal Approach
Machine Learning

This paper presents a novel approach to privacy-aware Bayesian networks using credal networks, addressing the trade-off between privacy a...

arXiv - AI · 4 min ·
[2512.17898] Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally
AI Agents

This study explores how humanlike AI design influences user engagement and trust across different cultures, revealing that anthropomorphi...

arXiv - AI · 4 min ·
[2510.27623] BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
LLMs

The paper presents BEAT, a novel framework for executing visual backdoor attacks on Vision-Language Model (VLM)-based embodied agents, hi...

arXiv - AI · 4 min ·
[2509.03738] Mechanistic Interpretability with Sparse Autoencoder Neural Operators
Machine Learning

This article introduces Sparse Autoencoder Neural Operators (SAE-NOs), a novel approach in machine learning that enhances interpretabilit...

arXiv - AI · 4 min ·
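For context on the entry above: mechanistic-interpretability work of this kind typically starts from a plain sparse autoencoder trained on model activations. The sketch below shows only that standard baseline with an L1 sparsity penalty; it does not reproduce the neural-operator variant the paper proposes, and the dimensions are placeholders.

```python
# Minimal sketch of a standard sparse autoencoder used for interpretability.
# This is the common baseline, not the paper's neural-operator method.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 512, d_dict: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, activations: torch.Tensor):
        # Non-negative features so each dictionary unit can be read as a
        # candidate human-interpretable direction in activation space.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction term keeps the dictionary faithful to the activations;
    # the L1 term pushes most features to zero for any given input.
    mse = (reconstruction - x).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

sae = SparseAutoencoder()
acts = torch.randn(8, 512)   # stand-in for residual-stream activations
recon, feats = sae(acts)
loss = sae_loss(acts, recon, feats)
loss.backward()
```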
[2510.16658] Foundation and Large-Scale AI Models in Neuroscience: A Comprehensive Review
Machine Learning

This comprehensive review explores the impact of large-scale AI models on neuroscience, detailing their applications in neuroimaging, bra...

arXiv - AI · 4 min ·
[2508.08540] Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems
Machine Learning

This article presents a novel approach to local Stochastic Gradient Descent (SGD) for deep learning on heterogeneous systems, demonstrati...

arXiv - Machine Learning · 3 min ·
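As background for the entry above: local SGD lets each worker take several gradient steps before parameters are averaged, which cuts communication on heterogeneous hardware. The serial simulation below illustrates only that generic scheme with arbitrary per-worker step counts and averaging weights; the specific bias scheme the paper proposes is not reproduced, and the toy quadratic objectives are invented for illustration.

```python
# Serial toy simulation of local SGD with periodic parameter averaging.
# Per-worker step counts and weights are illustrative, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
dim, n_workers = 10, 4
targets = [rng.normal(size=dim) for _ in range(n_workers)]  # toy per-worker data

def grad(w, target):
    # Gradient of the toy local objective 0.5 * ||w - target||^2.
    return w - target

w_global = np.zeros(dim)
local_steps = [1, 2, 4, 8]              # heterogeneous: faster workers do more local steps
weights = np.array(local_steps, float)
weights /= weights.sum()                # illustrative non-uniform averaging weights

for _ in range(50):
    local_models = []
    for k in range(n_workers):
        w = w_global.copy()
        for _ in range(local_steps[k]):
            w -= 0.1 * grad(w, targets[k])
        local_models.append(w)
    # Communication round: aggregate local models into the global model.
    w_global = sum(wt * m for wt, m in zip(weights, local_models))

obj = np.mean([0.5 * np.sum((w_global - t) ** 2) for t in targets])
print(f"average local objective after training: {obj:.4f}")
```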
[2510.15862] Rethinking the Design of Reinforcement Learning-Based Deep Research Agents
LLMs

This paper explores the design of reinforcement learning-based deep research agents, emphasizing key design choices that enhance performa...

arXiv - AI · 4 min ·
[2507.04446] Sampling-aware Adversarial Attacks Against Large Language Models
LLMs

This article presents a novel approach to adversarial attacks on large language models (LLMs) by incorporating sampling strategies, signi...

arXiv - Machine Learning · 4 min ·
[2507.01781] Symbolic Branch Networks: Tree-Inherited Neural Models for Interpretable Multiclass Classification
Machine Learning

This article presents Symbolic Branch Networks (SBNs), a novel neural model that integrates decision tree structures for enhanced interpr...

arXiv - AI · 4 min ·
[2506.22740] Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Machine Learning

The paper presents a decision-theoretic framework for evaluating explanations in AI, emphasizing their role as information signals that i...

arXiv - AI · 3 min ·
[2505.19193] SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data
NLP

The paper presents SuperMAN, a framework designed for learning from temporally sparse and heterogeneous data, enhancing interpretability ...

arXiv - Machine Learning · 4 min ·
[2505.11111] FairSHAP: Preprocessing for Fairness Through Attribution-Based Data Augmentation
Machine Learning

FairSHAP introduces a novel preprocessing framework that utilizes Shapley value attribution to enhance fairness in machine learning model...

arXiv - AI · 4 min ·
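For the entry above: the attribution primitive FairSHAP relies on is the Shapley value, which averages a feature's marginal contribution over all coalitions of the other features. The toy sketch below computes exact Shapley values for a tiny invented scoring function with a zero baseline; the fairness-driven data-augmentation step from the paper is not reproduced.

```python
# Toy exact Shapley attribution: average each feature's marginal
# contribution over all coalitions, with absent features set to a baseline.
# The scoring function and baseline are purely illustrative.
from itertools import combinations
from math import factorial
import numpy as np

def model(x):
    # Hypothetical scoring function over 3 features.
    return 2.0 * x[0] + 0.5 * x[1] * x[2]

def shapley_values(f, x, baseline):
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = baseline.copy()
                without_i = baseline.copy()
                for j in coalition:
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
print(shapley_values(model, x, baseline))  # attributions sum to f(x) - f(baseline)
```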
[2404.09877] Synergising Human-like Responses and Machine Intelligence for Planning in Disaster Response
Machine Learning

This paper presents a novel cognitive architecture that combines human-like responses with machine intelligence for effective disaster re...

arXiv - AI · 4 min ·
[2505.03646] GRILL: Restoring Gradient Signal in Ill-Conditioned Layers for More Effective Adversarial Attacks on Autoencoders
Machine Learning

The paper presents GRILL, a method to enhance adversarial attacks on autoencoders by restoring gradient signals in ill-conditioned layers...

arXiv - AI · 4 min ·
[2402.08646] Inference of Abstraction for a Unified Account of Symbolic Reasoning from Data
Machine Learning

This paper presents a unified probabilistic framework for symbolic reasoning, drawing inspiration from neuroscience, and aims to enhance ...

arXiv - AI · 3 min ·
[2206.13174] Towards Unifying Perceptual Reasoning and Logical Reasoning
Machine Learning

The paper presents a probabilistic model that unifies perceptual reasoning and logical reasoning, highlighting their shared processes of ...

arXiv - AI · 3 min ·
[2504.02996] Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Machine Learning

This article presents a novel approach to Noise-Aware Generalization (NAG) in machine learning, addressing the challenges posed by label ...

arXiv - Machine Learning · 4 min ·
[2602.20134] Modeling Epidemiological Dynamics Under Adversarial Data and User Deception
Machine Learning

This paper presents a game-theoretic model to analyze how adversarial data and user deception affect epidemiological dynamics, particular...

arXiv - AI · 4 min ·
[2602.20114] Benchmarking Unlearning for Vision Transformers
Machine Learning

This article presents a benchmarking study on unlearning algorithms for Vision Transformers (VTs), highlighting their performance compare...

arXiv - AI · 4 min ·
[2602.20089] StructXLIP: Enhancing Vision-language Models with Multimodal Structural Cues
LLMs

The paper presents StructXLIP, a novel approach that enhances vision-language models by integrating multimodal structural cues, improving...

arXiv - AI · 4 min ·