AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much value

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback: humans rate the responses they like, and it turns out they consistently rate confident, fluent, agreeable...

Reddit - Artificial Intelligence · 1 min ·
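As a rough, hypothetical sketch of the dynamic this post describes (nothing below comes from the post; the model, features, and toy data are invented for illustration): a reward model fit to pairwise human preferences with a Bradley-Terry loss learns to pay for whatever raters actually reward, so if raters favor confident, agreeable answers over correct ones, the learned reward inherits that bias.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps response features to a scalar reward."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):
        return self.score(feats).squeeze(-1)

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    return -torch.log(torch.sigmoid(r_chosen - r_rejected)).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy data: feature 0 stands in for "confidence/fluency". If raters' choices
# track that feature rather than correctness, the reward model learns to
# reward confidence.
chosen = torch.randn(256, 16)
chosen[:, 0] += 1.0            # raters picked the more confident-sounding answer
rejected = torch.randn(256, 16)

for _ in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned weight on the 'confidence' feature:", model.score.weight[0, 0].item())
```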
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning its decision to reduce certain safety...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.19631] Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Machine Learning

This article discusses a novel approach to concept erasure in text-to-image diffusion models, focusing on High-Level Representation Misdirection...

arXiv - AI · 4 min ·
[2602.19539] Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems
Computer Vision

This paper evaluates the effectiveness of low-cost cosmetic modifications in deceiving AI age estimation systems, revealing significant vulnerabilities...

arXiv - Machine Learning · 4 min ·
[2602.19605] CLCR: Cross-Level Semantic Collaborative Representation for Multimodal Learning
AI Safety

The paper presents CLCR, a novel approach for multimodal learning that organizes features into a three-level semantic hierarchy to enhance...

arXiv - AI · 4 min ·
[2602.19574] CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
LLMs

The paper presents CTC-TTS, a novel dual-streaming text-to-speech system that utilizes a CTC-based aligner for improved text-speech alignment...

arXiv - AI · 3 min ·
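As a general illustration of the building block named in the abstract above (a sketch under assumptions, not the paper's implementation): CTC gives a differentiable way to align a short token sequence with a longer frame sequence by marginalizing over all monotonic alignments. Shapes and names below are illustrative only.

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 2, 28                        # frames, batch size, vocab size (id 0 = blank)
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = F.log_softmax(logits, dim=-1)  # per-frame token posteriors

targets = torch.randint(1, C, (N, 12))     # text token ids (no blanks)
input_lengths = torch.full((N,), T)        # frames per utterance
target_lengths = torch.full((N,), 12)      # tokens per utterance

# CTC loss sums over every monotonic alignment of the 12 tokens onto 50 frames,
# so gradients reach the frame-level predictions without a hand-made alignment.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
loss.backward()
print("CTC loss:", loss.item())
```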
[2602.19569] Temporal-Aware Heterogeneous Graph Reasoning with Multi-View Fusion for Temporal Question Answering
AI Safety

This paper presents a novel framework for Temporal Question Answering over Temporal Knowledge Graphs, addressing limitations in temporal ...

arXiv - AI · 3 min ·
[2602.19555] Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains
LLMs

This article discusses the cybersecurity implications of agentic AI systems, focusing on threats and defenses in runtime supply chains, highlighting...

arXiv - AI · 3 min ·
[2602.19410] BioEnvSense: A Human-Centred Security Framework for Preventing Behaviour-Driven Cyber Incidents
Machine Learning

The paper introduces BioEnvSense, a human-centered security framework that leverages a hybrid CNN-LSTM model to analyze biometric and environmental...

arXiv - Machine Learning · 3 min ·
[2602.19534] Large Language Model-Assisted UAV Operations and Communications: A Multifaceted Survey and Tutorial
LLMs

This article surveys the integration of Large Language Models (LLMs) in Uncrewed Aerial Vehicles (UAVs), exploring their potential to enhance...

arXiv - AI · 4 min ·
[2602.19239] Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations
LLMs

This article investigates procedural hallucinations in language models, identifying specific attention deficits that lead to errors in ex...

arXiv - Machine Learning · 4 min ·
[2602.19450] Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments
LLMs

This article presents a red-teaming study of Claude Opus and ChatGPT as security advisors for Trusted Execution Environments (TEEs), highlighting...

arXiv - AI · 4 min ·
[2602.19140] CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion
Machine Learning

The paper presents CaReFlow, a novel approach for multimodal fusion that addresses modality gaps using cyclic adaptive rectified flow, en...

arXiv - Machine Learning · 4 min ·
[2602.19349] UP-Fuse: Uncertainty-guided LiDAR-Camera Fusion for 3D Panoptic Segmentation
NLP

The paper presents UP-Fuse, an innovative framework for LiDAR-camera fusion that enhances 3D panoptic segmentation by addressing sensor d...

arXiv - AI · 4 min ·
[2602.19008] Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks
AI Agents

This article explores the reliability failures of language agents in long-horizon tasks, attributing these failures to deviations from canonical paths...

arXiv - Machine Learning · 4 min ·
[2602.19324] RetinaVision: XAI-Driven Augmented Regulation for Precise Retinal Disease Classification using deep learning framework
Machine Learning

The article presents RetinaVision, a deep learning framework for accurate classification of retinal diseases using optical coherence tomography...

arXiv - AI · 3 min ·
[2602.18997] Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
Machine Learning

This paper explores the convergence properties of Matrix Stochastic Mirror Descent (SMD) in overparameterized settings, proving that it converges...

arXiv - Machine Learning · 3 min ·
[2602.18895] Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?
LLMs

This paper explores the potential of large language models (LLMs) as post-hoc explainability tools in credit risk models, evaluating their...

arXiv - Machine Learning · 4 min ·
[2602.19314] IPv2: An Improved Image Purification Strategy for Real-World Ultra-Low-Dose Lung CT Denoising
Machine Learning

The paper presents IPv2, an enhanced image purification strategy for improving lung CT denoising at ultra-low doses, addressing limitations...

arXiv - AI · 4 min ·
[2602.18870] Federated Measurement of Demographic Disparities from Quantile Sketches
Machine Learning

This paper presents a federated learning approach to measure demographic disparities using quantile sketches, addressing privacy concerns...

arXiv - Machine Learning · 3 min ·
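As a rough sketch of the general idea behind the abstract above (an illustration under assumptions, not the paper's method): each site shares only a small grid of quantiles per demographic group, and a server merges those summaries to estimate a disparity such as a median gap without ever seeing raw records. All function names and data below are hypothetical.

```python
import numpy as np

def site_summary(values, q=9):
    """A site shares q evenly spaced quantiles plus its count - no raw data."""
    probs = np.linspace(0.1, 0.9, q)
    return np.quantile(values, probs), len(values)

def merged_quantile(summaries, p):
    """Treat each site's quantile grid as a weighted sample of the pooled data."""
    points = np.concatenate([s for s, _ in summaries])
    weights = np.concatenate([np.full(len(s), n / len(s)) for s, n in summaries])
    order = np.argsort(points)
    points, weights = points[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()
    return np.interp(p, cdf, points)

rng = np.random.default_rng(0)
# Three sites, two demographic groups with a true median gap of about 2.0
group_a = [site_summary(rng.normal(10, 3, n)) for n in (500, 800, 300)]
group_b = [site_summary(rng.normal(12, 3, n)) for n in (400, 900, 200)]

gap = merged_quantile(group_b, 0.5) - merged_quantile(group_a, 0.5)
print(f"estimated median disparity: {gap:.2f}")
```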
[2602.18868] Limits of Convergence-Rate Control for Open-Weight Safety
LLMs

This paper explores the limitations of convergence-rate control methods for open-weight foundation models, highlighting the challenges in...

arXiv - Machine Learning · 3 min ·
[2602.19304] Safe and Interpretable Multimodal Path Planning for Multi-Agent Cooperation
AI Agents

The paper presents CaPE, a multimodal path planning method that enhances cooperation among decentralized agents through language communication...

arXiv - AI · 4 min ·