AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt that the platform adds much value

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback: humans rate the responses they like, and it turns out they consistently rate confident, fluent, agreeable...

Reddit - Artificial Intelligence · 1 min ·
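As a rough, hypothetical sketch of the dynamic this post describes (nothing below comes from the post; the model, features, and toy data are invented for illustration): a reward model fit to pairwise human preferences with a Bradley-Terry loss learns to pay for whatever raters actually reward, so if raters favor confident, agreeable answers over correct ones, the learned reward inherits that bias.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps response features to a scalar reward."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):
        return self.score(feats).squeeze(-1)

def preference_loss(r_chosen, r_rejected):
    # Bradley-Terry: maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    return -torch.log(torch.sigmoid(r_chosen - r_rejected)).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy data: feature 0 stands in for "confidence/fluency". If raters' choices
# track that feature rather than correctness, the reward model learns to
# reward confidence.
chosen = torch.randn(256, 16)
chosen[:, 0] += 1.0            # raters picked the more confident-sounding answer
rejected = torch.randn(256, 16)

for _ in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned weight on the 'confidence' feature:", model.score.weight[0, 0].item())
```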
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning its decision to reduce certain safety...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.19631] Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Machine Learning

This article discusses a novel approach to concept erasure in text-to-image diffusion models, focusing on High-Level Representation Misdirection...

arXiv - AI · 4 min ·
[2602.19539] Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems
Computer Vision

This paper evaluates the effectiveness of low-cost cosmetic modifications in deceiving AI age estimation systems, revealing significant vulnerabilities...

arXiv - Machine Learning · 4 min ·
[2602.19605] CLCR: Cross-Level Semantic Collaborative Representation for Multimodal Learning
AI Safety

The paper presents CLCR, a novel approach for multimodal learning that organizes features into a three-level semantic hierarchy to enhance...

arXiv - AI · 4 min ·
[2602.19574] CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
LLMs

The paper presents CTC-TTS, a novel dual-streaming text-to-speech system that utilizes a CTC-based aligner for improved text-speech alignment...

arXiv - AI · 3 min ·
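As a general illustration of the building block named in the abstract above (a sketch under assumptions, not the paper's implementation): CTC gives a differentiable way to align a short token sequence with a longer frame sequence by marginalizing over all monotonic alignments. Shapes and names below are illustrative only.

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 2, 28                        # frames, batch size, vocab size (id 0 = blank)
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = F.log_softmax(logits, dim=-1)  # per-frame token posteriors

targets = torch.randint(1, C, (N, 12))     # text token ids (no blanks)
input_lengths = torch.full((N,), T)        # frames per utterance
target_lengths = torch.full((N,), 12)      # tokens per utterance

# CTC loss sums over every monotonic alignment of the 12 tokens onto 50 frames,
# so gradients reach the frame-level predictions without a hand-made alignment.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
loss.backward()
print("CTC loss:", loss.item())
```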
[2602.19569] Temporal-Aware Heterogeneous Graph Reasoning with Multi-View Fusion for Temporal Question Answering
AI Safety

This paper presents a novel framework for Temporal Question Answering over Temporal Knowledge Graphs, addressing limitations in temporal ...

arXiv - AI · 3 min ·
[2602.19555] Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains
LLMs

This article discusses the cybersecurity implications of agentic AI systems, focusing on threats and defenses in runtime supply chains, highlighting...

arXiv - AI · 3 min ·
[2602.19410] BioEnvSense: A Human-Centred Security Framework for Preventing Behaviour-Driven Cyber Incidents
Machine Learning

The paper introduces BioEnvSense, a human-centered security framework that leverages a hybrid CNN-LSTM model to analyze biometric and environmental...

arXiv - Machine Learning · 3 min ·
[2602.19534] Large Language Model-Assisted UAV Operations and Communications: A Multifaceted Survey and Tutorial
LLMs

This article surveys the integration of Large Language Models (LLMs) in Uncrewed Aerial Vehicles (UAVs), exploring their potential to enhance...

arXiv - AI · 4 min ·
[2602.19239] Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations
LLMs

This article investigates procedural hallucinations in language models, identifying specific attention deficits that lead to errors in ex...

arXiv - Machine Learning · 4 min ·
[2602.19450] Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments
LLMs

This article presents a red-teaming study of Claude Opus and ChatGPT as security advisors for Trusted Execution Environments (TEEs), highlighting...

arXiv - AI · 4 min ·
[2602.19140] CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion
Machine Learning

The paper presents CaReFlow, a novel approach for multimodal fusion that addresses modality gaps using cyclic adaptive rectified flow, en...

arXiv - Machine Learning · 4 min ·
[2602.19349] UP-Fuse: Uncertainty-guided LiDAR-Camera Fusion for 3D Panoptic Segmentation
NLP

The paper presents UP-Fuse, an innovative framework for LiDAR-camera fusion that enhances 3D panoptic segmentation by addressing sensor d...

arXiv - AI · 4 min ·
[2602.19008] Capable but Unreliable: Canonical Path Deviation as a Causal Mechanism of Agent Failure in Long-Horizon Tasks
AI Agents

This article explores the reliability failures of language agents in long-horizon tasks, attributing these failures to deviations from canonical paths...

arXiv - Machine Learning · 4 min ·
[2602.19324] RetinaVision: XAI-Driven Augmented Regulation for Precise Retinal Disease Classification using deep learning framework
Machine Learning

The article presents RetinaVision, a deep learning framework for accurate classification of retinal diseases using optical coherence tomography...

arXiv - AI · 3 min ·
[2602.18997] Implicit Bias and Convergence of Matrix Stochastic Mirror Descent
Machine Learning

This paper explores the convergence properties of Matrix Stochastic Mirror Descent (SMD) in overparameterized settings, proving that it converges...

arXiv - Machine Learning · 3 min ·
[2602.18895] Could Large Language Models work as Post-hoc Explainability Tools in Credit Risk Models?
LLMs

This paper explores the potential of large language models (LLMs) as post-hoc explainability tools in credit risk models, evaluating their...

arXiv - Machine Learning · 4 min ·
[2602.19314] IPv2: An Improved Image Purification Strategy for Real-World Ultra-Low-Dose Lung CT Denoising
Machine Learning

The paper presents IPv2, an enhanced image purification strategy for improving lung CT denoising at ultra-low doses, addressing limitations...

arXiv - AI · 4 min ·
[2602.18870] Federated Measurement of Demographic Disparities from Quantile Sketches
Machine Learning

This paper presents a federated learning approach to measure demographic disparities using quantile sketches, addressing privacy concerns...

arXiv - Machine Learning · 3 min ·
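As a rough sketch of the general idea behind the abstract above (an illustration under assumptions, not the paper's method): each site shares only a small grid of quantiles per demographic group, and a server merges those summaries to estimate a disparity such as a median gap without ever seeing raw records. All function names and data below are hypothetical.

```python
import numpy as np

def site_summary(values, q=9):
    """A site shares q evenly spaced quantiles plus its count - no raw data."""
    probs = np.linspace(0.1, 0.9, q)
    return np.quantile(values, probs), len(values)

def merged_quantile(summaries, p):
    """Treat each site's quantile grid as a weighted sample of the pooled data."""
    points = np.concatenate([s for s, _ in summaries])
    weights = np.concatenate([np.full(len(s), n / len(s)) for s, n in summaries])
    order = np.argsort(points)
    points, weights = points[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()
    return np.interp(p, cdf, points)

rng = np.random.default_rng(0)
# Three sites, two demographic groups with a true median gap of about 2.0
group_a = [site_summary(rng.normal(10, 3, n)) for n in (500, 800, 300)]
group_b = [site_summary(rng.normal(12, 3, n)) for n in (400, 900, 200)]

gap = merged_quantile(group_b, 0.5) - merged_quantile(group_a, 0.5)
print(f"estimated median disparity: {gap:.2f}")
```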
[2602.18868] Limits of Convergence-Rate Control for Open-Weight Safety
LLMs

This paper explores the limitations of convergence-rate control methods for open-weight foundation models, highlighting the challenges in...

arXiv - Machine Learning · 3 min ·
[2602.19304] Safe and Interpretable Multimodal Path Planning for Multi-Agent Cooperation
AI Agents

The paper presents CaPE, a multimodal path planning method that enhances cooperation among decentralized agents through language communication...

arXiv - AI · 4 min ·