AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

Abstract page for arXiv paper 2511.22294: Structure is Supervision: Multiview Masked Autoencoders for Radiology

arXiv - Machine Learning · 4 min
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

Abstract page for arXiv paper 2511.18123: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models

arXiv - Machine Learning · 4 min

All Content

[2602.20193] When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
Machine Learning

This paper investigates the impact of encoder-side poisoning on text-to-image models, revealing that traditional evaluations of backdoor ...

arXiv - AI · 3 min
[2602.20170] CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
LLMs

The paper introduces CAGE, a framework for culturally adaptive red-teaming benchmark generation, addressing the limitations of existing b...

arXiv - AI · 3 min
[2602.20169] Autonomous AI and Ownership Rules
Robotics

This article explores the ownership rules surrounding AI-generated outputs, examining how they are linked to their creators and the impli...

arXiv - AI · 3 min
[2602.20168] Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing
Machine Learning

This article presents a benchmarking framework for early deterioration prediction in emergency triage, comparing hospital-rich settings w...

arXiv - Machine Learning · 3 min
[2602.20166] ConceptRM: The Quest to Mitigate Alert Fatigue through Consensus-Based Purity-Driven Data Cleaning for Reflection Modelling
Machine Learning

The paper presents ConceptRM, a novel method aimed at reducing alert fatigue in intelligent agents by improving data cleaning processes f...

arXiv - AI · 4 min
[2602.21064] Motivation is Something You Need
Machine Learning

The paper presents a novel training paradigm for AI that integrates concepts from affective neuroscience, focusing on a dual-model framew...

arXiv - Machine Learning · 3 min
[2602.21061] Tool Building as a Path to "Superintelligence"
LLMs

The paper explores how Large Language Models (LLMs) can achieve superintelligence through the Diligent Learner framework, emphasizing the...

arXiv - AI · 3 min
[2602.20878] Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
LLMs

This article introduces Vision-Language Causal Graphs (VLCGs) to enhance causal reasoning in Large Vision-Language Models (LVLMs), addressing t...

arXiv - AI · 3 min
[2602.20813] Pressure Reveals Character: Behavioural Alignment Evaluation at Depth
LLMs

This paper presents a novel evaluation framework for assessing the alignment of language models under realistic pressure, revealing behav...

arXiv - AI · 3 min
[2602.20770] Pipeline for Verifying LLM-Generated Mathematical Solutions
LLMs

This paper presents a pipeline for verifying mathematical solutions generated by Large Language Models (LLMs), emphasizing both automatic...

arXiv - AI · 3 min
[2602.20710] Counterfactual Simulation Training for Chain-of-Thought Faithfulness
LLMs

The paper introduces Counterfactual Simulation Training (CST), a method designed to enhance Chain-of-Thought (CoT) faithfulness in large ...

arXiv - AI · 4 min
[2602.20708] ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
LLMs

The paper introduces ICON, a novel framework designed to defend Large Language Model (LLM) agents against Indirect Prompt Injection (IPI)...

arXiv - AI · 3 min
[2602.20696] PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding
LLMs

The paper presents PromptCD, a method for enhancing AI behavior at test time using polarity-prompt contrastive decoding, improving alignm...

arXiv - AI · 4 min
[2602.20628] When can we trust untrusted monitoring? A safety case sketch across collusion strategies
Machine Learning

This paper explores the challenges of ensuring safety in AI systems using untrusted monitoring. It develops a taxonomy of collusion strat...

arXiv - AI · 4 min
[2602.20624] Physics-based phenomenological characterization of cross-modal bias in multimodal models
LLMs

This paper explores the cross-modal bias in multimodal large language models (MLLMs) through a physics-based phenomenological approach, a...

arXiv - AI · 4 min
[2602.20424] Implicit Intelligence -- Evaluating Agents on What Users Don't Say
AI Agents

The paper presents an evaluation framework called Implicit Intelligence, which assesses AI agents' ability to understand unstated user re...

arXiv - AI · 3 min
AI energy use: New tools show which model consumes the most power, and why
Machine Learning

The article discusses new tools that analyze the energy consumption of various AI models, highlighting the importance of understanding po...

AI Events · 1 min
Anthropic Drops Flagship Safety Pledge
AI Safety

Anthropic has announced the discontinuation of its flagship safety pledge, raising concerns about AI safety commitments in the industry.

Reddit - Artificial Intelligence · 1 min
Pete Hegseth’s Pentagon AI bro squad includes a former Uber executive and a private equity billionaire | The Verge
Machine Learning

The article discusses a Pentagon meeting involving Defense Secretary Pete Hegseth, former Uber executive Emil Michael, and private equity...

The Verge - AI · 11 min
AI-linked fears roil some corners of Wall Street after years of hype and gains
AI Safety

Concerns over AI spending are causing volatility on Wall Street, as investors question profitability. Major companies like IBM and Master...

AI Tools & Products · 5 min