AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
Machine Learning

arXiv - AI · 4 min ·
[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?
LLMs

arXiv - AI · 4 min ·
[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Machine Learning

arXiv - AI · 4 min ·

All Content

[2602.17053] RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Machine Learning

The paper introduces RFEval, a benchmark for assessing reasoning faithfulness in large reasoning models, highlighting issues of unfaithfu...

arXiv - AI · 4 min ·
[2602.17038] Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
LLMs

The paper presents a novel Phase-Aware Mixture of Experts (PA-MoE) architecture for reinforcement learning, addressing the limitations of...

arXiv - AI · 4 min ·
[2602.16958] Automating Agent Hijacking via Structural Template Injection
LLMs

This paper presents Phantom, an automated framework for agent hijacking via Structural Template Injection, enhancing attack success rates...

arXiv - Machine Learning · 4 min ·
[2602.16984] Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning
Machine Learning

This paper explores the limitations of black-box safety evaluations in AI systems, highlighting the challenges posed by latent context co...

arXiv - AI · 4 min ·
[2602.16943] Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents
LLMs

This paper explores the discrepancies between text safety and tool-call safety in large language model (LLM) agents, introducing the GAP ...

arXiv - AI · 4 min ·
[2602.16942] SourceBench: Can AI Answers Reference Quality Web Sources?
LLMs

The paper introduces SourceBench, a benchmark designed to evaluate the quality of web sources cited by AI models across various query typ...

arXiv - AI · 3 min ·
[2602.16935] DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs
LLMs

The paper introduces DeepContext, a stateful framework for detecting adversarial intent drift in multi-turn dialogues within large langua...

arXiv - Machine Learning · 4 min ·
[2602.16931] Narrow fine-tuning erodes safety alignment in vision-language agents
LLMs

The paper explores how narrow fine-tuning of vision-language agents can lead to significant safety alignment issues, highlighting the ris...

arXiv - AI · 3 min ·
[2602.16901] AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
LLMs

The paper introduces AgentLAB, a benchmark for evaluating the vulnerability of LLM agents to long-horizon attacks, highlighting their sus...

arXiv - AI · 3 min ·
[2602.16832] IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
LLMs

The paper introduces IndicJR, a benchmark for evaluating jailbreak robustness in large language models across 12 South Asian languages, r...

arXiv - AI · 3 min ·
[2602.16716] Contextuality from Single-State Representations: An Information-Theoretic Principle for Adaptive Intelligence
AI Agents

This paper explores the concept of contextuality in adaptive intelligence, demonstrating that single-state representations incur an infor...

arXiv - AI · 3 min ·
Google says its AI systems helped deter Play Store malware in 2025
AI Safety

Google's 2025 report reveals a significant reduction in malware on the Play Store, attributing the success to enhanced AI-driven security...

AI Tools & Products · 4 min ·
Most AI bots lack basic safety disclosures, study finds
AI Safety

A recent study reveals that most AI bots fail to provide essential safety disclosures, raising concerns about user safety and transparenc...

AI Tools & Products · 1 min ·
4 highlights from Google CEO Sundar Pichai's talk at the AI Impact Summit 2026 in India
AI Infrastructure

Sundar Pichai's address at the AI Impact Summit 2026 highlights Google's advancements in AI, infrastructure investments in India, and the...

AI Tools & Products · 5 min ·
The People vs. AI
AI Safety

A grassroots movement is emerging across the U.S. as citizens unite against the rapid expansion of the AI industry, raising concerns abou...

AI Tools & Products · 21 min ·
Anthropic: AI triggers the ‘SaaSpocalypse’
AI Startups

The article discusses the potential disruption in the software industry due to AI advancements, particularly following Anthropic's new to...

AI Tools & Products · 6 min ·
Anthropic, Infosys to build AI agents for regulated industries
AI Agents

Infosys partners with Anthropic to develop AI agents tailored for regulated industries like financial services, focusing on compliance an...

AI Tools & Products · 5 min ·
The Pitt has a sharp take on AI
AI Startups

HBO's 'The Pitt' explores the complexities of generative AI in healthcare, highlighting its potential benefits and risks through a grippi...

The Verge - AI · 7 min ·
Prompt Injection and Info Leak Immune AI Agent, working Demo for Testing
LLMs

The article discusses a new AI agent prototype designed to combat prompt injection and information leaks, addressing a critical security ...

Reddit - Artificial Intelligence · 1 min ·
The AI security nightmare is here and it looks suspiciously like lobster
Robotics

A hacker exploited a vulnerability in Cline's AI workflow, leading to the installation of OpenClaw, highlighting significant security ris...

The Verge - AI · 4 min ·