AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[2512.21106] Semantic Refinement with LLMs for Graph Representations

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min · about 8 hours ago

Machine Learning

[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology

Abstract page for arXiv paper 2511.22294: Structure is Supervision: Multiview Masked Autoencoders for Radiology

arXiv - Machine Learning · 4 min · about 8 hours ago

LLMs

[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models

Abstract page for arXiv paper 2511.18123: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-La...

arXiv - Machine Learning · 4 min · about 8 hours ago

All Content

Machine Learning

[2602.21534] ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

The paper presents ARLArena, a framework designed to enhance stability in agentic reinforcement learning (ARL) by providing a systematic ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.21496] Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information

The paper explores the limitations of self-correction in Large Language Models (LLMs) regarding semantic sensitive information, introduci...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21268] A Dynamic Survey of Soft Set Theory and Its Extensions

This article provides a comprehensive overview of soft set theory and its various extensions, highlighting key definitions, constructions...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[D] where can I find more information about NTK wrt Lazy and Rich learning?

The Reddit discussion seeks insights on Neural Tangent Kernel (NTK) in relation to lazy and rich learning regimes, focusing on practical ...

Reddit - Machine Learning · 1 min · about 1 month ago

AI Agents

How Quickly Will A.I. Agents Rip Through the Economy?

The article features an in-depth interview with an Anthropic co-founder discussing the potential impact of AI agents on the economy, explori...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

Machine Learning

We built a cryptographic authorization gateway for AI agents and planning to run limited red-team sessions

Sentinel Gateway addresses the challenge of instruction provenance in AI agents by ensuring only user-signed prompts are treated as execu...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Infrastructure

The White House wants AI companies to cover rate hikes. Most have already said they would. | TechCrunch

The White House is urging major AI companies to absorb rising electricity costs linked to their data centers. Most firms, including Micro...

TechCrunch - AI · 5 min · about 1 month ago

Machine Learning

[D] Is ICLR not giving Spotlights this year?

Discussion on whether ICLR is suspending Spotlights this year, with concerns over communication and potential impacts from OpenReview leaks.

Reddit - Machine Learning · 1 min · about 1 month ago

AI Infrastructure

The public opposition to AI infrastructure is heating up | TechCrunch

Public opposition to AI infrastructure is rising, leading to legislative proposals for moratoriums on new data center constructions acros...

TechCrunch - AI · 11 min · about 1 month ago

LLMs

[D] Is it possible to create a benchmark that can measure human-like intelligence?

The article discusses the limitations of current benchmarks for measuring human-like intelligence in AI, highlighting Francois Chollet's ...

Reddit - Machine Learning · 1 min · about 1 month ago

LLMs

About 12% of U.S. teens turn to AI for emotional support or advice | TechCrunch

A Pew Research Center report reveals that 12% of U.S. teens use AI chatbots for emotional support, raising concerns among mental health p...

TechCrunch - AI · 5 min · about 1 month ago

LLMs

[D] How can you tell if a paper was heavily written with the help of LLM?

This discussion explores methods to identify papers that are predominantly generated by language models like ChatGPT, focusing on detecti...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[D] The Anthropic–Pentagon situation isn’t political. It’s architectural.

The article discusses the Anthropic-Pentagon situation, framing it as a governance-layer conflict in AI rather than a political debate, f...

Reddit - Machine Learning · 1 min · about 1 month ago

AI Safety

Anthropic and the Pentagon saga: the deadline approaches.

The article discusses the impending deadline set by the Pentagon for Anthropic, raising questions about their potential involvement in mi...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

LLMs

Does Anthropic think Claude is alive? Define ‘alive’ | The Verge

Anthropic executives suggest that their AI model, Claude, may possess a form of consciousness, sparking debates about the implications of...

The Verge - AI · 10 min · about 1 month ago

LLMs

[R] Systematic Vulnerability in Open-Weight LLMs: Prefill Attacks Achieve Near-Perfect Success Rates Across 50 Models

This article presents a comprehensive study on prefill attacks in open-weight LLMs, revealing a near-perfect success rate across 50 model...

Reddit - Machine Learning · 1 min · about 1 month ago

AI Safety

EVERYONE IS LYING ABOUT ARTIFICIAL INTELLIGENCE

The article discusses the contradictory narratives surrounding AI in various industries, highlighting how stakeholders often misrepresent...

AI News - General · 3 min · about 1 month ago

AI Safety

IBM 2026 X-Force Threat Index: AI-Driven Attacks are Escalating as Basic Security Gaps Leave Enterprises Exposed

IBM's 2026 X-Force Threat Intelligence Index reveals a 44% rise in cyberattacks exploiting basic security gaps, driven by AI tools that e...

AI Tools & Products · 5 min · about 1 month ago

Machine Learning

[2511.08261] Uncertainty Calibration of Multi-Label Bird Sound Classifiers

This article evaluates the uncertainty calibration of multi-label bird sound classifiers, highlighting the challenges and improvements in...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2506.04462] Watermarking Degrades Alignment in Language Models: Analysis and Mitigation

This paper analyzes the impact of watermarking on the alignment of language models, revealing significant shifts in model behavior and pr...

arXiv - Machine Learning · 4 min · about 1 month ago
