AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents. I am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·
AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·

All Content

Machine Learning

[2602.13275] Artificial Organisations

The paper 'Artificial Organisations' explores how multi-agent AI systems can achieve reliable outcomes through architectural design, draw...

arXiv - AI · 3 min ·
LLMs

[2602.13274] ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

The paper introduces ProMoral-Bench, a benchmark for evaluating prompting strategies in large language models (LLMs) focused on moral rea...

arXiv - AI · 3 min ·
Machine Learning

[2602.13271] Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework

This paper presents a novel intrusion detection framework that integrates Explainable AI (XAI) to enhance the interpretability and perfor...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.13262] General learned delegation by clones

The paper presents SELFCEST, a novel approach that enhances language models by enabling them to create clones for improved reasoning effi...

arXiv - AI · 3 min ·
NLP

[2602.13248] X-Blocks: Linguistic Building Blocks of Natural Language Explanations for Automated Vehicles

The paper introduces X-Blocks, a framework for analyzing natural language explanations in automated vehicles, enhancing user trust and un...

arXiv - AI · 4 min ·
LLMs

[2602.13255] DPBench: Large Language Models Struggle with Simultaneous Coordination

The paper introduces DPBench, a benchmark assessing how well large language models (LLMs) coordinate in multi-agent systems, revealing si...

arXiv - AI · 3 min ·
LLMs

[2602.13240] AST-PAC: AST-guided Membership Inference for Code

The paper introduces AST-PAC, a novel method for membership inference attacks on code models, leveraging Abstract Syntax Trees to enhance...

arXiv - AI · 3 min ·
LLMs

[2602.13237] NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models

NL2LOGIC presents a novel framework for translating natural language into first-order logic using large language models, enhancing accura...

arXiv - AI · 4 min ·
LLMs

[2602.13234] Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents

The paper presents a novel framework, Dual-Cycle Adversarial Self-Evolution, aimed at enhancing the safety and fidelity of role-playing a...

arXiv - AI · 4 min ·
Machine Learning

[2602.13230] Intelligence as Trajectory-Dominant Pareto Optimization

The paper presents a novel framework for understanding intelligence through the lens of trajectory-dominant Pareto optimization, addressi...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.13224] A Geometric Taxonomy of Hallucinations in LLMs

This article presents a geometric taxonomy of hallucinations in large language models (LLMs), categorizing them into three types: unfaith...

arXiv - AI · 3 min ·
AI Startups

[2602.13217] VeRA: Verified Reasoning Data Augmentation at Scale

VeRA introduces a framework for generating verified reasoning data at scale, enhancing AI evaluation by creating dynamic, executable benc...

arXiv - AI · 4 min ·
AI Agents

[2602.13213] Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique

This paper presents an agentic AI system for commercial insurance underwriting that incorporates adversarial self-critique to enhance dec...

arXiv - Machine Learning · 4 min ·
Machine Learning

Columbia’s Orfanidou says training AI using biased past risks excluding women from shipping

Christina Orfanidou, head of AI at Columbia Group, warns that using biased historical data to train AI could perpetuate gender exclusion ...

AI News - General · 1 min ·
AI Infrastructure

The Small English Town Swept Up in the Global AI Arms Race | WIRED

Residents of Potters Bar protest against a planned data center on green belt land, highlighting tensions between AI infrastructure demand...

Wired - AI · 11 min ·
Generative AI

[D] Does humanity need generative AI?

The discussion explores the necessity of generative AI, questioning its benefits beyond business applications and highlighting its potent...

Reddit - Machine Learning · 1 min ·
LLMs

Anthropic tries to hide Claude's AI actions. Devs hate it

Anthropic's Claude Code update conceals file access details, prompting backlash from developers who rely on this information for effectiv...

AI Tools & Products · 7 min ·
Generative AI

TikTok creator ByteDance vows to curb AI video tool after Disney threat

ByteDance's AI video generator Seedance 2.0 faces backlash from Disney and Hollywood for potential copyright infringement, prompting the ...

AI Tools & Products · 4 min ·
Generative AI

Who Owns Ideas? Humans versus AI in Intellectual Property

The article explores the implications of AI in the entertainment industry, focusing on intellectual property rights and the challenges of...

AI Tools & Products · 7 min ·
Generative AI

Reddit's human content wins amid the AI flood

Reddit emphasizes the value of human-generated content as users seek authentic interactions amid a surge of AI-generated material, highli...

AI Tools & Products · 6 min ·