AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min · about 3 hours ago

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents. I am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min · about 11 hours ago

AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min · about 15 hours ago

All Content

Machine Learning

[2602.13275] Artificial Organisations

The paper 'Artificial Organisations' explores how multi-agent AI systems can achieve reliable outcomes through architectural design, draw...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13274] ProMoral-Bench: Evaluating Prompting Strategies for Moral Reasoning and Safety in LLMs

The paper introduces ProMoral-Bench, a benchmark for evaluating prompting strategies in large language models (LLMs) focused on moral rea...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.13271] Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework

This paper presents a novel intrusion detection framework that integrates Explainable AI (XAI) to enhance the interpretability and perfor...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.13262] General learned delegation by clones

The paper presents SELFCEST, a novel approach that enhances language models by enabling them to create clones for improved reasoning effi...

arXiv - AI · 3 min · about 2 months ago

NLP

[2602.13248] X-Blocks: Linguistic Building Blocks of Natural Language Explanations for Automated Vehicles

The paper introduces X-Blocks, a framework for analyzing natural language explanations in automated vehicles, enhancing user trust and un...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13255] DPBench: Large Language Models Struggle with Simultaneous Coordination

The paper introduces DPBench, a benchmark assessing how well large language models (LLMs) coordinate in multi-agent systems, revealing si...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13240] AST-PAC: AST-guided Membership Inference for Code

The paper introduces AST-PAC, a novel method for membership inference attacks on code models, leveraging Abstract Syntax Trees to enhance...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13237] NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models

NL2LOGIC presents a novel framework for translating natural language into first-order logic using large language models, enhancing accura...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13234] Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents

The paper presents a novel framework, Dual-Cycle Adversarial Self-Evolution, aimed at enhancing the safety and fidelity of role-playing a...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13230] Intelligence as Trajectory-Dominant Pareto Optimization

The paper presents a novel framework for understanding intelligence through the lens of trajectory-dominant Pareto optimization, addressi...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.13224] A Geometric Taxonomy of Hallucinations in LLMs

This article presents a geometric taxonomy of hallucinations in large language models (LLMs), categorizing them into three types: unfaith...

arXiv - AI · 3 min · about 2 months ago

AI Startups

[2602.13217] VeRA: Verified Reasoning Data Augmentation at Scale

VeRA introduces a framework for generating verified reasoning data at scale, enhancing AI evaluation by creating dynamic, executable benc...

arXiv - AI · 4 min · about 2 months ago

AI Agents

[2602.13213] Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique

This paper presents an agentic AI system for commercial insurance underwriting that incorporates adversarial self-critique to enhance dec...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

Columbia’s Orfanidou says training AI using biased past risks excluding women from shipping

Christina Orfanidou, head of AI at Columbia Group, warns that using biased historical data to train AI could perpetuate gender exclusion ...

AI News - General · 1 min · about 2 months ago

AI Infrastructure

The Small English Town Swept Up in the Global AI Arms Race | WIRED

Residents of Potters Bar protest against a planned data center on green belt land, highlighting tensions between AI infrastructure demand...

Wired - AI · 11 min · about 2 months ago

Generative AI

[D] Does humanity need generative AI?

The discussion explores the necessity of generative AI, questioning its benefits beyond business applications and highlighting its potent...

Reddit - Machine Learning · 1 min · about 2 months ago

LLMs

Anthropic tries to hide Claude's AI actions. Devs hate it

Anthropic's Claude Code update conceals file access details, prompting backlash from developers who rely on this information for effectiv...

AI Tools & Products · 7 min · about 2 months ago

Generative AI

TikTok creator ByteDance vows to curb AI video tool after Disney threat

ByteDance's AI video generator Seedance 2.0 faces backlash from Disney and Hollywood for potential copyright infringement, prompting the ...

AI Tools & Products · 4 min · about 2 months ago

Generative AI

Who Owns Ideas? Humans versus AI in Intellectual Property

The article explores the implications of AI in the entertainment industry, focusing on intellectual property rights and the challenges of...

AI Tools & Products · 7 min · about 2 months ago

Generative AI

Reddit's human content wins amid the AI flood

Reddit emphasizes the value of human-generated content as users seek authentic interactions amid a surge of AI-generated material, highli...

AI Tools & Products · 6 min · about 2 months ago
