AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min · about 4 hours ago

Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents. I am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min · about 13 hours ago

AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min · about 17 hours ago

All Content

AI Safety

Pentagon threatens to cut off Anthropic in AI safeguards dispute: Report

The Pentagon is threatening to sever ties with AI company Anthropic due to its refusal to allow unrestricted military use of its AI model...

AI Tools & Products · 2 min · about 2 months ago

AI Safety

Fraudulent AI Assistants Target User Information

A wave of malicious browser extensions masquerading as AI assistants has emerged on Google’s Chrome Web Store, stealing users' personal i...

AI Tools & Products · 4 min · about 2 months ago

Generative AI

ByteDance to curb AI video app after Disney legal threat

ByteDance is set to limit its AI video app Seedance after Disney's legal threats over copyright infringement involving its characters, in...

AI Tools & Products · 3 min · about 2 months ago

AI Safety

Attorneys warn against using AI to file taxes

Experts caution against using AI for tax filing, highlighting risks of errors and privacy concerns. Taxpayers could face penalties for re...

AI Tools & Products · 7 min · about 2 months ago

AI Safety

UK Cracks Down on AI Chatbots With Grok Enforcement

The UK government has enforced regulations on the Grok AI chatbot, signaling stricter compliance with the Online Safety Act to protect ch...

AI Tools & Products · 6 min · about 2 months ago

Generative AI

Federal Judge Rules AI Chatbot Conversations Can Be Seized as Evidence in Fraud Cases

A federal judge ruled that conversations with AI chatbots like Claude do not have the same legal protections as those with attorneys, imp...

AI Tools & Products · 5 min · about 2 months ago

AI Safety

AI: ‘The machines don’t think’

Temese Szalai's talk at the Astoria Public Library demystifies AI, focusing on its workings and limitations rather than usage, aiming to ...

AI Tools & Products · 5 min · about 2 months ago

AI Safety

Pentagon ‘close to cutting ties’ with AI firm Anthropic over restrictions

The Pentagon is considering severing ties with AI firm Anthropic due to disagreements over restrictions on the use of its Claude AI tool,...

AI Tools & Products · 2 min · about 2 months ago

AI Infrastructure

Could India challenge tech boss power at Delhi AI Impact Summit?

The AI Impact Summit in Delhi highlights India's potential to reshape the global AI landscape, emphasizing the need for inclusivity and l...

AI Tools & Products · 6 min · about 2 months ago

LLMs

I love Claude but honestly some of the "Claude might have gained consciousness" nonsense that their marketing team is pushing lately is a bit off putting. They know better!

The article discusses concerns over Anthropic's marketing claims regarding Claude's potential consciousness, highlighting skepticism from...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

AI Safety

Pentagon threatens Anthropic punishment

The Pentagon has issued a warning to Anthropic regarding potential punitive actions, highlighting concerns over AI safety and regulatory ...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

Machine Learning

Collaboration invite - medical imaging, algorithmic fairness or open track [D]

A 2nd year PhD student seeks collaboration opportunities in medical imaging and algorithmic fairness, inviting community members to conne...

Reddit - Machine Learning · 1 min · about 2 months ago

Robotics

[D] We found 18K+ exposed OpenClaw instances and ~15% of community skills contain malicious instructions

A security audit reveals over 18,000 exposed OpenClaw instances and alarming findings of malicious instructions in 15% of community-built...

Reddit - Machine Learning · 1 min · about 2 months ago

AI Safety

Ars Technica hallucinated quotes in its story about hallucinations

The article discusses how Ars Technica inaccurately reported quotes regarding AI hallucinations, raising concerns about media accuracy in...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

LLMs

Is alignment missing a dataset that no one has built yet?

The article discusses the absence of a dataset that captures the unique nuances of human identity, which are not reflected in existing la...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

AI Safety

AI chatbots to face strict online safety rules in UK

The UK is set to implement strict online safety regulations for AI chatbots, aiming to enhance user protection and accountability in digi...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

Machine Learning

Izwi Update: Local Speaker Diarization, Forced Alignment, and better model support

Izwi has released significant updates, including local speaker diarization, forced alignment for accurate timestamps, and real-time strea...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

AI Safety

AI Can't Handle Human Kink

The article discusses the limitations of AI in understanding and managing human kinks, highlighting the complexities of human sexuality t...

Reddit - Artificial Intelligence · 1 min · about 2 months ago

AI Safety

Let’s talk about Ring, lost dogs, and the surveillance state | The Verge

The Verge discusses the backlash against Ring's Search Party feature, which raises concerns about privacy and surveillance following its ...

The Verge - AI · 24 min · about 2 months ago

AI Agents

After all the hype, some AI experts don't think OpenClaw is all that exciting | TechCrunch

The article critiques OpenClaw, an AI project, highlighting skepticism from experts regarding its novelty and security flaws, particularl...

TechCrunch - AI · 10 min · about 2 months ago

