AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

[R] The Lyra Technique — A framework for interpreting internal cognitive states in LLMs (Zenodo, open access)

We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] If you're building AI agents, logs aren't enough. You need evidence.

I have built a programmable governance layer for AI agents. I am considering open-sourcing it completely. Looking for feedback. Agent demos...

Reddit - Machine Learning · 1 min ·
AI Safety

[2510.14628] RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Abstract page for arXiv paper 2510.14628: RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

arXiv - AI · 4 min ·

All Content

AI Safety

Pentagon threatens to cut off Anthropic in AI safeguards dispute: Report

The Pentagon is threatening to sever ties with AI company Anthropic due to its refusal to allow unrestricted military use of its AI model...

AI Tools & Products · 2 min ·
AI Safety

Fraudulent AI Assistants Target User Information

A wave of malicious browser extensions masquerading as AI assistants has emerged on Google’s Chrome web store, stealing users' personal i...

AI Tools & Products · 4 min ·
Generative AI

ByteDance to curb AI video app after Disney legal threat

ByteDance is set to limit its AI video app Seedance after Disney's legal threats over copyright infringement involving its characters, in...

AI Tools & Products · 3 min ·
AI Safety

Attorneys warn against using AI to file taxes

Experts caution against using AI for tax filing, highlighting risks of errors and privacy concerns. Taxpayers could face penalties for re...

AI Tools & Products · 7 min ·
AI Safety

UK Cracks Down on AI Chatbots With Grok Enforcement

The UK government has enforced regulations on the Grok AI chatbot, signaling stricter compliance with the Online Safety Act to protect ch...

AI Tools & Products · 6 min ·
Generative AI

Federal Judge Rules AI Chatbot Conversations Can Be Seized as Evidence in Fraud Cases

A federal judge ruled that conversations with AI chatbots like Claude do not have the same legal protections as those with attorneys, imp...

AI Tools & Products · 5 min ·
AI Safety

AI: ‘The machines don’t think’

Temese Szalai's talk at the Astoria Public Library demystifies AI, focusing on its workings and limitations rather than usage, aiming to ...

AI Tools & Products · 5 min ·
AI Safety

Pentagon ‘close to cutting ties’ with AI firm Anthropic over restrictions

The Pentagon is considering severing ties with AI firm Anthropic due to disagreements over restrictions on the use of its Claude AI tool,...

AI Tools & Products · 2 min ·
AI Infrastructure

Could India challenge tech boss power at Delhi AI Impact Summit?

The AI Impact Summit in Delhi highlights India's potential to reshape the global AI landscape, emphasizing the need for inclusivity and l...

AI Tools & Products · 6 min ·
LLMs

I love Claude but honestly some of the "Claude might have gained consciousness" nonsense that their marketing team is pushing lately is a bit off putting. They know better!

The article discusses concerns over Anthropic's marketing claims regarding Claude's potential consciousness, highlighting skepticism from...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Pentagon threatens Anthropic punishment

The Pentagon has issued a warning to Anthropic regarding potential punitive actions, highlighting concerns over AI safety and regulatory ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Collaboration invite - medical imaging, algorithmic fairness or open track [D]

A 2nd year PhD student seeks collaboration opportunities in medical imaging and algorithmic fairness, inviting community members to conne...

Reddit - Machine Learning · 1 min ·
Robotics

[D] We found 18K+ exposed OpenClaw instances and ~15% of community skills contain malicious instructions

A security audit reveals over 18,000 exposed OpenClaw instances and alarming findings of malicious instructions in 15% of community-built...

Reddit - Machine Learning · 1 min ·
AI Safety

Ars Technica hallucinated quotes in its story about hallucinations

The article discusses how Ars Technica inaccurately reported quotes regarding AI hallucinations, raising concerns about media accuracy in...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Is alignment missing a dataset that no one has built yet?

The article discusses the absence of a dataset that captures the unique nuances of human identity, which are not reflected in existing la...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

AI chatbots to face strict online safety rules in UK

The UK is set to implement strict online safety regulations for AI chatbots, aiming to enhance user protection and accountability in digi...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Izwi Update: Local Speaker Diarization, Forced Alignment, and better model support

Izwi has released significant updates, including local speaker diarization, forced alignment for accurate timestamps, and real-time strea...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

AI Can't Handle Human Kink

The article discusses the limitations of AI in understanding and managing human kinks, highlighting the complexities of human sexuality t...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Let’s talk about Ring, lost dogs, and the surveillance state | The Verge

The Verge discusses the backlash against Ring's Search Party feature, which raises concerns about privacy and surveillance following its ...

The Verge - AI · 24 min ·
AI Agents

After all the hype, some AI experts don't think OpenClaw is all that exciting | TechCrunch

The article critiques OpenClaw, an AI project, highlighting skepticism from experts regarding its novelty and security flaws, particularl...

TechCrunch - AI · 10 min ·