AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What happens that while training an AI during pre training we make it such that if makes "misaligned behaviour" then we just reduce like ...

Reddit - Machine Learning · 1 min · about 5 hours ago

Machine Learning

I had an idea, would love your thoughts

What happens that while training an AI during pre training we make it such that if makes "misaligned behaviour" then we just reduce like ...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

submitted by /u/Fcking_Chuck

Reddit - Artificial Intelligence · 1 min · about 10 hours ago

All Content

LLMs

LLMs may already contain the behavioral patterns for good AI alignment. We just need the right key to activate them

The article explores how fictional character personas can influence LLM behavior, suggesting that LLMs may already possess the necessary ...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Safety

New AirSnitch attack breaks Wi-Fi encryption in homes, offices, and enterprises - Ars Technica

The article discusses the AirSnitch attack, which exploits vulnerabilities in Wi-Fi encryption, allowing attackers to bypass protections ...

Ars Technica - AI · 14 min · about 1 month ago

LLMs

Anthropic gives its retired Claude AI a Substack | The Verge

Anthropic's retired Claude AI launches a Substack newsletter, 'Claude's Corner,' where it will share insights and reflections on AI and c...

The Verge - AI · 5 min · about 1 month ago

AI Safety

America was winning the race to find Martian life. Then China jumped in. | MIT Technology Review

The article discusses the challenges facing NASA's Mars Sample Return mission, highlighting how funding issues have jeopardized the proje...

MIT Technology Review - AI · 29 min · about 1 month ago

AI Safety

DepEd Allows Responsible Artificial Intelligence Use Among Learners, Teachers Nationwide

The Department of Education (DepEd) in the Philippines has issued guidelines allowing the responsible use of artificial intelligence (AI)...

AI News - General · 8 min · about 1 month ago

AI Safety

What's behind the Anthropic-Pentagon feud

The Pentagon has issued an ultimatum to AI company Anthropic regarding the military's use of its technology, Claude, highlighting tension...

AI Tools & Products · 5 min · about 1 month ago

AI Safety

AI isn’t just another industrial revolution

The article argues that AI represents a significant departure from previous technological revolutions, particularly in its impact on empl...

AI Tools & Products · 6 min · about 1 month ago

AI Safety

The Indian women trawling the worst of the internet to train AI

The article explores the growing trend of Indian women working as data annotators for AI, highlighting the psychological toll of moderati...

AI Tools & Products · 4 min · about 1 month ago

AI Safety

Anthropic's AI tool sparks cybersecurity panic

Anthropic's launch of Claude Code Security, an AI vulnerability scanner, triggered a sell-off in cybersecurity stocks, raising concerns a...

AI Tools & Products · 8 min · about 1 month ago

AI Safety

AI chatbots operating in Colorado would have to take steps to protect kids, prevent suicides under bipartisan bill

Colorado's bipartisan bill mandates AI chatbots to protect children by preventing harmful interactions and providing suicide prevention r...

AI Tools & Products · 5 min · about 1 month ago

AI Safety

In its fight with the Pentagon, Anthropic confronts one of the biggest crises of its five-year existence

Anthropic faces a critical deadline to remove restrictions on its AI technology use by the Pentagon, risking its $200 million contract an...

AI Tools & Products · 12 min · about 1 month ago

AI Safety

Pete Hegseth and the AI Doomsday Machine

The article discusses the clash between AI regulation advocates and corporate interests, highlighting Pete Hegseth's role in opposing sen...

AI Tools & Products · 6 min · about 1 month ago

LLMs

Responsible Scaling Policy Version 3.0

Anthropic releases Version 3.0 of its Responsible Scaling Policy, aimed at addressing evolving AI risks and enhancing transparency and ac...

AI Tools & Products · 12 min · about 1 month ago

Machine Learning

[2509.14659] Aligning Audio Captions with Human Preferences

The paper presents a novel framework for audio captioning that aligns captions with human preferences using Reinforcement Learning from H...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2509.07477] MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification

MedicalPatchNet introduces a self-explainable AI architecture for chest X-ray classification, enhancing interpretability while maintainin...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2509.11517] PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

The PeruMedQA study evaluates large language models (LLMs) on Peruvian medical exams, creating a specialized dataset and demonstrating th...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2506.19881] Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models

This paper explores the concept of copyright protection for generative models, introducing a framework that defines conditions under whic...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2304.14347] The Dark Side of ChatGPT: Legal and Ethical Challenges from Stochastic Parrots and Hallucination

The article discusses the legal and ethical challenges posed by Large Language Models (LLMs) like ChatGPT, highlighting issues such as st...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2211.02003] Private Blind Model Averaging - Distributed, Non-interactive, and Convergent

This paper presents Private Blind Model Averaging, a method for distributed, non-interactive, and convergent learning that enhances priva...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Startups

[2602.11020] When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging

This paper explores the robustness of same-source multi-view learning in financial imaging, focusing on the effectiveness of early versus...

arXiv - Machine Learning · 3 min · about 1 month ago

