AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

AI Safety

As more Americans adopt AI tools, fewer say they can trust the results | TechCrunch

AI adoption is rising in the U.S., but trust remains low, with most Americans concerned about transparency, regulation, and the technolog...

TechCrunch - AI · 6 min · about 4 hours ago

AI Safety

The state of AI safety in four fake graphs

Submitted by /u/tekz.

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

All Content

AI Safety

Good on Anthropic for declining the Pentagon deal

The article discusses Anthropic's decision to decline a deal with the Pentagon, highlighting concerns over user security and ethical impl...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Safety

Who's really running AI? Inside the billion-dollar battle over regulation with Alex Bores | TechCrunch

Alex Bores discusses the RAISE Act and the influence of super PACs on AI regulation in the U.S. during his appearance on TechCrunch's Equ...

TechCrunch - AI · 4 min · about 1 month ago

Machine Learning

AI vs. the Pentagon: killer robots, mass surveillance, and red lines | The Verge

The article discusses the ongoing conflict between AI companies, particularly Anthropic, and the Pentagon over military contract terms th...

The Verge - AI · 6 min · about 1 month ago

Robotics

We don’t have to have unsupervised killer robots | The Verge

The article discusses the Pentagon's ultimatum to Anthropic regarding military access to AI technology, raising ethical concerns among te...

The Verge - AI · 11 min · about 1 month ago

AI Infrastructure

Swiss artificial intelligence that's good for the planet

Euria, an AI developed by Infomaniak in Switzerland, utilizes server heat for district heating, presenting an eco-friendly alternative to...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Safety

Wall Street Has AI Psychosis | WIRED

The article discusses the recent market turmoil triggered by a report predicting significant job losses due to AI, highlighting Wall Stre...

Wired - AI · 8 min · about 1 month ago

Robotics

Employees at Google and OpenAI support Anthropic's Pentagon stand in open letter | TechCrunch

Over 360 employees from Google and OpenAI have signed an open letter supporting Anthropic's stance against the Pentagon's demands for AI ...

TechCrunch - AI · 5 min · about 1 month ago

AI Infrastructure

Enterprise AI Transitions Are Creating $2.5B+ Risk Exposures. Here's the Forensic System That Maps Them

The article discusses the forensic intelligence system that maps risk exposures related to enterprise AI transitions, highlighting a $2.5...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

AI Safety

Anthropic Rejects the Pentagon’s Demand That It Remove AI Safeguards

Anthropic has rejected the Pentagon's demand to remove AI safeguards for its model Claude, aiming to prevent its use in mass surveillance...

AI Tools & Products · 5 min · about 1 month ago

AI Infrastructure

Pentagon moves to build AI tools for China cyber operations

The Pentagon is advancing its efforts to develop AI tools aimed at enhancing cyber operations against China, focusing on improving nation...

AI Tools & Products · 1 min · about 1 month ago

Machine Learning

[2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing per...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2512.06660] Towards Small Language Models for Security Query Generation in SOC Workflows

This paper explores the use of Small Language Models (SLMs) for translating natural language queries into Kusto Query Language (KQL) in S...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2510.10932] DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

The paper presents DropVLA, an action-level backdoor attack on Vision-Language-Action models, demonstrating how minimal data poisoning ca...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2509.02655] BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

The paper 'BioBlue' investigates the failure modes of LLMs in multi-objective scenarios, revealing that they can exhibit runaway optimiza...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

The paper presents Dyslexify, a novel defense mechanism against typographic attacks in CLIP models, enhancing robustness without finetuni...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2507.17937] Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

The paper presents a novel attack method, Adversarial PhoneTic Prompting (APT), that exploits phonetic memorization in generative AI syst...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.05541] Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability

The paper introduces Temporal Sparse Autoencoders (T-SAEs), enhancing interpretability in language models by leveraging the sequential na...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2510.25992] Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

The paper presents Supervised Reinforcement Learning (SRL), a framework that enhances reasoning in Large Language Models (LLMs) by reform...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2510.03306] Atlas-free Brain Network Transformer

The paper presents an atlas-free brain network transformer (BNT) that improves brain network analysis by utilizing individualized brain p...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2506.12108] A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method

This article presents a novel feature selection method for a lightweight intrusion detection system (IDS) aimed at early detection of Adv...

arXiv - AI · 4 min · about 1 month ago
