AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min

AI Safety

As more Americans adopt AI tools, fewer say they can trust the results | TechCrunch

AI adoption is rising in the U.S., but trust remains low, with most Americans concerned about transparency, regulation, and the technolog...

TechCrunch - AI · 6 min

AI Safety

The state of AI safety in four fake graphs

Submitted by /u/tekz

Reddit - Artificial Intelligence · 1 min

All Content

AI Safety

Good on Anthropic for declining the Pentagon deal

The article discusses Anthropic's decision to decline a deal with the Pentagon, highlighting concerns over user security and ethical impl...

Reddit - Artificial Intelligence · 1 min

AI Safety

Who's really running AI? Inside the billion-dollar battle over regulation with Alex Bores | TechCrunch

Alex Bores discusses the RAISE Act and the influence of super PACs on AI regulation in the U.S. during his appearance on TechCrunch's Equ...

TechCrunch - AI · 4 min

Machine Learning

AI vs. the Pentagon: killer robots, mass surveillance, and red lines | The Verge

The article discusses the ongoing conflict between AI companies, particularly Anthropic, and the Pentagon over military contract terms th...

The Verge - AI · 6 min

Robotics

We don’t have to have unsupervised killer robots | The Verge

The article discusses the Pentagon's ultimatum to Anthropic regarding military access to AI technology, raising ethical concerns among te...

The Verge - AI · 11 min

AI Infrastructure

Swiss artificial intelligence that's good for the planet

Euria, an AI developed by Infomaniak in Switzerland, utilizes server heat for district heating, presenting an eco-friendly alternative to...

Reddit - Artificial Intelligence · 1 min

AI Safety

Wall Street Has AI Psychosis | WIRED

The article discusses the recent market turmoil triggered by a report predicting significant job losses due to AI, highlighting Wall Stre...

Wired - AI · 8 min

Robotics

Employees at Google and OpenAI support Anthropic's Pentagon stand in open letter | TechCrunch

Over 360 employees from Google and OpenAI have signed an open letter supporting Anthropic's stance against the Pentagon's demands for AI ...

TechCrunch - AI · 5 min

AI Infrastructure

Enterprise AI Transitions Are Creating $2.5B+ Risk Exposures. Here's the Forensic System That Maps Them

The article discusses the forensic intelligence system that maps risk exposures related to enterprise AI transitions, highlighting a $2.5...

Reddit - Artificial Intelligence · 1 min

AI Safety

Anthropic Rejects the Pentagon’s Demand That It Remove AI Safeguards

Anthropic has rejected the Pentagon's demand to remove AI safeguards for its model Claude, aiming to prevent its use in mass surveillance...

AI Tools & Products · 5 min

AI Infrastructure

Pentagon moves to build AI tools for China cyber operations

The Pentagon is advancing its efforts to develop AI tools aimed at enhancing cyber operations against China, focusing on improving nation...

AI Tools & Products · 1 min

Machine Learning

[2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing per...

arXiv - AI · 4 min

LLMs

[2512.06660] Towards Small Language Models for Security Query Generation in SOC Workflows

This paper explores the use of Small Language Models (SLMs) for translating natural language queries into Kusto Query Language (KQL) in S...

arXiv - AI · 4 min

Machine Learning

[2510.10932] DropVLA: An Action-Level Backdoor Attack on Vision-Language-Action Models

The paper presents DropVLA, an action-level backdoor attack on Vision-Language-Action models, demonstrating how minimal data poisoning ca...

arXiv - AI · 4 min

LLMs

[2509.02655] BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

The paper 'BioBlue' investigates the failure modes of LLMs in multi-objective scenarios, revealing that they can exhibit runaway optimiza...

arXiv - AI · 4 min

LLMs

[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

The paper presents Dyslexify, a novel defense mechanism against typographic attacks in CLIP models, enhancing robustness without finetuni...

arXiv - AI · 4 min

Machine Learning

[2507.17937] Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation

The paper presents a novel attack method, Adversarial PhoneTic Prompting (APT), that exploits phonetic memorization in generative AI syst...

arXiv - AI · 4 min

Machine Learning

[2511.05541] Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability

The paper introduces Temporal Sparse Autoencoders (T-SAEs), enhancing interpretability in language models by leveraging the sequential na...

arXiv - Machine Learning · 4 min

LLMs

[2510.25992] Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

The paper presents Supervised Reinforcement Learning (SRL), a framework that enhances reasoning in Large Language Models (LLMs) by reform...

arXiv - Machine Learning · 4 min

Machine Learning

[2510.03306] Atlas-free Brain Network Transformer

The paper presents an atlas-free brain network transformer (BNT) that improves brain network analysis by utilizing individualized brain p...

arXiv - Machine Learning · 4 min

Machine Learning

[2506.12108] A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method

This article presents a novel feature selection method for a lightweight intrusion detection system (IDS) aimed at early detection of Adv...

arXiv - AI · 4 min