AI assistants are optimized to seem helpful. That is not the same thing as being helpful.
RLHF trains models on human feedback: raters score responses, and the model is optimized toward whatever they prefer. It turns out humans consistently rate confident, fluent, agree...
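The failure mode described above can be sketched as a toy pairwise preference loss (the Bradley-Terry formulation commonly used in RLHF reward modeling). Everything here is illustrative: the scalar "reward" scores and variable names are assumptions, not from any real training stack.

```python
import math

def pairwise_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood that the rater-preferred response wins
    under a Bradley-Terry model: -log(sigmoid(r_chosen - r_rejected))."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Toy scores a reward model might assign to two answers.
confident_but_wrong = 2.0
hedged_but_correct = 0.5

# If the rater picks the confident answer, the loss is already small:
# the model is barely corrected, so this behavior is reinforced.
print(round(pairwise_loss(confident_but_wrong, hedged_but_correct), 3))  # 0.201

# If the rater picks the correct-but-hedged answer, the loss is large:
# the model is pushed hard to re-rank. But if raters rarely do this,
# that pressure rarely arrives.
print(round(pairwise_loss(hedged_but_correct, confident_but_wrong), 3))  # 1.701
```

The point of the sketch: the reward model learns whatever raters reward, so if raters systematically prefer confident and agreeable answers, optimization faithfully amplifies that preference rather than accuracy.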
Alignment, bias, regulation, and responsible AI
Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...
Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations
The Nepal Police inaugurated an Artificial Intelligence and Advanced Analytics Cell (AI-AAC) to enhance crime investigation and national ...
Anthropic accuses three Chinese AI labs of conducting distillation attacks on its Claude chatbot, claiming they illicitly extracted capab...
Anthropic accuses Chinese developers of stealing AI secrets from its Claude chatbot, sparking criticism over its own data scraping practi...
CrowdStrike reassesses its position in the cybersecurity landscape following the launch of Anthropic's Claude Code Security, an AI tool t...
The paper introduces PyraTok, a language-aligned pyramidal tokenizer designed to enhance video understanding and generation by improving ...
The APEX-Agents paper introduces a benchmark for evaluating AI agents' ability to perform complex tasks across various applications, show...
This paper explores the limitations of self-improvement in large language models (LLMs), arguing that without symbolic model synthesis, t...
Interpreto is an open-source library designed for interpreting HuggingFace transformers, offering both attribution methods and concept-ba...
The paper presents MapReduce LoRA, a novel framework for optimizing generative models by addressing multi-preference alignment issues. It...
The paper presents FAST, a novel coreset selection framework that utilizes topology-aware frequency-domain distribution matching, signifi...
This article presents a market-making framework for coordinating multi-agent large language models (LLMs), enhancing trustworthiness and ...
This article reviews the state-of-the-art in agentic AI systems within electrical power engineering, providing a taxonomy and practical a...
This paper introduces a novel method for generating controllable collision scenarios for autonomous vehicles, enhancing safety evaluation...
This article presents a novel approach to safe and near-optimal control in dynamic environments, utilizing online dynamics learning to en...
The paper discusses the impact of evidence order on the performance of transformers in binary adjudication tasks, introducing metrics to ...
This article presents a computational framework for detecting early and implicit suicidal ideation on social media by analyzing user inte...
The paper introduces DITTO, a spoofing attack framework that exploits vulnerabilities in watermarked large language models (LLMs) via kno...
The paper presents a novel method for verifying Chain-of-Thought (CoT) reasoning in AI models using Circuit-based Reasoning Verification ...
This paper explores the impact of data sharing on A/B experiments in recommendation systems, focusing on how interference affects algorit...
The paper introduces SocialHarmBench, a dataset designed to evaluate the vulnerabilities of large language models (LLMs) to socially harm...