AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min
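
The mechanism this post points at lives in the reward-modeling step of RLHF: a model is trained to score whichever of two responses raters preferred, so any systematic rater bias becomes the optimization target. A minimal sketch of the standard pairwise (Bradley-Terry) preference loss, with made-up scores, shows how that happens; the function name and numbers are illustrative, not from the post.

```python
import torch
import torch.nn.functional as F

# Sketch of the pairwise loss used to train RLHF reward models: the model
# learns to assign higher scores to responses human raters preferred. If
# raters systematically prefer confident, fluent, agreeable answers, the
# reward model inherits that preference directly.
def preference_loss(score_chosen: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    # Minimize -log P(chosen > rejected), where
    # P(chosen > rejected) = sigmoid(score_chosen - score_rejected).
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy example: raters preferred the fluent answer over the hedged one.
loss = preference_loss(torch.tensor([1.8]), torch.tensor([0.4]))
print(f"{loss.item():.3f}")
```

Nothing in this loss distinguishes "rated helpful" from "actually helpful": the policy later optimized against the reward model is pushed toward whatever raters reward.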
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min
LLMs

[2512.21106] Semantic Refinement with LLMs for Graph Representations

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min

All Content

AI Startups

Police HQ launches AI-AAC cell

The Nepal Police inaugurated an Artificial Intelligence and Advanced Analytics Cell (AI-AAC) to enhance crime investigation and national ...

AI News - General · 4 min
LLMs

Anthropic accuses three Chinese AI labs of abusing Claude to improve their own models

Anthropic accuses three Chinese AI labs of conducting distillation attacks on its Claude chatbot, claiming they illicitly extracted capab...

AI Tools & Products · 2 min
NLP

Anthropic Slams China for AI Theft, But Critics Say the Outrage Is Hypocritical

Anthropic accuses Chinese developers of stealing AI secrets from its Claude chatbot, sparking criticism over its own data scraping practi...

AI Tools & Products · 7 min
LLMs

CrowdStrike Reassesses Role As Claude Code Security Shifts AI Risk

CrowdStrike reassesses its position in the cybersecurity landscape following the launch of Anthropic's Claude Code Security, an AI tool t...

AI Tools & Products · 6 min
Generative AI

[2601.16210] PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation

The paper introduces PyraTok, a language-aligned pyramidal tokenizer designed to enhance video understanding and generation by improving ...

arXiv - AI · 3 min
LLMs

[2601.14242] APEX-Agents

The APEX-Agents paper introduces a benchmark for evaluating AI agents' ability to perform complex tasks across various applications, show...

arXiv - Machine Learning · 3 min
LLMs

[2601.05280] On the Limits of Self-Improving in Large Language Models: The Singularity Is Not Near Without Symbolic Model Synthesis

This paper explores the limitations of self-improvement in large language models (LLMs), arguing that without symbolic model synthesis, t...

arXiv - Machine Learning · 4 min
LLMs

[2512.09730] Interpreto: An Explainability Library for Transformers

Interpreto is an open-source library designed for interpreting HuggingFace transformers, offering both attribution methods and concept-ba...

arXiv - Machine Learning · 3 min
Machine Learning

[2511.20629] MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

The paper presents MapReduce LoRA, a novel framework for optimizing generative models by addressing multi-preference alignment issues. It...

arXiv - Machine Learning · 4 min
Machine Learning

[2511.19476] FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection

The paper presents FAST, a novel coreset selection framework that utilizes topology-aware frequency-domain distribution matching, signifi...

arXiv - Machine Learning · 4 min
LLMs

[2511.17621] From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems

This article presents a market-making framework for coordinating multi-agent large language models (LLMs), enhancing trustworthiness and ...

arXiv - AI · 4 min
Machine Learning

[2511.14478] Agentic AI Systems in Electrical Power Systems Engineering: Current State-of-the-Art and Challenges

This article reviews the state-of-the-art in agentic AI systems within electrical power engineering, providing a taxonomy and practical a...

arXiv - AI · 4 min
Robotics

[2510.12206] Controllable Collision Scenario Generation via Collision Pattern Prediction

This paper introduces a novel method for generating controllable collision scenarios for autonomous vehicles, enhancing safety evaluation...

arXiv - Machine Learning · 4 min
AI Infrastructure

[2509.16650] Safe and Near-Optimal Control with Online Dynamics Learning

This article presents a novel approach to safe and near-optimal control in dynamic environments, utilizing online dynamics learning to en...

arXiv - Machine Learning · 4 min
Machine Learning

[2509.11208] Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication

The paper discusses the impact of evidence order on the performance of transformers in binary adjudication tasks, introducing metrics to ...

arXiv - Machine Learning · 4 min
Machine Learning

[2510.14889] Detecting Early and Implicit Suicidal Ideation via Longitudinal and Information Environment Signals on Social Media

This article presents a computational framework for detecting early and implicit suicidal ideation on social media by analyzing user inte...

arXiv - AI · 4 min
LLMs

[2510.10987] DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

The paper introduces DITTO, a spoofing attack framework that exploits vulnerabilities in watermarked large language models (LLMs) via kno...

arXiv - AI · 4 min
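
For context on the attack vector named in the title: knowledge distillation trains a student model to match a teacher's output distribution, and because an LLM watermark lives in the teacher's sampling statistics, a faithful student can emit text carrying the same signal. A minimal, hypothetical distillation-loss sketch (illustrative only, not the DITTO method) shows the core step:

```python
import torch
import torch.nn.functional as F

# Sketch of logit distillation: a student is trained to match the
# (watermarked) teacher's next-token distribution via KL divergence.
# Names and shapes are illustrative; this is not the paper's code.
def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # Temperature-scaled KL with the usual T^2 gradient correction.
    return F.kl_div(student_logprobs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy shapes: 4 token positions over a 32-token vocabulary.
loss = distillation_loss(torch.randn(4, 32), torch.randn(4, 32))
```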
Machine Learning

[2510.09312] Verifying Chain-of-Thought Reasoning via Its Computational Graph

The paper presents a novel method for verifying Chain-of-Thought (CoT) reasoning in AI models using Circuit-based Reasoning Verification ...

arXiv - Machine Learning · 4 min
AI Safety

[2507.11891] Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

This paper explores the impact of data sharing on A/B experiments in recommendation systems, focusing on how interference affects algorit...

arXiv - Machine Learning · 4 min
LLMs

[2510.04891] SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

The paper introduces SocialHarmBench, a dataset designed to evaluate the vulnerabilities of large language models (LLMs) to socially harm...

arXiv - Machine Learning · 4 min
