AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much value.

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.18171] Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models
LLMs

This paper presents a hybrid approach to detecting clickbait using large language models and informativeness measures, achieving a high F...

arXiv - AI · 3 min ·
[2602.18154] FENCE: A Financial and Multimodal Jailbreak Detection Dataset
LLMs

The paper presents FENCE, a bilingual multimodal dataset designed for detecting jailbreaks in financial applications, highlighting vulner...

arXiv - AI · 3 min ·
[2602.17894] Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget
Machine Learning

This paper explores optimal data collection strategies from biased and costly sources, focusing on maximizing effective sample size under...

arXiv - Machine Learning · 4 min ·
[2602.17837] TFL: Targeted Bit-Flip Attack on Large Language Model
LLMs

The paper presents TFL, a targeted bit-flip attack framework for large language models (LLMs) that allows precise manipulation of outputs...

arXiv - Machine Learning · 4 min ·
[2602.18094] OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
LLMs

The paper introduces OODBench, a benchmark for evaluating large vision-language models' performance on out-of-distribution (OOD) data, hi...

arXiv - AI · 4 min ·
[2602.18092] Perceived Political Bias in LLMs Reduces Persuasive Abilities
LLMs

This article explores how perceived political bias in large language models (LLMs) can diminish their effectiveness in persuasion, reveal...

arXiv - AI · 3 min ·
[2602.18045] Conformal Tradeoffs: Guarantees Beyond Coverage
NLP

This article presents a framework for operational certification in conformal predictors, focusing on trade-offs beyond mere coverage, and...

arXiv - AI · 4 min ·
[2602.18029] Towards More Standardized AI Evaluation: From Models to Agents
Machine Learning

This paper discusses the evolution of AI evaluation from static models to dynamic agents, emphasizing the need for standardized evaluatio...

arXiv - AI · 3 min ·
[2602.18019] DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE
Computer Vision

The paper introduces DeepSVU, a novel approach for Security-oriented Video Understanding that identifies threats and evaluates their caus...

arXiv - AI · 4 min ·
[2602.17770] CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
LLMs

The paper introduces CLUTCH, a novel model for generating hand motions from text, leveraging a new dataset and advanced techniques to imp...

arXiv - Machine Learning · 4 min ·
[2602.17730] Clever Materials: When Models Identify Good Materials for the Wrong Reasons
Machine Learning

This article examines the limitations of machine learning in materials discovery, highlighting that high performance on benchmarks may st...

arXiv - Machine Learning · 3 min ·
[2602.17973] PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks
AI Infrastructure

The paper presents PenTiDef, a novel framework designed to enhance privacy and robustness in decentralized federated intrusion detection ...

arXiv - AI · 4 min ·
[2602.17951] ROCKET: Residual-Oriented Multi-Layer Alignment for Spatially-Aware Vision-Language-Action Models
LLMs

The paper presents ROCKET, a novel framework for enhancing Vision-Language-Action models by employing residual-oriented multi-layer align...

arXiv - AI · 4 min ·
[2602.18403] Scientific Knowledge-Guided Machine Learning for Vessel Power Prediction: A Comparative Study
Machine Learning

This study presents a hybrid modeling framework that combines scientific knowledge with machine learning to improve vessel power predicti...

arXiv - Machine Learning · 4 min ·
[2602.18396] PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing
Machine Learning

The paper presents PRISM-FCP, a Byzantine-resilient framework for federated conformal prediction that enhances robustness against attacks...

arXiv - Machine Learning · 4 min ·
[2602.17881] Understanding Unreliability of Steering Vectors in Language Models: Geometric Predictors and the Limits of Linear Approximations
LLMs

This paper explores the unreliability of steering vectors in language models, examining how geometric predictors and linear approximation...

arXiv - Machine Learning · 3 min ·
[2602.17875] MultiVer: Zero-Shot Multi-Agent Vulnerability Detection
LLMs

The paper presents MultiVer, a zero-shot multi-agent system for vulnerability detection that outperforms fine-tuned models in recall, ach...

arXiv - AI · 3 min ·
[2602.18333] On the "Induction Bias" in Sequence Models
LLMs

This paper examines the "induction bias" in sequence models, focusing on the limitations of transformer-based models in state tracking co...

arXiv - Machine Learning · 4 min ·
[2602.17871] Understanding the Fine-Grained Knowledge Capabilities of Vision-Language Models
LLMs

This paper explores the fine-grained knowledge capabilities of vision-language models (VLMs), highlighting their performance on visual qu...

arXiv - Machine Learning · 3 min ·
[2602.18297] Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory
LLMs

This paper explores the monitorability of chain-of-thought (CoT) systems in LLMs using information theory, identifying errors that affect...

arXiv - Machine Learning · 4 min ·