AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
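The dynamic this post describes, a reward signal built from rater preference rather than ground truth, can be sketched in a few lines. This is a minimal illustration only (not any production RLHF stack), assuming the standard pairwise Bradley-Terry objective used to train RLHF reward models:

```python
import math

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry loss: negative log-probability that the reward model
    scores the rater-preferred response above the rejected one."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss only sees which response the raters preferred, never whether
# it was correct. A confident-but-wrong answer that raters liked is
# rewarded exactly like a correct one.
loss_when_raters_like_it = preference_loss(2.0, 0.5)
loss_when_raters_reject_it = preference_loss(0.5, 2.0)
```

Minimizing this loss pushes scores toward whatever raters favor, which is exactly why "seems helpful" and "is helpful" can come apart.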
AI Safety

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.19926] Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
LLMs

The paper presents LA-LoRA, a novel approach for fine-tuning large models in privacy-preserving federated learning, addressing key challe...

arXiv - AI · 4 min ·
[2602.18800] Operational Robustness of LLMs on Code Generation
LLMs

This article evaluates the operational robustness of large language models (LLMs) in code generation, proposing a new method to assess th...

arXiv - Machine Learning · 4 min ·
[2602.18782] MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
LLMs

The paper presents MANATEE, a novel defense mechanism for large language models (LLMs) against adversarial attacks, utilizing a lightweig...

arXiv - Machine Learning · 3 min ·
[2602.19893] Generalized Random Direction Newton Algorithms for Stochastic Optimization
AI Safety

This paper introduces generalized Hessian estimators for stochastic optimization using random direction stochastic approximation, demonst...

arXiv - Machine Learning · 3 min ·
[2602.18763] TAG: Thinking with Action Unit Grounding for Facial Expression Recognition
LLMs

The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that enhances reasoning by grounding predic...

arXiv - AI · 4 min ·
[2602.19789] Stop Preaching and Start Practising Data Frugality for Responsible Development of AI
Machine Learning

This paper advocates for the machine learning community to adopt data frugality in AI development, emphasizing its environmental benefits...

arXiv - Machine Learning · 4 min ·
[2602.18758] UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
Machine Learning

The paper presents UFO, a quantized two-party computation framework that optimizes private CNN inference by combining efficient protocols...

arXiv - AI · 4 min ·
[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
LLMs

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min ·
[2602.19770] The Confusion is Real: GRAPHIC - A Network Science Approach to Confusion Matrices in Deep Learning
Machine Learning

The paper presents GRAPHIC, a novel approach using network science to analyze confusion matrices in deep learning, enhancing understandin...

arXiv - AI · 4 min ·
[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
LLMs

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min ·
[2602.19641] Evaluating the Impact of Data Anonymization on Image Retrieval
NLP

This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...

arXiv - Machine Learning · 4 min ·
[2602.19582] Advantage-based Temporal Attack in Reinforcement Learning
Machine Learning

This article presents the Advantage-based Adversarial Transformer (AAT), a novel method for generating time-correlated adversarial exampl...

arXiv - Machine Learning · 4 min ·
[2602.18583] Luna-2: Scalable Single-Token Evaluation with Small Language Models
LLMs

Luna-2 introduces a scalable architecture for single-token evaluation using small language models, enhancing accuracy and reducing costs ...

arXiv - Machine Learning · 4 min ·
[2602.18535] Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS
Machine Learning

This paper presents a novel framework for voice classification of Parkinson's and ALS using fairness-aware partial-label domain adaptatio...

arXiv - AI · 4 min ·
[2602.19498] Softmax is not Enough (for Adaptive Conformal Classification)
NLP

The paper critiques the reliance on softmax outputs in adaptive conformal classification, proposing a new method that utilizes pre-softma...

arXiv - AI · 4 min ·
[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
LLMs

This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used i...

arXiv - AI · 4 min ·
[2602.19392] Spiking Graph Predictive Coding for Reliable OOD Generalization
Machine Learning

The paper introduces Spiking Graph Predictive Coding (SIGHT), a novel approach to enhance out-of-distribution (OOD) generalization in gra...

arXiv - Machine Learning · 3 min ·
[2602.19373] Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
Machine Learning

This paper presents a method for enhancing stability in deep reinforcement learning by utilizing isotropic Gaussian representations, addr...

arXiv - AI · 3 min ·
[2602.18492] Vibe Coding on Trial: Operating Characteristics of Unanimous LLM Juries
LLMs

The paper explores the effectiveness of unanimous committees of Large Language Models (LLMs) in evaluating SQL queries, revealing insight...

arXiv - AI · 4 min ·
[2602.18483] Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
LLMs

The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...

arXiv - AI · 4 min ·