AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
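The dynamic this post describes, a reward signal built from rater preference rather than ground truth, can be sketched in a few lines. This is a minimal illustration only (not any production RLHF stack), assuming the standard pairwise Bradley-Terry objective used to train RLHF reward models:

```python
import math

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry loss: negative log-probability that the reward model
    scores the rater-preferred response above the rejected one."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss only sees which response the raters preferred, never whether
# it was correct. A confident-but-wrong answer that raters liked is
# rewarded exactly like a correct one.
loss_when_raters_like_it = preference_loss(2.0, 0.5)
loss_when_raters_reject_it = preference_loss(0.5, 2.0)
```

Minimizing this loss pushes scores toward whatever raters favor, which is exactly why "seems helpful" and "is helpful" can come apart.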
AI Safety

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.19926] Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
LLMs

The paper presents LA-LoRA, a novel approach for fine-tuning large models in privacy-preserving federated learning, addressing key challe...

arXiv - AI · 4 min ·
[2602.18800] Operational Robustness of LLMs on Code Generation
LLMs

This article evaluates the operational robustness of large language models (LLMs) in code generation, proposing a new method to assess th...

arXiv - Machine Learning · 4 min ·
[2602.18782] MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
LLMs

The paper presents MANATEE, a novel defense mechanism for large language models (LLMs) against adversarial attacks, utilizing a lightweig...

arXiv - Machine Learning · 3 min ·
[2602.19893] Generalized Random Direction Newton Algorithms for Stochastic Optimization
AI Safety

This paper introduces generalized Hessian estimators for stochastic optimization using random direction stochastic approximation, demonst...

arXiv - Machine Learning · 3 min ·
[2602.18763] TAG: Thinking with Action Unit Grounding for Facial Expression Recognition
LLMs

The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that enhances reasoning by grounding predic...

arXiv - AI · 4 min ·
[2602.19789] Stop Preaching and Start Practising Data Frugality for Responsible Development of AI
Machine Learning

This paper advocates for the machine learning community to adopt data frugality in AI development, emphasizing its environmental benefits...

arXiv - Machine Learning · 4 min ·
[2602.18758] UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
Machine Learning

The paper presents UFO, a quantized two-party computation framework that optimizes private CNN inference by combining efficient protocols...

arXiv - AI · 4 min ·
[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
LLMs

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min ·
[2602.19770] The Confusion is Real: GRAPHIC - A Network Science Approach to Confusion Matrices in Deep Learning
Machine Learning

The paper presents GRAPHIC, a novel approach using network science to analyze confusion matrices in deep learning, enhancing understandin...

arXiv - AI · 4 min ·
[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
LLMs

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min ·
[2602.19641] Evaluating the Impact of Data Anonymization on Image Retrieval
NLP

This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...

arXiv - Machine Learning · 4 min ·
[2602.19582] Advantage-based Temporal Attack in Reinforcement Learning
Machine Learning

This article presents the Advantage-based Adversarial Transformer (AAT), a novel method for generating time-correlated adversarial exampl...

arXiv - Machine Learning · 4 min ·
[2602.18583] Luna-2: Scalable Single-Token Evaluation with Small Language Models
LLMs

Luna-2 introduces a scalable architecture for single-token evaluation using small language models, enhancing accuracy and reducing costs ...

arXiv - Machine Learning · 4 min ·
[2602.18535] Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS
Machine Learning

This paper presents a novel framework for voice classification of Parkinson's and ALS using fairness-aware partial-label domain adaptatio...

arXiv - AI · 4 min ·
[2602.19498] Softmax is not Enough (for Adaptive Conformal Classification)
NLP

The paper critiques the reliance on softmax outputs in adaptive conformal classification, proposing a new method that utilizes pre-softma...

arXiv - AI · 4 min ·
[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
LLMs

This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used i...

arXiv - AI · 4 min ·
[2602.19392] Spiking Graph Predictive Coding for Reliable OOD Generalization
Machine Learning

The paper introduces Spiking Graph Predictive Coding (SIGHT), a novel approach to enhance out-of-distribution (OOD) generalization in gra...

arXiv - Machine Learning · 3 min ·
[2602.19373] Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
Machine Learning

This paper presents a method for enhancing stability in deep reinforcement learning by utilizing isotropic Gaussian representations, addr...

arXiv - AI · 3 min ·
[2602.18492] Vibe Coding on Trial: Operating Characteristics of Unanimous LLM Juries
LLMs

The paper explores the effectiveness of unanimous committees of Large Language Models (LLMs) in evaluating SQL queries, revealing insight...

arXiv - AI · 4 min ·
[2602.18483] Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
LLMs

The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...

arXiv - AI · 4 min ·