AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

arXiv - Machine Learning · 4 min ·
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

arXiv - Machine Learning · 4 min ·
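
The title alone points at a well-known recipe: a masked autoencoder hides most image patches and trains the network to reconstruct them, so the masking itself supplies the supervision. How the paper couples this to multiview radiology structure is not stated here; the sketch below shows only the generic patch-masking step, with the patch size and mask ratio as assumed common defaults.

```python
import numpy as np

# Generic MAE-style patch masking, NOT the paper's multiview variant:
# hide most patches and keep their indices as reconstruction targets.
# Patch size (16) and mask ratio (0.75) are assumed common defaults.

def mask_patches(image, patch=16, mask_ratio=0.75, rng=None):
    """Split a square image into non-overlapping patches; return the visible
    patches and the indices of the masked patches to be reconstructed."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    perm = rng.permutation(len(patches))
    keep = perm[: int(len(patches) * (1 - mask_ratio))]
    return patches[keep], np.setdiff1d(perm, keep)

visible, masked_idx = mask_patches(np.zeros((224, 224, 1)))
print(visible.shape, masked_idx.shape)  # (49, 16, 16, 1) (147,)
```
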
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

arXiv - Machine Learning · 4 min ·
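
The title states the geometric claim plainly: bias occupies a multi-dimensional subspace of the embedding space, so removing a single coordinate or direction is not enough. A minimal numpy sketch of that idea, assuming the subspace is estimated by SVD over attribute-difference vectors; the rank k and every name here are illustrative, not the paper's method:

```python
import numpy as np

# Illustration of "bias is a subspace": estimate a rank-k bias subspace from
# attribute-difference vectors and project embeddings off all of it, not just
# one direction. k, the SVD estimator, and all names are assumptions.

def bias_subspace(diffs: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of the centered difference vectors,
    used as an orthonormal basis (k, d) for the bias subspace."""
    centered = diffs - diffs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def project_out(x: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Subtract each embedding's component inside the bias subspace."""
    return x - (x @ basis.T) @ basis

rng = np.random.default_rng(0)
diffs = rng.normal(size=(64, 128))  # stand-in attribute-difference vectors
emb = rng.normal(size=(10, 128))    # stand-in vision-language embeddings
B = bias_subspace(diffs, k=4)
debiased = project_out(emb, B)
print(np.abs(debiased @ B.T).max())  # ~1e-16: nothing left in the subspace
```
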

All Content

[2601.17064] Between Search and Platform: ChatGPT Under the DSA
LLMs

This article analyzes the classification of ChatGPT under the Digital Services Act (DSA), proposing it as a hybrid of search engine and p...

arXiv - AI · 3 min ·
[2512.17989] The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective
AI Safety

This article explores the gaps in understanding superintelligence misalignment, emphasizing the absence of the human subject and the impl...

arXiv - AI · 4 min ·
[2511.12033] EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
LLMs

The paper presents EARL, an Entropy-Aware Reinforcement Learning framework designed to enhance the reliability of RTL code generation by ...

arXiv - Machine Learning · 4 min ·
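
The truncated blurb does not give EARL's actual objective, but entropy-aware RL alignment generally means adding an entropy term to the policy-gradient loss so the generator does not collapse onto overconfident, unreliable token sequences. A hedged PyTorch sketch of that generic pattern; the coefficient beta, the shapes, and the reward interpretation are assumptions:

```python
import torch
import torch.nn.functional as F

# Generic entropy-regularized policy-gradient loss, NOT EARL's published
# objective: the entropy bonus keeps the policy from collapsing onto
# overconfident token choices during alignment. beta is an assumption.

def entropy_aware_pg_loss(logits, actions, advantages, beta=0.01):
    """logits: (B, T, V) policy outputs; actions: (B, T) sampled token ids;
    advantages: (B,) per-sequence reward signal (e.g. RTL tests passed)."""
    logp = F.log_softmax(logits, dim=-1)
    act_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    pg = -(advantages.unsqueeze(-1) * act_logp).mean()  # REINFORCE term
    entropy = -(logp.exp() * logp).sum(-1).mean()       # mean token entropy
    return pg - beta * entropy                          # reward exploration

B, T, V = 2, 5, 11
logits = torch.randn(B, T, V, requires_grad=True)
loss = entropy_aware_pg_loss(logits, torch.randint(V, (B, T)), torch.randn(B))
loss.backward()  # gradients flow; ready for an optimizer step
```
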
[2509.25184] Incentive-Aligned Multi-Source LLM Summaries
LLMs

The paper presents an innovative framework called Truthful Text Summarization (TTS) aimed at enhancing the factual accuracy of multi-sour...

arXiv - AI · 3 min ·
[2508.08337] Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants
Machine Learning

This paper argues for a shift in machine learning fairness research to focus on structural injustice through social determinants, rather ...

arXiv - Machine Learning · 4 min ·
[2507.08017] Mechanistic Indicators of Understanding in Large Language Models
LLMs

This paper explores mechanistic indicators of understanding in large language models (LLMs), proposing a tiered framework to assess their...

arXiv - AI · 4 min ·
[2507.14206] A Comprehensive Benchmark for Electrocardiogram Time-Series
Machine Learning

This article presents a comprehensive benchmark for electrocardiogram (ECG) time-series analysis, highlighting its unique characteristics...

arXiv - Machine Learning · 4 min ·
[2507.02376] On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack
Machine Learning

This paper introduces a novel attack and auditing framework for Vertical Federated Learning (VFL), addressing vulnerabilities in inferenc...

arXiv - AI · 4 min ·
[2506.09886] Probabilistic distances-based hallucination detection in LLMs with RAG
LLMs

This paper presents a novel method for detecting hallucinations in large language models (LLMs) using probabilistic distances in retrieva...

arXiv - AI · 3 min ·
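
The abstract is cut off, so the exact method is unknown; one plausible reading of "probabilistic distances with RAG" is to compare the model's token distributions with and without the retrieved passages in context, and to treat too little shift as a sign the answer came from parametric memory rather than the evidence. A sketch under that assumption, with the Jensen-Shannon divergence and the threshold as illustrative choices:

```python
import numpy as np

# One plausible reading, not the paper's confirmed method: score a RAG answer
# by the probabilistic distance between token distributions computed with and
# without the retrieved passages; a tiny shift suggests the model ignored the
# evidence. The JS divergence and the 0.05 threshold are illustrative.

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two token distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def looks_ungrounded(p_with_ctx, p_without_ctx, threshold=0.05):
    """Flag answers whose distribution barely moves when evidence is added."""
    return js_divergence(p_with_ctx, p_without_ctx) < threshold

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(50))
print(looks_ungrounded(p, p))                           # True: zero shift
print(looks_ungrounded(p, rng.dirichlet(np.ones(50))))  # typically False
```
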
[2506.07452] When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
LLMs

This paper explores the vulnerabilities of large language models (LLMs) to superficial style alignment, proposing a defense mechanism cal...

arXiv - Machine Learning · 4 min ·
[2506.06060] Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models
LLMs

This article discusses the privacy risks associated with federated fine-tuning of large language models, highlighting methods for extract...

arXiv - AI · 4 min ·
[2504.06533] Rethinking Flexible Graph Similarity Computation: One-step Alignment with Global Guidance
Machine Learning

The paper presents a novel approach to graph similarity computation through the Graph Edit Network (GEN), which integrates cost-aware est...

arXiv - Machine Learning · 4 min ·
[2406.17115] Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
LLMs

This article evaluates the quality of hallucination benchmarks for Large Vision-Language Models (LVLMs) and introduces a new framework fo...

arXiv - AI · 4 min ·
[2601.10402] Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
LLMs

The paper discusses advancements in AI towards ultra-long-horizon autonomy, introducing ML-Master 2.0, which utilizes Hierarchical Cognit...

arXiv - AI · 4 min ·
[2510.19139] A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist
LLMs

This paper evaluates the cognitive abilities of large language models (LLMs) in assessing clinical trial reporting according to CONSORT s...

arXiv - AI · 4 min ·
[2508.07667] 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
LLMs

The paper presents a multi-agent framework to enhance contextual privacy in large language models (LLMs), demonstrating a significant red...

arXiv - AI · 3 min ·
[2506.10947] Spurious Rewards: Rethinking Training Signals in RLVR
LLMs

The paper explores the impact of spurious rewards in reinforcement learning with verifiable rewards (RLVR), demonstrating how they can en...

arXiv - Machine Learning · 4 min ·
[2505.13529] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Machine Learning

The paper presents BARREL, a framework designed to enhance the factual reliability of Large Reasoning Models (LRMs) by addressing overcon...

arXiv - Machine Learning · 3 min ·
[2602.22197] Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes
Machine Learning

This paper demonstrates that off-the-shelf image-to-image models can effectively defeat various image protection schemes, highlighting a ...

arXiv - AI · 4 min ·
[2602.22149] Enhancing Framingham Cardiovascular Risk Score Transparency through Logic-Based XAI
AI Safety

This article presents a logic-based explainable AI model designed to enhance the transparency of the Framingham Cardiovascular Risk Score...

arXiv - AI · 4 min ·
