AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

arXiv - Machine Learning · 4 min ·
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

arXiv - Machine Learning · 4 min ·
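
The title alone points at a well-known recipe: a masked autoencoder hides most image patches and trains the network to reconstruct them, so the masking itself supplies the supervision. How the paper couples this to multiview radiology structure is not stated here; the sketch below shows only the generic patch-masking step, with the patch size and mask ratio as assumed common defaults.

```python
import numpy as np

# Generic MAE-style patch masking, NOT the paper's multiview variant:
# hide most patches and keep their indices as reconstruction targets.
# Patch size (16) and mask ratio (0.75) are assumed common defaults.

def mask_patches(image, patch=16, mask_ratio=0.75, rng=None):
    """Split a square image into non-overlapping patches; return the visible
    patches and the indices of the masked patches to be reconstructed."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)
    perm = rng.permutation(len(patches))
    keep = perm[: int(len(patches) * (1 - mask_ratio))]
    return patches[keep], np.setdiff1d(perm, keep)

visible, masked_idx = mask_patches(np.zeros((224, 224, 1)))
print(visible.shape, masked_idx.shape)  # (49, 16, 16, 1) (147,)
```
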
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

arXiv - Machine Learning · 4 min ·
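
The title states the geometric claim plainly: bias occupies a multi-dimensional subspace of the embedding space, so removing a single coordinate or direction is not enough. A minimal numpy sketch of that idea, assuming the subspace is estimated by SVD over attribute-difference vectors; the rank k and every name here are illustrative, not the paper's method:

```python
import numpy as np

# Illustration of "bias is a subspace": estimate a rank-k bias subspace from
# attribute-difference vectors and project embeddings off all of it, not just
# one direction. k, the SVD estimator, and all names are assumptions.

def bias_subspace(diffs: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of the centered difference vectors,
    used as an orthonormal basis (k, d) for the bias subspace."""
    centered = diffs - diffs.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def project_out(x: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Subtract each embedding's component inside the bias subspace."""
    return x - (x @ basis.T) @ basis

rng = np.random.default_rng(0)
diffs = rng.normal(size=(64, 128))  # stand-in attribute-difference vectors
emb = rng.normal(size=(10, 128))    # stand-in vision-language embeddings
B = bias_subspace(diffs, k=4)
debiased = project_out(emb, B)
print(np.abs(debiased @ B.T).max())  # ~1e-16: nothing left in the subspace
```
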

All Content

[2601.17064] Between Search and Platform: ChatGPT Under the DSA
LLMs

This article analyzes the classification of ChatGPT under the Digital Services Act (DSA), proposing it as a hybrid of search engine and p...

arXiv - AI · 3 min ·
[2512.17989] The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective
AI Safety

This article explores the gaps in understanding superintelligence misalignment, emphasizing the absence of the human subject and the impl...

arXiv - AI · 4 min ·
[2511.12033] EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
LLMs

The paper presents EARL, an Entropy-Aware Reinforcement Learning framework designed to enhance the reliability of RTL code generation by ...

arXiv - Machine Learning · 4 min ·
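
The truncated blurb does not give EARL's actual objective, but entropy-aware RL alignment generally means adding an entropy term to the policy-gradient loss so the generator does not collapse onto overconfident, unreliable token sequences. A hedged PyTorch sketch of that generic pattern; the coefficient beta, the shapes, and the reward interpretation are assumptions:

```python
import torch
import torch.nn.functional as F

# Generic entropy-regularized policy-gradient loss, NOT EARL's published
# objective: the entropy bonus keeps the policy from collapsing onto
# overconfident token choices during alignment. beta is an assumption.

def entropy_aware_pg_loss(logits, actions, advantages, beta=0.01):
    """logits: (B, T, V) policy outputs; actions: (B, T) sampled token ids;
    advantages: (B,) per-sequence reward signal (e.g. RTL tests passed)."""
    logp = F.log_softmax(logits, dim=-1)
    act_logp = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    pg = -(advantages.unsqueeze(-1) * act_logp).mean()  # REINFORCE term
    entropy = -(logp.exp() * logp).sum(-1).mean()       # mean token entropy
    return pg - beta * entropy                          # reward exploration

B, T, V = 2, 5, 11
logits = torch.randn(B, T, V, requires_grad=True)
loss = entropy_aware_pg_loss(logits, torch.randint(V, (B, T)), torch.randn(B))
loss.backward()  # gradients flow; ready for an optimizer step
```
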
[2509.25184] Incentive-Aligned Multi-Source LLM Summaries
LLMs

The paper presents an innovative framework called Truthful Text Summarization (TTS) aimed at enhancing the factual accuracy of multi-sour...

arXiv - AI · 3 min ·
[2508.08337] Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants
Machine Learning

This paper argues for a shift in machine learning fairness research to focus on structural injustice through social determinants, rather ...

arXiv - Machine Learning · 4 min ·
[2507.08017] Mechanistic Indicators of Understanding in Large Language Models
LLMs

This paper explores mechanistic indicators of understanding in large language models (LLMs), proposing a tiered framework to assess their...

arXiv - AI · 4 min ·
[2507.14206] A Comprehensive Benchmark for Electrocardiogram Time-Series
Machine Learning

This article presents a comprehensive benchmark for electrocardiogram (ECG) time-series analysis, highlighting its unique characteristics...

arXiv - Machine Learning · 4 min ·
[2507.02376] On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack
Machine Learning

This paper introduces a novel attack and auditing framework for Vertical Federated Learning (VFL), addressing vulnerabilities in inferenc...

arXiv - AI · 4 min ·
[2506.09886] Probabilistic distances-based hallucination detection in LLMs with RAG
LLMs

This paper presents a novel method for detecting hallucinations in large language models (LLMs) using probabilistic distances in retrieva...

arXiv - AI · 3 min ·
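
The abstract is cut off, so the exact method is unknown; one plausible reading of "probabilistic distances with RAG" is to compare the model's token distributions with and without the retrieved passages in context, and to treat too little shift as a sign the answer came from parametric memory rather than the evidence. A sketch under that assumption, with the Jensen-Shannon divergence and the threshold as illustrative choices:

```python
import numpy as np

# One plausible reading, not the paper's confirmed method: score a RAG answer
# by the probabilistic distance between token distributions computed with and
# without the retrieved passages; a tiny shift suggests the model ignored the
# evidence. The JS divergence and the 0.05 threshold are illustrative.

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two token distributions."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def looks_ungrounded(p_with_ctx, p_without_ctx, threshold=0.05):
    """Flag answers whose distribution barely moves when evidence is added."""
    return js_divergence(p_with_ctx, p_without_ctx) < threshold

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(50))
print(looks_ungrounded(p, p))                           # True: zero shift
print(looks_ungrounded(p, rng.dirichlet(np.ones(50))))  # typically False
```
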
[2506.07452] When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
LLMs

This paper explores the vulnerabilities of large language models (LLMs) to superficial style alignment, proposing a defense mechanism cal...

arXiv - Machine Learning · 4 min ·
[2506.06060] Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of Large Language Models
LLMs

This article discusses the privacy risks associated with federated fine-tuning of large language models, highlighting methods for extract...

arXiv - AI · 4 min ·
[2504.06533] Rethinking Flexible Graph Similarity Computation: One-step Alignment with Global Guidance
Machine Learning

The paper presents a novel approach to graph similarity computation through the Graph Edit Network (GEN), which integrates cost-aware est...

arXiv - Machine Learning · 4 min ·
[2406.17115] Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
LLMs

This article evaluates the quality of hallucination benchmarks for Large Vision-Language Models (LVLMs) and introduces a new framework fo...

arXiv - AI · 4 min ·
[2601.10402] Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
LLMs

The paper discusses advancements in AI towards ultra-long-horizon autonomy, introducing ML-Master 2.0, which utilizes Hierarchical Cognit...

arXiv - AI · 4 min ·
[2510.19139] A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist
LLMs

This paper evaluates the cognitive abilities of large language models (LLMs) in assessing clinical trial reporting according to CONSORT s...

arXiv - AI · 4 min ·
[2508.07667] 1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
LLMs

The paper presents a multi-agent framework to enhance contextual privacy in large language models (LLMs), demonstrating a significant red...

arXiv - AI · 3 min ·
[2506.10947] Spurious Rewards: Rethinking Training Signals in RLVR
LLMs

The paper explores the impact of spurious rewards in reinforcement learning with verifiable rewards (RLVR), demonstrating how they can en...

arXiv - Machine Learning · 4 min ·
[2505.13529] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Machine Learning

The paper presents BARREL, a framework designed to enhance the factual reliability of Large Reasoning Models (LRMs) by addressing overcon...

arXiv - Machine Learning · 3 min ·
[2602.22197] Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes
Machine Learning

This paper demonstrates that off-the-shelf image-to-image models can effectively defeat various image protection schemes, highlighting a ...

arXiv - AI · 4 min ·
[2602.22149] Enhancing Framingham Cardiovascular Risk Score Transparency through Logic-Based XAI
AI Safety

This article presents a logic-based explainable AI model designed to enhance the transparency of the Framingham Cardiovascular Risk Score...

arXiv - AI · 4 min ·
