AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min ·
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

Abstract page for arXiv paper 2511.22294: Structure is Supervision: Multiview Masked Autoencoders for Radiology

arXiv - Machine Learning · 4 min ·
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

Abstract page for arXiv paper 2511.18123: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-La...

arXiv - Machine Learning · 4 min ·

All Content

[2602.20676] PRECTR-V2: Unified Relevance-CTR Framework with Cross-User Preference Mining, Exposure Bias Correction, and LLM-Distilled Encoder Optimization
LLMs

The paper presents PRECTR-V2, an advanced framework for improving search relevance and click-through rate (CTR) prediction by addressing ...

arXiv - AI · 4 min ·
[2602.20670] CAMEL: Confidence-Gated Reflection for Reward Modeling
LLMs

The paper introduces CAMEL, a confidence-gated reflection framework for reward modeling in AI, achieving state-of-the-art performance wit...

arXiv - AI · 3 min ·
[2602.20658] Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
LLMs

This article explores the use of vision-language models (VLMs) for non-invasive ergonomic assessment of manual lifting tasks, estimating ...

arXiv - Machine Learning · 4 min ·
[2602.20634] Enhancing Hate Speech Detection on Social Media: A Comparative Analysis of Machine Learning Models and Text Transformation Approaches
Machine Learning

This article evaluates various machine learning models for hate speech detection on social media, comparing traditional and advanced tech...

arXiv - AI · 3 min ·
[2602.20595] OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
LLMs

The paper presents OptiLeak, a framework utilizing reinforcement learning to enhance prompt reconstruction efficiency in multi-tenant LLM...

arXiv - AI · 4 min ·
[2602.20580] Personal Information Parroting in Language Models
LLMs

This article examines the issue of personal information memorization in language models, highlighting the risks and proposing a detection...

arXiv - Machine Learning · 3 min ·
[2602.20541] Maximin Share Guarantees via Limited Cost-Sensitive Sharing
AI Safety

This paper explores fair allocation of indivisible goods through limited cost-sensitive sharing, demonstrating how controlled sharing can...

arXiv - AI · 4 min ·
[2602.20486] Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions
LLMs

This article explores a hybrid dialogue system that integrates Large Language Models (LLMs) within a rule-based framework to enhance lear...

arXiv - AI · 3 min ·
[2602.20408] Examining and Addressing Barriers to Diversity in LLM-Generated Ideas
LLMs

This article explores the limitations of diversity in ideas generated by large language models (LLMs) compared to human creativity, ident...

arXiv - AI · 4 min ·
[2602.20400] Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation
LLMs

This article discusses three significant challenges and two potential solutions for improving the safety of unsupervised elicitation in l...

arXiv - Machine Learning · 4 min ·
[2602.20332] No One Size Fits All: QueryBandits for Hallucination Mitigation
LLMs

The paper introduces QueryBandits, a model-agnostic framework designed to mitigate hallucinations in large language models (LLMs) by opti...

arXiv - Machine Learning · 4 min ·
[2602.20330] Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
LLMs

This article presents a framework for circuit tracing in vision-language models (VLMs), aiming to enhance understanding of their internal...

arXiv - Machine Learning · 3 min ·
[2602.20300] What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
LLMs

This article examines how specific linguistic features of queries impact the performance of Large Language Models (LLMs), particularly in...

arXiv - AI · 3 min ·
[2602.20292] Quantifying the Expectation-Realisation Gap for Agentic AI Systems
AI Infrastructure

This article examines the expectation-realisation gap in agentic AI systems, revealing discrepancies between anticipated productivity gai...

arXiv - AI · 3 min ·
[2602.20214] Right to History: A Sovereignty Kernel for Verifiable AI Agent Execution
AI Safety

This paper proposes the 'Right to History,' a principle ensuring individuals have a verifiable record of AI agent actions on personal har...

arXiv - AI · 3 min ·
[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
LLMs

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min ·
[2602.20206] Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts
LLMs

This paper explores the concept of 'Epistemic Debt' in novice programming using generative AI, proposing metacognitive scripts to enhance...

arXiv - AI · 4 min ·
[2602.20207] Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
LLMs

This article discusses the concept of 'golden layers' in large language models (LLMs) and presents a novel method, Layer Gradient Analysi...

arXiv - AI · 4 min ·
[2602.20202] Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
LLMs

This paper evaluates the reliability of digital forensic evidence identified by large language models (LLMs), proposing a structured fram...

arXiv - AI · 4 min ·
[2602.20196] OpenPort Protocol: A Security Governance Specification for AI Agent Tool Access
AI Safety

The OpenPort Protocol introduces a governance-first approach for AI agents, ensuring secure access to application tools while addressing ...

arXiv - AI · 4 min ·