AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min ·
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

Abstract page for arXiv paper 2511.22294: Structure is Supervision: Multiview Masked Autoencoders for Radiology

arXiv - Machine Learning · 4 min ·
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

Abstract page for arXiv paper 2511.18123: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-La...

arXiv - Machine Learning · 4 min ·

All Content

[2602.20676] PRECTR-V2: Unified Relevance-CTR Framework with Cross-User Preference Mining, Exposure Bias Correction, and LLM-Distilled Encoder Optimization
LLMs

The paper presents PRECTR-V2, an advanced framework for improving search relevance and click-through rate (CTR) prediction by addressing ...

arXiv - AI · 4 min ·
[2602.20670] CAMEL: Confidence-Gated Reflection for Reward Modeling
LLMs

The paper introduces CAMEL, a confidence-gated reflection framework for reward modeling in AI, achieving state-of-the-art performance wit...

arXiv - AI · 3 min ·
[2602.20658] Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video
LLMs

This article explores the use of vision-language models (VLMs) for non-invasive ergonomic assessment of manual lifting tasks, estimating ...

arXiv - Machine Learning · 4 min ·
[2602.20634] Enhancing Hate Speech Detection on Social Media: A Comparative Analysis of Machine Learning Models and Text Transformation Approaches
Machine Learning

This article evaluates various machine learning models for hate speech detection on social media, comparing traditional and advanced tech...

arXiv - AI · 3 min ·
[2602.20595] OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
LLMs

The paper presents OptiLeak, a framework utilizing reinforcement learning to enhance prompt reconstruction efficiency in multi-tenant LLM...

arXiv - AI · 4 min ·
[2602.20580] Personal Information Parroting in Language Models
LLMs

This article examines the issue of personal information memorization in language models, highlighting the risks and proposing a detection...

arXiv - Machine Learning · 3 min ·
[2602.20541] Maximin Share Guarantees via Limited Cost-Sensitive Sharing
AI Safety

This paper explores fair allocation of indivisible goods through limited cost-sensitive sharing, demonstrating how controlled sharing can...

arXiv - AI · 4 min ·
[2602.20486] Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions
LLMs

This article explores a hybrid dialogue system that integrates Large Language Models (LLMs) within a rule-based framework to enhance lear...

arXiv - AI · 3 min ·
[2602.20408] Examining and Addressing Barriers to Diversity in LLM-Generated Ideas
LLMs

This article explores the limitations of diversity in ideas generated by large language models (LLMs) compared to human creativity, ident...

arXiv - AI · 4 min ·
[2602.20400] Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation
LLMs

This article discusses three significant challenges and two potential solutions for improving the safety of unsupervised elicitation in l...

arXiv - Machine Learning · 4 min ·
[2602.20332] No One Size Fits All: QueryBandits for Hallucination Mitigation
LLMs

The paper introduces QueryBandits, a model-agnostic framework designed to mitigate hallucinations in large language models (LLMs) by opti...

arXiv - Machine Learning · 4 min ·
[2602.20330] Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
LLMs

This article presents a framework for circuit tracing in vision-language models (VLMs), aiming to enhance understanding of their internal...

arXiv - Machine Learning · 3 min ·
[2602.20300] What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance
LLMs

This article examines how specific linguistic features of queries impact the performance of Large Language Models (LLMs), particularly in...

arXiv - AI · 3 min ·
[2602.20292] Quantifying the Expectation-Realisation Gap for Agentic AI Systems
AI Infrastructure

This article examines the expectation-realisation gap in agentic AI systems, revealing discrepancies between anticipated productivity gai...

arXiv - AI · 3 min ·
[2602.20214] Right to History: A Sovereignty Kernel for Verifiable AI Agent Execution
AI Safety

This paper proposes the 'Right to History,' a principle ensuring individuals have a verifiable record of AI agent actions on personal har...

arXiv - AI · 3 min ·
[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
LLMs

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min ·
[2602.20206] Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts
LLMs

This paper explores the concept of 'Epistemic Debt' in novice programming using generative AI, proposing metacognitive scripts to enhance...

arXiv - AI · 4 min ·
[2602.20207] Golden Layers and Where to Find Them: Improved Knowledge Editing for Large Language Models Via Layer Gradient Analysis
LLMs

This article discusses the concept of 'golden layers' in large language models (LLMs) and presents a novel method, Layer Gradient Analysi...

arXiv - AI · 4 min ·
[2602.20202] Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
LLMs

This paper evaluates the reliability of digital forensic evidence identified by large language models (LLMs), proposing a structured fram...

arXiv - AI · 4 min ·
[2602.20196] OpenPort Protocol: A Security Governance Specification for AI Agent Tool Access
AI Safety

The OpenPort Protocol introduces a governance-first approach for AI agents, ensuring secure access to application tools while addressing ...

arXiv - AI · 4 min ·