AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2512.21106] Semantic Refinement with LLMs for Graph Representations
LLMs

Abstract page for arXiv paper 2512.21106: Semantic Refinement with LLMs for Graph Representations

arXiv - Machine Learning · 4 min
[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
Machine Learning

Abstract page for arXiv paper 2511.22294: Structure is Supervision: Multiview Masked Autoencoders for Radiology

arXiv - Machine Learning · 4 min
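
The multiview, structure-as-supervision design is the paper's own contribution; as background on the masked-autoencoder pretext task it builds on, here is a minimal NumPy sketch of random patch masking, where an encoder sees only the visible tokens and a decoder reconstructs the rest. All names and shapes are illustrative assumptions.

```python
import numpy as np

def random_patch_mask(patches: np.ndarray, mask_ratio: float = 0.75, seed: int = 0):
    """Split patch tokens into visible and masked sets, MAE-style.

    patches: (num_patches, dim) array of patch embeddings. The encoder sees
    only the visible tokens; the decoder is trained to reconstruct the rest.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    num_visible = max(1, int(round(n * (1.0 - mask_ratio))))
    perm = rng.permutation(n)
    visible_idx = np.sort(perm[:num_visible])
    masked_idx = np.sort(perm[num_visible:])
    return patches[visible_idx], visible_idx, masked_idx

# Toy usage: 196 patches (a 14x14 grid) of dimension 64, mask 75%.
patches = np.random.default_rng(1).normal(size=(196, 64))
visible, vis_idx, mask_idx = random_patch_mask(patches)
assert len(vis_idx) + len(mask_idx) == 196
```
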
[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
LLMs

Abstract page for arXiv paper 2511.18123: Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-La...

arXiv - Machine Learning · 4 min
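
The title's claim, that bias occupies a subspace rather than a single coordinate, has a direct geometric reading. As a hedged sketch of generic subspace-level post-hoc debiasing (not the paper's algorithm), the code below estimates a k-dimensional bias subspace from several bias direction vectors via SVD and projects embeddings onto its orthogonal complement. All names and shapes are assumptions.

```python
import numpy as np

def bias_subspace(direction_vectors: np.ndarray, k: int) -> np.ndarray:
    """Estimate a k-dim bias subspace from stacked bias direction vectors.

    direction_vectors: (n, d) array, e.g. differences of paired embeddings.
    Returns a (k, d) orthonormal basis: the top-k right singular vectors
    span the dominant bias subspace rather than a single coordinate.
    """
    _, _, vt = np.linalg.svd(direction_vectors, full_matrices=False)
    return vt[:k]

def debias(embeddings: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project embeddings onto the orthogonal complement of the bias subspace."""
    # Subtract each embedding's component lying in span(basis).
    return embeddings - (embeddings @ basis.T) @ basis

# Toy usage: 5 bias directions in a 16-dim embedding space, keep top 2.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(5, 16))
basis = bias_subspace(dirs, k=2)
x = rng.normal(size=(3, 16))
x_debiased = debias(x, basis)
# Components along the bias subspace are now (numerically) zero.
assert np.allclose(x_debiased @ basis.T, 0.0, atol=1e-10)
```
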

All Content

[2602.20468] CGSTA: Cross-Scale Graph Contrast with Stability-Aware Alignment for Multivariate Time-Series Anomaly Detection
AI Safety

The CGSTA framework enhances multivariate time-series anomaly detection by utilizing dynamic layered graphs and stability-aware alignment...

arXiv - Machine Learning · 4 min
[2602.20457] Oracle-Robust Online Alignment for Large Language Models
LLMs

This paper explores the online alignment of large language models (LLMs) under misspecified preference feedback, proposing a robust optim...

arXiv - Machine Learning · 3 min
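
The paper's robust optimizer is its own contribution; as background, online alignment methods of this kind typically start from the Bradley-Terry preference model, sketched below with a plain SGD update over a stream of (chosen, rejected) pairs. The linear reward model and all names are illustrative assumptions.

```python
import numpy as np

def bt_loss_and_grad(w, x_chosen, x_rejected):
    """Bradley-Terry preference loss for a linear reward r(x) = w @ x.

    P(chosen beats rejected) = sigmoid(r_chosen - r_rejected);
    we minimize the negative log-likelihood over a batch of pairs.
    """
    margin = x_chosen @ w - x_rejected @ w            # (batch,)
    p = 1.0 / (1.0 + np.exp(-margin))                 # win probabilities
    loss = -np.log(p + 1e-12).mean()
    # d/dw of -log sigmoid(margin) = -(1 - p) * (x_chosen - x_rejected)
    grad = -((1.0 - p)[:, None] * (x_chosen - x_rejected)).mean(axis=0)
    return loss, grad

# Online loop over a stream of preference pairs (toy data).
rng = np.random.default_rng(0)
w_true = rng.normal(size=8)                           # latent "true" reward
w = np.zeros(8)
for step in range(500):
    xc, xr = rng.normal(size=(2, 4, 8))               # batch of 4 pairs
    # Simulate preference feedback: relabel so xc is preferred under w_true.
    flip = xc @ w_true < xr @ w_true
    xc[flip], xr[flip] = xr[flip].copy(), xc[flip].copy()
    _, g = bt_loss_and_grad(w, xc, xr)
    w -= 0.1 * g                                      # SGD step
```
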
[2602.20419] CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks
Machine Learning

The paper introduces CREDIT, a method for certified ownership verification of deep neural networks to combat model extraction attacks, en...

arXiv - Machine Learning · 3 min
[2602.20418] CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense
Machine Learning

The paper presents CITED, a novel framework for defending Graph Neural Networks (GNNs) against Model Extraction Attacks (MEAs) by providi...

arXiv - Machine Learning · 4 min
[2602.20273] The Truthfulness Spectrum Hypothesis
LLMs

The Truthfulness Spectrum Hypothesis explores how large language models (LLMs) represent truthfulness across various domains, revealing a...

arXiv - Machine Learning · 4 min
[2602.20194] FedAvg-Based CTMC Hazard Model for Federated Bridge Deterioration Assessment
Machine Learning

This article presents a federated framework using a CTMC hazard model for assessing bridge deterioration, allowing municipalities to coll...

arXiv - Machine Learning · 4 min
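
For context on the FedAvg half of the title (the CTMC hazard model is the paper's own), here is a minimal sketch of federated averaging: each client, here a municipality, fits parameters on local data, and a server averages them weighted by local sample counts. The flat parameter-vector representation and all names are assumptions.

```python
import numpy as np

def fedavg(client_params: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """One FedAvg round: sample-size-weighted average of client parameters."""
    weights = np.asarray(client_sizes, dtype=float)
    weights = weights / weights.sum()
    return np.sum([w * p for w, p in zip(weights, client_params)], axis=0)

# Toy round: three municipalities with locally fitted parameter vectors.
params = [np.array([0.10, 0.05]), np.array([0.12, 0.04]), np.array([0.08, 0.06])]
sizes = [1200, 300, 500]  # e.g. counts of local bridge inspection records
global_params = fedavg(params, sizes)  # broadcast back to clients next round
```
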
[2602.11184] KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
LLMs

The paper presents KBVQ-MoE, a novel framework for improving vector quantization in Mixture of Experts (MoE) large language models, addre...

arXiv - Machine Learning · 4 min
[2602.09050] SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy
AI Safety

The paper introduces SAS-Net, a novel framework for robust spatiotemporal registration in bidirectional photoacoustic microscopy, address...

arXiv - AI · 4 min
[2602.00044] When LLMs Imagine People: A Human-Centered Persona Brainstorm Audit for Bias and Fairness in Creative Applications
LLMs

This paper introduces the Persona Brainstorm Audit (PBA), a method for assessing bias in Large Language Models (LLMs) used in creative ap...

arXiv - AI · 4 min
[2601.03868] What Matters For Safety Alignment?
LLMs

This paper investigates safety alignment in large language models (LLMs) and large reasoning models (LRMs), identifying key factors that ...

arXiv - AI · 4 min
[2512.24787] HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment
Machine Learning

The paper presents HiGR, a novel framework for generative slate recommendation that enhances efficiency and user preference alignment thr...

arXiv - AI · 4 min
[2512.16602] Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
LLMs

The paper introduces Refusal Steering, a method for controlling Large Language Models' refusal behavior on sensitive topics without retra...

arXiv - AI · 4 min
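
Refusal Steering's exact procedure is in the paper; the sketch below shows only the generic activation-steering pattern such methods build on: add a scaled steering vector (e.g. a difference of mean activations between refusing and complying prompts) to one layer's hidden states at inference time. The PyTorch hook mechanism is standard; the model and layer names in the usage comment are assumptions.

```python
import torch

def make_steering_hook(vector: torch.Tensor, alpha: float):
    """Forward hook that adds alpha * vector to a transformer layer's output."""
    def hook(module, inputs, output):
        # Many transformer blocks return a tuple whose first element is the
        # hidden states; handle both tuple and tensor outputs.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector.to(hidden.dtype).to(hidden.device)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Illustrative usage (names are assumptions, not a specific library's API):
#   vector = mean_refusal_acts - mean_comply_acts  # difference-of-means direction
#   handle = model.layers[12].register_forward_hook(make_steering_hook(vector, -1.0))
#   ...generate; a negative alpha pushes activations away from refusal...
#   handle.remove()
```
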
[2510.22620] Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
LLMs

This article evaluates the security of large language models (LLMs) used in AI agents, introducing a framework for identifying vulnerabil...

arXiv - Machine Learning · 4 min
[2510.22500] Towards Scalable Oversight via Partitioned Human Supervision
Machine Learning

The paper proposes a scalable oversight framework for AI systems using partitioned human supervision, addressing challenges in obtaining ...

arXiv - Machine Learning · 4 min
[2510.08091] Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility
LLMs

This article explores how rationales generated by large language models (LLMs) influence human judgments of plausibility in commonsense r...

arXiv - AI · 3 min
[2510.06868] Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Recovery
Machine Learning

This paper presents a novel approach to image transmission using multi-hop deep joint source-channel coding (DeepJSCC) combined with deep...

arXiv - Machine Learning · 3 min
[2510.00037] On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Machine Learning

This paper evaluates the robustness of Vision-Language-Action (VLA) models against various multi-modal perturbations, proposing a new met...

arXiv - AI · 4 min
[2509.25774] PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
Machine Learning

The paper introduces Proportionate Credit Policy Optimization (PCPO), a novel framework aimed at improving the stability and quality of t...

arXiv - Machine Learning · 3 min
[2506.03922] HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
LLMs

HSSBench introduces a benchmark for evaluating Multimodal Large Language Models (MLLMs) in Humanities and Social Sciences, addressing gap...

arXiv - AI · 4 min
[2504.18310] How much does context affect the accuracy of AI health advice?
LLMs

This article examines how linguistic and contextual factors influence the accuracy of AI-generated health advice, revealing significant d...

arXiv - Machine Learning · 4 min