[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
Abstract page for arXiv paper 2603.12681: Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
Alignment, bias, regulation, and responsible AI
Abstract page for arXiv paper 2512.02711: CREST: Universal Safety Guardrails Through Cluster-Guided Cross-Lingual Transfer
Abstract page for arXiv paper 2510.03721: Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
This paper explores procedural fairness in machine learning, proposing a new metric for evaluation and methods to enhance fairness withou...
The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing bot...
The paper introduces Natural Language Declarative Prompting (NLD-P), a governance method for prompt design that addresses challenges pose...
The paper introduces TherapyProbe, a methodology for enhancing relational safety in mental health chatbots through adversarial simulation...
The paper presents Q-Tag, a novel watermarking framework for quantum circuit generative models (QCGMs), addressing the need for secure co...
This article introduces a novel LLM agent designed to assess and mitigate deanonymization risks in textual data using a method called SAL...
The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through ali...
AgentSentry introduces a novel framework to mitigate indirect prompt injection (IPI) in LLM agents, enhancing their security while mainta...
This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text ...
The paper presents IMMACULATE, a framework for auditing large language models (LLMs) using verifiable computation to detect economic devi...
The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in...
The paper presents CGSA, a novel framework for Source-Free Domain Adaptive Object Detection that integrates object-centric learning to en...
DPSQL+ is a new SQL library designed to enhance data privacy by enforcing differential privacy and a minimum frequency rule, ensuring sen...
The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new...
TorchLean is a framework that formalizes neural networks within the Lean 4 theorem prover, enabling precise semantics for execution and v...
This study explores how a personalized large language model (LLM) can correct climate action misperceptions among climate-concerned indiv...
EvolveGen introduces a novel framework for generating hardware model checking benchmarks using reinforcement learning, addressing the ben...
This article evaluates transfer learning models for IoT DDoS detection, focusing on explainability and resource constraints. It analyzes ...
This article explores the relationship between AI and humans through the lens of large language models (LLMs), focusing on the Sydney per...
The paper discusses the security risks posed by implicit prompt injection in large language model (LLM) agents, demonstrating how adversa...