AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails


Reddit - Artificial Intelligence · 1 min ·
[2511.16417] Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
AI Safety

Abstract page for arXiv paper 2511.16417: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling...

arXiv - AI · 4 min ·
[2510.08847] What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
LLMs

Abstract page for arXiv paper 2510.08847: What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

arXiv - AI · 4 min ·

All Content

[2410.12439] Beyond Attribution: Unified Concept-Level Explanations
Machine Learning

The paper presents UnCLE, a framework that enhances model-agnostic explanation techniques by integrating concept-based approaches, offeri...

arXiv - Machine Learning · 3 min ·
[2602.22935] A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment
Machine Learning

This paper presents a robust framework for Bangla Automatic Speech Recognition (ASR) and Speaker Diarization, addressing challenges in pr...

arXiv - AI · 3 min ·
[2410.10922] Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure
Machine Learning

This paper introduces a novel method for label unlearning in Vertical Federated Learning (VFL), addressing privacy concerns while maintai...

arXiv - Machine Learning · 4 min ·
[2404.01877] Procedural Fairness in Machine Learning
Machine Learning

This paper explores procedural fairness in machine learning, proposing a new metric for evaluation and methods to enhance fairness withou...

arXiv - Machine Learning · 4 min ·
[2602.23192] FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification
Machine Learning

The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing bot...

arXiv - Machine Learning · 3 min ·
[2602.22790] Natural Language Declarative Prompting (NLD-P): A Modular Governance Method for Prompt Design Under Model Drift
LLMs

The paper introduces Natural Language Declarative Prompting (NLD-P), a governance method for prompt design that addresses challenges pose...

arXiv - AI · 4 min ·
[2602.22775] TherapyProbe: Generating Design Knowledge for Relational Safety in Mental Health Chatbots Through Adversarial Simulation
AI Startups

The paper introduces TherapyProbe, a methodology for enhancing relational safety in mental health chatbots through adversarial simulation...

arXiv - AI · 3 min ·
[2602.23085] Q-Tag: Watermarking Quantum Circuit Generative Models
Machine Learning

The paper presents Q-Tag, a novel watermarking framework for quantum circuit generative models (QCGMs), addressing the need for secure co...

arXiv - Machine Learning · 4 min ·
[2602.23079] Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
LLMs

This article introduces a novel LLM agent designed to assess and mitigate deanonymization risks in textual data using a method called SAL...

arXiv - Machine Learning · 3 min ·
[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Machine Learning

The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through ali...

arXiv - AI · 3 min ·
[2602.22724] AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
LLMs

AgentSentry introduces a novel framework to mitigate indirect prompt injection (IPI) in LLM agents, enhancing their security while mainta...

arXiv - AI · 4 min ·
[2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment
AI Safety

This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text ...

arXiv - AI · 3 min ·
[2602.22700] IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation
LLMs

The paper presents IMMACULATE, a framework for auditing large language models (LLMs) using verifiable computation to detect economic devi...

arXiv - AI · 3 min ·
[2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
LLMs

The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in...

arXiv - Machine Learning · 4 min ·
[2602.22621] CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection
Computer Vision

The paper presents CGSA, a novel framework for Source-Free Domain Adaptive Object Detection that integrates object-centric learning to en...

arXiv - AI · 3 min ·
[2602.22699] DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule
Machine Learning

DPSQL+ is a new SQL library designed to enhance data privacy by enforcing differential privacy and a minimum frequency rule, ensuring sen...

arXiv - Machine Learning · 4 min ·
[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
Machine Learning

The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new...

arXiv - AI · 4 min ·
[2602.22631] TorchLean: Formalizing Neural Networks in Lean
Machine Learning

TorchLean is a framework that formalizes neural networks within the Lean 4 theorem prover, enabling precise semantics for execution and v...

arXiv - Machine Learning · 4 min ·
[2602.22564] Addressing Climate Action Misperceptions with Generative AI
LLMs

This study explores how a personalized large language model (LLM) can correct climate action misperceptions among climate-concerned indiv...

arXiv - AI · 3 min ·
[2602.22609] EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
Machine Learning

EvolveGen introduces a novel framework for generating hardware model checking benchmarks using reinforcement learning, addressing the ben...

arXiv - Machine Learning · 4 min ·

