AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2305.08175] ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
AI Safety

arXiv - Machine Learning · 4 min ·
[2604.02610] Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport
NLP

arXiv - Machine Learning · 3 min ·
[2604.02574] Understanding the Effects of Safety Unalignment on Large Language Models
LLMs

arXiv - Machine Learning · 4 min ·

All Content

[2602.17174] Continual uncertainty learning
Machine Learning

The paper presents a novel framework for continual uncertainty learning in robust control of nonlinear dynamical systems, addressing chal...

arXiv - AI · 4 min ·
[2602.17124] 3D Scene Rendering with Multimodal Gaussian Splatting
Computer Vision

This paper presents a novel approach to 3D scene rendering using multimodal Gaussian splatting, integrating RF sensing for improved accur...

arXiv - AI · 4 min ·
[2602.17095] FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
LLMs

The paper presents FLoRG, a federated fine-tuning framework that utilizes low-rank Gram matrices and Procrustes alignment to enhance the ...

arXiv - AI · 4 min ·
[2602.17070] General sample size analysis for probabilities of causation: a delta method approach
Data Science

This paper presents a delta method approach for sample size analysis in estimating probabilities of causation (PoCs), addressing the need...

arXiv - AI · 3 min ·
[2602.17037] Wink: Recovering from Misbehaviors in Coding Agents
LLMs

The paper presents 'Wink', a system designed to recover coding agents from misbehaviors, enhancing their reliability in software developm...

arXiv - AI · 4 min ·
[2602.16844] Overseeing Agents Without Constant Oversight: Challenges and Opportunities
AI Agents

This article explores the challenges and opportunities in overseeing AI agents without constant human oversight, focusing on user studies...

arXiv - AI · 3 min ·
[2602.16826] HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Machine Learning

The paper presents HiVAE, a hierarchical variational architecture designed to enhance AI's theory of mind capabilities, enabling better i...

arXiv - AI · 3 min ·
[2602.16829] Learning under noisy supervision is governed by a feedback-truth gap
Machine Learning

This paper explores how learning under noisy supervision is influenced by a feedback-truth gap, demonstrating its effects across various ...

arXiv - AI · 3 min ·
[2602.16802] References Improve LLM Alignment in Non-Verifiable Domains
LLMs

This paper explores how reference-guided evaluators can enhance LLM alignment in non-verifiable domains, demonstrating significant improv...

arXiv - Machine Learning · 4 min ·
[2602.16800] Large-scale online deanonymization with LLMs
LLMs

This article discusses the use of large language models (LLMs) for deanonymizing online users, demonstrating high precision in identifyin...

arXiv - Machine Learning · 4 min ·
[2602.16747] LiveClin: A Live Clinical Benchmark without Leakage
LLMs

LiveClin introduces a novel clinical benchmark for evaluating medical LLMs, addressing issues of data contamination and knowledge obsoles...

arXiv - AI · 4 min ·
[2602.16741] Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis
LLMs

This study investigates whether adversarial code comments can mislead AI security reviewers during vulnerability detection in code, revea...

arXiv - Machine Learning · 4 min ·
[2602.16740] Quantifying LLM Attention-Head Stability: Implications for Circuit Universality
LLMs

This article examines the stability of attention heads in transformer models, revealing insights into their representational robustness a...

arXiv - AI · 4 min ·
[2602.16729] Intent Laundering: AI Safety Datasets Are Not What They Seem
AI Safety

The paper evaluates AI safety datasets, revealing they often misrepresent real-world attacks due to an overreliance on triggering cues, l...

arXiv - Machine Learning · 4 min ·
[2602.16723] Is Mamba Reliable for Medical Imaging?
Machine Learning

This paper evaluates the reliability of Mamba, a state-space model, for medical imaging under various attack scenarios, highlighting vuln...

arXiv - AI · 3 min ·
[2602.17594] AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
AI Startups

The paper introduces the AI Gamestore, a platform for evaluating machine general intelligence through human games, highlighting its poten...

arXiv - AI · 4 min ·
[2602.17566] A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN
Machine Learning

This article presents a hybrid federated learning model that combines SWIN Transformer and CNN for diagnosing lung diseases, particularly...

arXiv - AI · 4 min ·
[2602.17560] ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
LLMs

The paper presents ODESteer, a novel ODE-based framework for aligning large language models (LLMs) by addressing limitations in existing ...

arXiv - AI · 4 min ·
[2602.17508] Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems
Machine Learning

This article presents a benchmarking framework for optimizing AI models on ARM Cortex processors, focusing on energy efficiency and perfo...

arXiv - AI · 4 min ·
[2602.17418] A Privacy by Design Framework for Large Language Model-Based Applications for Children
LLMs

This article proposes a Privacy by Design framework for AI applications targeting children, addressing privacy risks and compliance with ...

arXiv - AI · 4 min ·