AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

[2305.08175] ResidualPlanner+: a scalable matrix mechanism for marginals and beyond
AI Safety

arXiv - Machine Learning · 4 min ·
[2604.02610] Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport
NLP

arXiv - Machine Learning · 3 min ·
[2604.02574] Understanding the Effects of Safety Unalignment on Large Language Models
LLMs

arXiv - Machine Learning · 4 min ·

All Content

[2602.17174] Continual uncertainty learning
Machine Learning

The paper presents a novel framework for continual uncertainty learning in robust control of nonlinear dynamical systems, addressing chal...

arXiv - AI · 4 min ·
[2602.17124] 3D Scene Rendering with Multimodal Gaussian Splatting
Computer Vision

This paper presents a novel approach to 3D scene rendering using multimodal Gaussian splatting, integrating RF sensing for improved accur...

arXiv - AI · 4 min ·
[2602.17095] FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
LLMs

The paper presents FLoRG, a federated fine-tuning framework that utilizes low-rank Gram matrices and Procrustes alignment to enhance the ...

arXiv - AI · 4 min ·
[2602.17070] General sample size analysis for probabilities of causation: a delta method approach
Data Science

This paper presents a delta method approach for sample size analysis in estimating probabilities of causation (PoCs), addressing the need...

arXiv - AI · 3 min ·
[2602.17037] Wink: Recovering from Misbehaviors in Coding Agents
LLMs

The paper presents 'Wink', a system designed to recover coding agents from misbehaviors, enhancing their reliability in software developm...

arXiv - AI · 4 min ·
[2602.16844] Overseeing Agents Without Constant Oversight: Challenges and Opportunities
AI Agents

This article explores the challenges and opportunities in overseeing AI agents without constant human oversight, focusing on user studies...

arXiv - AI · 3 min ·
[2602.16826] HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Machine Learning

The paper presents HiVAE, a hierarchical variational architecture designed to enhance AI's theory of mind capabilities, enabling better i...

arXiv - AI · 3 min ·
[2602.16829] Learning under noisy supervision is governed by a feedback-truth gap
Machine Learning

This paper explores how learning under noisy supervision is influenced by a feedback-truth gap, demonstrating its effects across various ...

arXiv - AI · 3 min ·
[2602.16802] References Improve LLM Alignment in Non-Verifiable Domains
LLMs

This paper explores how reference-guided evaluators can enhance LLM alignment in non-verifiable domains, demonstrating significant improv...

arXiv - Machine Learning · 4 min ·
[2602.16800] Large-scale online deanonymization with LLMs
LLMs

This article discusses the use of large language models (LLMs) for deanonymizing online users, demonstrating high precision in identifyin...

arXiv - Machine Learning · 4 min ·
[2602.16747] LiveClin: A Live Clinical Benchmark without Leakage
LLMs

LiveClin introduces a novel clinical benchmark for evaluating medical LLMs, addressing issues of data contamination and knowledge obsoles...

arXiv - AI · 4 min ·
[2602.16741] Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis
LLMs

This study investigates whether adversarial code comments can mislead AI security reviewers during vulnerability detection in code, revea...

arXiv - Machine Learning · 4 min ·
[2602.16740] Quantifying LLM Attention-Head Stability: Implications for Circuit Universality
LLMs

This article examines the stability of attention heads in transformer models, revealing insights into their representational robustness a...

arXiv - AI · 4 min ·
[2602.16729] Intent Laundering: AI Safety Datasets Are Not What They Seem
AI Safety

The paper evaluates AI safety datasets, revealing they often misrepresent real-world attacks due to an overreliance on triggering cues, l...

arXiv - Machine Learning · 4 min ·
[2602.16723] Is Mamba Reliable for Medical Imaging?
Machine Learning

This paper evaluates the reliability of Mamba, a state-space model, for medical imaging under various attack scenarios, highlighting vuln...

arXiv - AI · 3 min ·
[2602.17594] AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
AI Startups

The paper introduces the AI Gamestore, a platform for evaluating machine general intelligence through human games, highlighting its poten...

arXiv - AI · 4 min ·
[2602.17566] A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN
Machine Learning

This article presents a hybrid federated learning model that combines SWIN Transformer and CNN for diagnosing lung diseases, particularly...

arXiv - AI · 4 min ·
[2602.17560] ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
LLMs

The paper presents ODESteer, a novel ODE-based framework for aligning large language models (LLMs) by addressing limitations in existing ...

arXiv - AI · 4 min ·
[2602.17508] Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems
Machine Learning

This article presents a benchmarking framework for optimizing AI models on ARM Cortex processors, focusing on energy efficiency and perfo...

arXiv - AI · 4 min ·
[2602.17418] A Privacy by Design Framework for Large Language Model-Based Applications for Children
LLMs

This article proposes a Privacy by Design framework for AI applications targeting children, addressing privacy risks and compliance with ...

arXiv - AI · 4 min ·