AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
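
The mechanics behind this critique fit in a few lines. Below is a minimal, hypothetical sketch (all names and data are made up) of how RLHF-style reward models are typically fit to pairwise human preferences with a Bradley-Terry loss; whatever raters systematically prefer, including confident and agreeable phrasing, becomes the quantity being maximized.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward model: maps a response embedding to a scalar score.
reward_model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Toy stand-ins for embeddings of rater-preferred vs. rejected responses.
preferred = torch.randn(32, 16)
rejected = torch.randn(32, 16)

# Bradley-Terry pairwise loss: push the preferred response's score above
# the rejected one's. If raters reward confident, fluent agreement, that
# is exactly the signal the model learns to optimize.
loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```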
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2411.01685] Reducing Biases in Record Matching Through Scores Calibration

This paper explores methods to reduce biases in record matching through score calibration, proposing two model-agnostic post-processing t...

arXiv - Machine Learning · 4 min ·
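
As background for this entry, here is a generic illustration of model-agnostic post-processing calibration, not the paper's two specific techniques (which the summary truncates): fit a separate monotone calibrator per group so that equal calibrated scores imply comparable match probabilities across groups.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Generic per-group score calibration sketch; all names are assumed.
# labels: 1.0 for a true record match, 0.0 otherwise.
def calibrate_by_group(scores, labels, groups):
    calibrators = {}
    for g in np.unique(groups):
        mask = groups == g
        cal = IsotonicRegression(out_of_bounds="clip")
        cal.fit(scores[mask], labels[mask])  # monotone map: raw score -> P(match)
        calibrators[g] = cal
    return calibrators

rng = np.random.default_rng(0)
scores = rng.uniform(size=200)
labels = (rng.uniform(size=200) < scores).astype(float)
groups = rng.choice(["A", "B"], size=200)

cals = calibrate_by_group(scores, labels, groups)
print(cals["A"].predict([0.3, 0.7]))  # calibrated match probabilities for group A
```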
AI Safety

[2602.20076] Robust Taylor-Lagrange Control for Safety-Critical Systems

The paper presents a robust Taylor-Lagrange Control (rTLC) method for safety-critical systems, addressing the feasibility preservation pr...

arXiv - AI · 3 min ·
LLMs

[2602.20064] The LLMbda Calculus: AI Agents, Conversations, and Information Flow

The LLMbda Calculus introduces a formal framework for understanding AI agents' conversations, addressing vulnerabilities like prompt inje...

arXiv - AI · 4 min ·
LLMs

[2602.20156] Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

The paper introduces SkillInject, a benchmark for evaluating the vulnerability of LLM agents to skill file attacks, revealing high suscep...

arXiv - Machine Learning · 4 min ·
NLP

[2602.20151] Conformal Risk Control for Non-Monotonic Losses

This article presents a novel approach to conformal risk control for non-monotonic losses, extending traditional methods to multidimensio...

arXiv - Machine Learning · 3 min ·
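
For readers new to the area, the monotone baseline that this paper reportedly generalizes is short enough to sketch: standard conformal risk control picks the smallest threshold lambda whose adjusted empirical risk on n calibration points stays below alpha. Everything below (names, the toy loss) is illustrative only.

```python
import numpy as np

def crc_lambda(losses_by_lambda, lambdas, alpha, B=1.0):
    # losses_by_lambda[j]: n calibration losses at lambdas[j], each in [0, B],
    # assumed non-increasing in lambda (the monotone setting).
    n = len(losses_by_lambda[0])
    for lam, losses in zip(lambdas, losses_by_lambda):
        adjusted = (n / (n + 1)) * np.mean(losses) + B / (n + 1)
        if adjusted <= alpha:
            return lam  # smallest lambda meeting the risk bound
    return lambdas[-1]  # most conservative fallback

# Toy example: miss-rate-style loss that shrinks as lambda grows.
lambdas = np.linspace(0.0, 1.0, 101)
calib = np.random.default_rng(1).uniform(size=500)
losses = [np.clip(calib - lam, 0.0, 1.0) for lam in lambdas]
print(crc_lambda(losses, lambdas, alpha=0.1))
```

The non-monotonic case is harder precisely because this smallest-feasible-lambda scan no longer yields a valid guarantee once risk can rise again as lambda grows.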
Robotics

[2602.19983] Contextual Safety Reasoning and Grounding for Open-World Robots

The paper presents CORE, a novel safety framework for open-world robots that enables contextual reasoning and enforcement of safety rules...

arXiv - AI · 4 min ·
LLMs

[2602.19948] Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

This article presents a framework for assessing the risks associated with using large language models (LLMs) in mental health support, hi...

arXiv - AI · 4 min ·
Machine Learning

[2602.20068] The Invisible Gorilla Effect in Out-of-distribution Detection

The paper explores the 'Invisible Gorilla Effect' in out-of-distribution (OOD) detection, revealing that detection performance varies bas...

arXiv - Machine Learning · 4 min ·
NLP

[2602.20046] Closing the gap in multimodal medical representation alignment

This paper addresses the modality gap in multimodal medical representation alignment, proposing a framework to enhance alignment between ...

arXiv - Machine Learning · 3 min ·
AI Safety

[2602.19872] GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery

The paper presents GOAL, a framework for Continual Generalized Category Discovery (C-GCD) that enhances class discovery while minimizing ...

arXiv - AI · 3 min ·
Machine Learning

[2602.20001] FairFS: Addressing Deep Feature Selection Biases for Recommender System

The paper presents FairFS, a novel algorithm designed to address biases in feature selection for recommender systems, enhancing accuracy ...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.19844] LLM-enabled Applications Require System-Level Threat Monitoring

The paper discusses the need for system-level threat monitoring in LLM-enabled applications, highlighting security challenges and advocat...

arXiv - AI · 3 min ·
LLMs

[2602.19843] MAS-FIRE: Fault Injection and Reliability Evaluation for LLM-Based Multi-Agent Systems

The paper presents MAS-FIRE, a framework for evaluating the reliability of LLM-based Multi-Agent Systems through fault injection, address...

arXiv - AI · 4 min ·
Open Source AI

[2602.19818] SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

The paper presents SafePickle, a machine-learning-based scanner designed to detect malicious Pickle-based ML models, achieving a high F1-...

arXiv - AI · 4 min ·
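
The threat model here is worth spelling out: unpickling a model file can execute arbitrary code, because pickle resolves and calls importable objects. The sketch below is not the paper's learned detector, just a common opcode-scanning heuristic (the allowlist and names are assumptions) that shows what such scanners inspect.

```python
import io
import pickle
import pickletools

ALLOWED_MODULES = {"collections", "numpy", "torch"}  # assumed allowlist

def flag_suspicious_imports(payload: bytes) -> list:
    """Flag pickle imports that resolve outside the allowlist (heuristic)."""
    flagged, strings = [], []
    for opcode, arg, _pos in pickletools.genops(io.BytesIO(payload)):
        if opcode.name == "GLOBAL":  # arg looks like "module name"
            if arg.split(" ", 1)[0].split(".")[0] not in ALLOWED_MODULES:
                flagged.append(arg)
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)  # potential STACK_GLOBAL operands
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            module, name = strings[-2], strings[-1]
            if module.split(".")[0] not in ALLOWED_MODULES:
                flagged.append(f"{module} {name}")
    return flagged

# A classic malicious payload built via __reduce__ is flagged immediately.
class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

print(flag_suspicious_imports(pickle.dumps(Evil())))  # e.g. ['posix system']
```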
Machine Learning

[2602.19918] RobPI: Robust Private Inference against Malicious Client

The paper presents RobPI, a robust private inference protocol designed to counteract malicious client attacks, demonstrating significant ...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.19859] Dirichlet Scale Mixture Priors for Bayesian Neural Networks

This article introduces Dirichlet Scale Mixture (DSM) priors for Bayesian Neural Networks, addressing limitations in interpretability and...

arXiv - Machine Learning · 4 min ·
Generative AI

[2602.19718] Carbon-Aware Governance Gates: An Architecture for Sustainable GenAI Development

The paper proposes Carbon-Aware Governance Gates (CAGG) to integrate sustainability into Generative AI development, addressing the increa...

arXiv - AI · 3 min ·
Machine Learning

[2602.19668] Personalized Longitudinal Medical Report Generation via Temporally-Aware Federated Adaptation

This article presents a novel framework, FedTAR, for generating personalized longitudinal medical reports using federated learning that a...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.19614] Workflow-Level Design Principles for Trustworthy GenAI in Automotive System Engineering

This article presents workflow-level design principles for integrating trustworthy Generative AI in automotive system engineering, addres...

arXiv - Machine Learning · 3 min ·
AI Safety

[2602.19629] Cooperation After the Algorithm: Designing Human-AI Coexistence Beyond the Illusion of Collaboration

The paper discusses the design of human-AI coexistence, emphasizing the need for governance frameworks to ensure responsible collaboratio...

arXiv - AI · 4 min ·