AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubt the platform adds much


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
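
The mechanics behind this critique fit in a few lines. Below is a minimal, hypothetical sketch (all names and data are made up) of how RLHF-style reward models are typically fit to pairwise human preferences with a Bradley-Terry loss; whatever raters systematically prefer, including confident and agreeable phrasing, becomes the quantity being maximized.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward model: maps a response embedding to a scalar score.
reward_model = torch.nn.Linear(16, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Toy stand-ins for embeddings of rater-preferred vs. rejected responses.
preferred = torch.randn(32, 16)
rejected = torch.randn(32, 16)

# Bradley-Terry pairwise loss: push the preferred response's score above
# the rejected one's. If raters reward confident, fluent agreement, that
# is exactly the signal the model learns to optimize.
loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```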
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

Machine Learning

[2411.01685] Reducing Biases in Record Matching Through Scores Calibration

This paper explores methods to reduce biases in record matching through score calibration, proposing two model-agnostic post-processing t...

arXiv - Machine Learning · 4 min ·
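
As background for this entry, here is a generic illustration of model-agnostic post-processing calibration, not the paper's two specific techniques (which the summary truncates): fit a separate monotone calibrator per group so that equal calibrated scores imply comparable match probabilities across groups.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Generic per-group score calibration sketch; all names are assumed.
# labels: 1.0 for a true record match, 0.0 otherwise.
def calibrate_by_group(scores, labels, groups):
    calibrators = {}
    for g in np.unique(groups):
        mask = groups == g
        cal = IsotonicRegression(out_of_bounds="clip")
        cal.fit(scores[mask], labels[mask])  # monotone map: raw score -> P(match)
        calibrators[g] = cal
    return calibrators

rng = np.random.default_rng(0)
scores = rng.uniform(size=200)
labels = (rng.uniform(size=200) < scores).astype(float)
groups = rng.choice(["A", "B"], size=200)

cals = calibrate_by_group(scores, labels, groups)
print(cals["A"].predict([0.3, 0.7]))  # calibrated match probabilities for group A
```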
AI Safety

[2602.20076] Robust Taylor-Lagrange Control for Safety-Critical Systems

The paper presents a robust Taylor-Lagrange Control (rTLC) method for safety-critical systems, addressing the feasibility preservation pr...

arXiv - AI · 3 min ·
LLMs

[2602.20064] The LLMbda Calculus: AI Agents, Conversations, and Information Flow

The LLMbda Calculus introduces a formal framework for understanding AI agents' conversations, addressing vulnerabilities like prompt inje...

arXiv - AI · 4 min ·
LLMs

[2602.20156] Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

The paper introduces SkillInject, a benchmark for evaluating the vulnerability of LLM agents to skill file attacks, revealing high suscep...

arXiv - Machine Learning · 4 min ·
NLP

[2602.20151] Conformal Risk Control for Non-Monotonic Losses

This article presents a novel approach to conformal risk control for non-monotonic losses, extending traditional methods to multidimensio...

arXiv - Machine Learning · 3 min ·
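
For readers new to the area, the monotone baseline that this paper reportedly generalizes is short enough to sketch: standard conformal risk control picks the smallest threshold lambda whose adjusted empirical risk on n calibration points stays below alpha. Everything below (names, the toy loss) is illustrative only.

```python
import numpy as np

def crc_lambda(losses_by_lambda, lambdas, alpha, B=1.0):
    # losses_by_lambda[j]: n calibration losses at lambdas[j], each in [0, B],
    # assumed non-increasing in lambda (the monotone setting).
    n = len(losses_by_lambda[0])
    for lam, losses in zip(lambdas, losses_by_lambda):
        adjusted = (n / (n + 1)) * np.mean(losses) + B / (n + 1)
        if adjusted <= alpha:
            return lam  # smallest lambda meeting the risk bound
    return lambdas[-1]  # most conservative fallback

# Toy example: miss-rate-style loss that shrinks as lambda grows.
lambdas = np.linspace(0.0, 1.0, 101)
calib = np.random.default_rng(1).uniform(size=500)
losses = [np.clip(calib - lam, 0.0, 1.0) for lam in lambdas]
print(crc_lambda(losses, lambdas, alpha=0.1))
```

The non-monotonic case is harder precisely because this smallest-feasible-lambda scan no longer yields a valid guarantee once risk can rise again as lambda grows.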
Robotics

[2602.19983] Contextual Safety Reasoning and Grounding for Open-World Robots

The paper presents CORE, a novel safety framework for open-world robots that enables contextual reasoning and enforcement of safety rules...

arXiv - AI · 4 min ·
LLMs

[2602.19948] Assessing Risks of Large Language Models in Mental Health Support: A Framework for Automated Clinical AI Red Teaming

This article presents a framework for assessing the risks associated with using large language models (LLMs) in mental health support, hi...

arXiv - AI · 4 min ·
Machine Learning

[2602.20068] The Invisible Gorilla Effect in Out-of-distribution Detection

The paper explores the 'Invisible Gorilla Effect' in out-of-distribution (OOD) detection, revealing that detection performance varies bas...

arXiv - Machine Learning · 4 min ·
NLP

[2602.20046] Closing the gap in multimodal medical representation alignment

This paper addresses the modality gap in multimodal medical representation alignment, proposing a framework to enhance alignment between ...

arXiv - Machine Learning · 3 min ·
AI Safety

[2602.19872] GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery

The paper presents GOAL, a framework for Continual Generalized Category Discovery (C-GCD) that enhances class discovery while minimizing ...

arXiv - AI · 3 min ·
Machine Learning

[2602.20001] FairFS: Addressing Deep Feature Selection Biases for Recommender System

The paper presents FairFS, a novel algorithm designed to address biases in feature selection for recommender systems, enhancing accuracy ...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.19844] LLM-enabled Applications Require System-Level Threat Monitoring

The paper discusses the need for system-level threat monitoring in LLM-enabled applications, highlighting security challenges and advocat...

arXiv - AI · 3 min ·
LLMs

[2602.19843] MAS-FIRE: Fault Injection and Reliability Evaluation for LLM-Based Multi-Agent Systems

The paper presents MAS-FIRE, a framework for evaluating the reliability of LLM-based Multi-Agent Systems through fault injection, address...

arXiv - AI · 4 min ·
Open Source AI

[2602.19818] SafePickle: Robust and Generic ML Detection of Malicious Pickle-based ML Models

The paper presents SafePickle, a machine-learning-based scanner designed to detect malicious Pickle-based ML models, achieving a high F1-...

arXiv - AI · 4 min ·
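
The threat model here is worth spelling out: unpickling a model file can execute arbitrary code, because pickle resolves and calls importable objects. The sketch below is not the paper's learned detector, just a common opcode-scanning heuristic (the allowlist and names are assumptions) that shows what such scanners inspect.

```python
import io
import pickle
import pickletools

ALLOWED_MODULES = {"collections", "numpy", "torch"}  # assumed allowlist

def flag_suspicious_imports(payload: bytes) -> list:
    """Flag pickle imports that resolve outside the allowlist (heuristic)."""
    flagged, strings = [], []
    for opcode, arg, _pos in pickletools.genops(io.BytesIO(payload)):
        if opcode.name == "GLOBAL":  # arg looks like "module name"
            if arg.split(" ", 1)[0].split(".")[0] not in ALLOWED_MODULES:
                flagged.append(arg)
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)  # potential STACK_GLOBAL operands
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            module, name = strings[-2], strings[-1]
            if module.split(".")[0] not in ALLOWED_MODULES:
                flagged.append(f"{module} {name}")
    return flagged

# A classic malicious payload built via __reduce__ is flagged immediately.
class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

print(flag_suspicious_imports(pickle.dumps(Evil())))  # e.g. ['posix system']
```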
Machine Learning

[2602.19918] RobPI: Robust Private Inference against Malicious Client

The paper presents RobPI, a robust private inference protocol designed to counteract malicious client attacks, demonstrating significant ...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.19859] Dirichlet Scale Mixture Priors for Bayesian Neural Networks

This article introduces Dirichlet Scale Mixture (DSM) priors for Bayesian Neural Networks, addressing limitations in interpretability and...

arXiv - Machine Learning · 4 min ·
Generative AI

[2602.19718] Carbon-Aware Governance Gates: An Architecture for Sustainable GenAI Development

The paper proposes Carbon-Aware Governance Gates (CAGG) to integrate sustainability into Generative AI development, addressing the increa...

arXiv - AI · 3 min ·
Machine Learning

[2602.19668] Personalized Longitudinal Medical Report Generation via Temporally-Aware Federated Adaptation

This article presents a novel framework, FedTAR, for generating personalized longitudinal medical reports using federated learning that a...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.19614] Workflow-Level Design Principles for Trustworthy GenAI in Automotive System Engineering

This article presents workflow-level design principles for integrating trustworthy Generative AI in automotive system engineering, addres...

arXiv - Machine Learning · 3 min ·
AI Safety

[2602.19629] Cooperation After the Algorithm: Designing Human-AI Coexistence Beyond the Illusion of Collaboration

The paper discusses the design of human-AI coexistence, emphasizing the need for governance frameworks to ensure responsible collaboratio...

arXiv - AI · 4 min ·