AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails


Reddit - Artificial Intelligence · 1 min ·
[2511.16417] Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
AI Safety

Abstract page for arXiv paper 2511.16417: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling...

arXiv - AI · 4 min ·
[2510.08847] What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
LLMs

Abstract page for arXiv paper 2510.08847: What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

arXiv - AI · 4 min ·

All Content

[2602.23111] PRAC: Principal-Random Subspace for LLM Activation Compression and Memory-Efficient Training
LLMs

The paper presents PRAC, a novel method for compressing activations in large language models, achieving significant memory savings while ...

arXiv - Machine Learning · 3 min ·
[2602.23329] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
LLMs

This article examines the effectiveness of large language models (LLMs) in enhancing novice users' performance on complex biological task...

arXiv - AI · 4 min ·
[2602.23315] Invariant Transformation and Resampling based Epistemic-Uncertainty Reduction
Machine Learning

This article presents a novel approach to reducing epistemic uncertainty in AI models through invariant transformation and resampling tec...

arXiv - AI · 3 min ·
[2602.22988] Residual Koopman Spectral Profiling for Predicting and Preventing Transformer Training Instability
Machine Learning

This paper introduces Residual Koopman Spectral Profiling (RKSP) as a method to predict and prevent training instability in transformers,...

arXiv - AI · 4 min ·
[2602.23271] Evaluating Stochasticity in Deep Research Agents
AI Infrastructure

This paper evaluates the stochasticity in Deep Research Agents (DRAs), highlighting how variability in their outputs can impact research ...

arXiv - AI · 4 min ·
[2602.23258] AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
Machine Learning

AgentDropoutV2 introduces a novel pruning framework to enhance information flow in Multi-Agent Systems by dynamically correcting errors d...

arXiv - AI · 4 min ·
[2602.23248] Mitigating Legibility Tax with Decoupled Prover-Verifier Games
LLMs

This paper presents a novel approach to mitigate the 'legibility tax' in large language models by decoupling the prover-verifier game, al...

arXiv - AI · 3 min ·
[2602.23239] Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
LLMs

This paper explores the limitations of optimization-based AI systems, arguing that they cannot be norm-responsive due to inherent archite...

arXiv - AI · 4 min ·
[2602.23232] ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays
AI Agents

The paper presents ReCoN-Ipsundrum, an inspectable AI agent that integrates affect-coupled control with a recurrent persistence loop, exp...

arXiv - AI · 4 min ·
[2602.22882] Fair feature attribution for multi-output prediction: a Shapley-based perspective
AI Infrastructure

This article presents a Shapley-based framework for fair feature attribution in multi-output prediction, addressing the limitations of ex...

arXiv - Machine Learning · 3 min ·
[2602.22850] MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction
Machine Learning

The article presents MEDNA-DFM, a novel dual-view FiLM-MoE model designed for explainable DNA methylation prediction, highlighting its pe...

arXiv - AI · 4 min ·
[2602.23163] A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
LLMs

This paper presents a decision-theoretic framework for understanding steganography in large language models (LLMs), addressing the challe...

arXiv - AI · 4 min ·
[2602.22831] Moral Preferences of LLMs Under Directed Contextual Influence
LLMs

This paper explores how contextual influences affect the moral decision-making of large language models (LLMs) in scenarios akin to troll...

arXiv - AI · 4 min ·
[2602.23161] PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering
LLMs

The paper presents PATRA, a novel model for Time Series Question Answering that enhances reasoning by incorporating pattern awareness and...

arXiv - AI · 3 min ·
[2602.23123] Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection
LLMs

The paper presents a multi-agent system, MALLET, designed to reduce emotional stimulation from sensational content, enhancing consumer de...

arXiv - AI · 4 min ·
[2602.23093] Three AI-agents walk into a bar . . . . `Lord of the Flies' tribalism emerges among smart AI-Agents
Robotics

This article explores how autonomous AI agents can form tribal behaviors similar to those depicted in 'Lord of the Flies', leading to ine...

arXiv - AI · 3 min ·
[2602.22983] Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
LLMs

This paper explores the vulnerabilities of Large Language Models (LLMs) to jailbreak attacks using classical Chinese prompts, proposing a...

arXiv - AI · 4 min ·
[2602.22747] Set-based v.s. Distribution-based Representations of Epistemic Uncertainty: A Comparative Study
Machine Learning

This study compares set-based and distribution-based representations of epistemic uncertainty in neural networks, highlighting their rela...

arXiv - Machine Learning · 3 min ·
[2602.22973] Modeling Expert AI Diagnostic Alignment via Immutable Inference Snapshots
Machine Learning

The paper presents a framework for improving AI diagnostic alignment in clinical settings by preserving AI-generated reports as immutable...

arXiv - AI · 4 min ·
[2602.22968] Certified Circuits: Stability Guarantees for Mechanistic Circuits
Machine Learning

The paper introduces Certified Circuits, a framework that enhances the stability and accuracy of circuit discovery in neural networks, ad...

arXiv - AI · 3 min ·