AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What if, during pre-training, we made it so that whenever the AI exhibits "misaligned behaviour", we just reduce like ...

Reddit - Machine Learning · 1 min
Machine Learning

I had an idea, would love your thoughts

What if, during pre-training, we made it so that whenever the AI exhibits "misaligned behaviour", we just reduce like ...

Reddit - Artificial Intelligence · 1 min
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

Submitted by /u/Fcking_Chuck

Reddit - Artificial Intelligence · 1 min

All Content

LLMs

[2511.21104] BRIDGE: Building Representations In Domain Guided Program Synthesis

The paper presents BRIDGE, a framework for improving program synthesis through structured prompting, enhancing correctness and efficiency...

arXiv - Machine Learning · 4 min
NLP

[2512.02435] Efficient Cross-Domain Offline Reinforcement Learning with Dynamics- and Value-Aligned Data Filtering

This paper presents a novel framework for cross-domain offline reinforcement learning, introducing a method that filters data based on bo...

arXiv - Machine Learning · 4 min
LLMs

[2511.07922] SERL: Self-Examining Reinforcement Learning on Open-Domain

The paper introduces Self-Examining Reinforcement Learning (SERL), a novel framework that enhances the performance of large language mode...

arXiv - Machine Learning · 4 min
Machine Learning

[2510.10625] ImpMIA: Leveraging Implicit Bias for Membership Inference Attack

The paper introduces ImpMIA, a novel Membership Inference Attack that leverages implicit bias in neural networks to identify training sam...

arXiv - Machine Learning · 4 min
Machine Learning

[2507.12652] Federated Learning in Offline and Online EMG Decoding: A Privacy and Performance Perspective

This article explores the application of federated learning (FL) in offline and online EMG decoding, addressing privacy and performance c...

arXiv - Machine Learning · 4 min
Machine Learning

[2411.09847] Towards a Fairer Non-negative Matrix Factorization

This article presents a novel approach to Non-negative Matrix Factorization (NMF) aimed at improving fairness in machine learning algorit...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.22115] Slice and Explain: Logic-Based Explanations for Neural Networks through Domain Slicing

The paper presents a novel approach called 'Slice and Explain,' which utilizes domain slicing to enhance the efficiency of logic-based ex...

arXiv - Machine Learning · 3 min
Machine Learning

[2602.22083] Coarsening Bias from Variable Discretization in Causal Functionals

This paper discusses the coarsening bias introduced by discretizing continuous variables in causal functionals, proposing a bias-reduced ...

arXiv - Machine Learning · 3 min
Machine Learning

[2602.21957] Learning to Collaborate via Structures: Cluster-Guided Item Alignment for Federated Recommendation

The paper presents CGFedRec, a novel framework for federated recommendation that enhances collaboration by using cluster-guided item alig...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21873] GFPL: Generative Federated Prototype Learning for Resource-Constrained and Data-Imbalanced Vision Task

The GFPL framework enhances federated learning by addressing data imbalance and communication overhead in resource-constrained vision tas...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21721] Private and Robust Contribution Evaluation in Federated Learning

This paper presents novel methods for evaluating contributions in federated learning while ensuring privacy and robustness, addressing vu...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21509] Fair Model-based Clustering

The paper presents Fair Model-based Clustering (FMC), a new algorithm that enhances fairness in clustering by ensuring the proportion of ...

arXiv - Machine Learning · 3 min
AI Safety

[2602.21272] Counterdiabatic Hamiltonian Monte Carlo

The paper introduces Counterdiabatic Hamiltonian Monte Carlo (CHMC), an advanced sampling method that improves the efficiency of Hamilton...

arXiv - Machine Learning · 3 min
LLMs

[2602.21262] Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models

This paper investigates the interplay between persuasion and vigilance in Large Language Models (LLMs), revealing that these capacities a...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21252] INTACT: Intent-Aware Representation Learning for Cryptographic Traffic Violation Detection

The paper introduces INTACT, a novel framework for detecting cryptographic traffic violations by modeling violations as conditional const...

arXiv - Machine Learning · 3 min
NLP

[2602.21212] Disaster Question Answering with LoRA Efficiency and Accurate End Position

This paper presents a disaster-focused question answering system optimized for Japanese disaster scenarios, achieving high accuracy with ...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21961] Robustness in sparse artificial neural networks trained with adaptive topology

This paper explores the robustness of sparse artificial neural networks with adaptive topology, demonstrating their competitive performan...

arXiv - Machine Learning · 3 min
Machine Learning

[2602.21928] Learning Unknown Interdependencies for Decentralized Root Cause Analysis in Nonlinear Dynamical Systems

This paper presents a novel federated learning methodology for decentralized root cause analysis in nonlinear dynamical systems, addressi...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.21844] JSAM: Privacy Straggler-Resilient Joint Client Selection and Incentive Mechanism Design in Differentially Private Federated Learning

The paper presents JSAM, a framework for optimizing client selection and privacy compensation in differentially private federated learnin...

arXiv - Machine Learning · 4 min
LLMs

[2602.21750] From Words to Amino Acids: Does the Curse of Depth Persist?

This paper explores the depth inefficiency in protein language models (PLMs), revealing that later layers contribute less to output predi...

arXiv - Machine Learning · 4 min