AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min ·
AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min ·

All Content

Machine Learning

[2603.19308] GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

Abstract page for arXiv paper 2603.19308: GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

arXiv - AI · 3 min ·
LLMs

[2603.19302] Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning

Abstract page for arXiv paper 2603.19302: Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning

arXiv - AI · 3 min ·
LLMs

[2603.19273] LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages

Abstract page for arXiv paper 2603.19273: LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages

arXiv - AI · 3 min ·
AI Safety

Delve accused of misleading customers with ‘fake compliance’ | TechCrunch

An anonymous Substack post accuses compliance startup Delve of “falsely” convincing “hundreds of customers they were compliant” with priv...

TechCrunch - AI · 7 min ·
AI Safety

I built a self-evolving AI that rewrites its own rules after every session. After 62 sessions, it's most accurate when it thinks it's wrong.

NEXUS is an open-source market analysis AI that runs 3 automated sessions per day. It analyzes 45 financial instruments, generates trade ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[P] Benchmark: Using XGBoost vs. DistilBERT for detecting "Month 2 Tanking" in cold email infrastructure?

I have been experimenting with Heuristic-based Deliverability Intelligence to solve the "Month 2 Tanking" problem. The Data Science Chall...

Reddit - Machine Learning · 1 min ·
AI Safety

Artificial intelligence in pharmaceutical manufacturing – navigating innovation and regulation

AI News - General · 2 min ·
AI Safety

Ethics in the Age of AI: Highlights from MSU Ethics Week 2026

AI News - General · 4 min ·
Machine Learning

[2510.18120] Generalization Below the Edge of Stability: The Role of Data Geometry

Abstract page for arXiv paper 2510.18120: Generalization Below the Edge of Stability: The Role of Data Geometry

arXiv - Machine Learning · 4 min ·
Machine Learning

[2509.05609] New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR

Abstract page for arXiv paper 2509.05609: New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Tr...

arXiv - Machine Learning · 4 min ·
LLMs

[2508.18088] How Quantization Shapes Bias in Large Language Models

Abstract page for arXiv paper 2508.18088: How Quantization Shapes Bias in Large Language Models

arXiv - Machine Learning · 3 min ·
LLMs

[2510.17276] Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Abstract page for arXiv paper 2510.17276: Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

arXiv - Machine Learning · 4 min ·
LLMs

[2509.25762] OPPO: Accelerating PPO-based RLHF via Pipeline Overlap

Abstract page for arXiv paper 2509.25762: OPPO: Accelerating PPO-based RLHF via Pipeline Overlap

arXiv - Machine Learning · 3 min ·
Machine Learning

[2508.04899] Honest and Reliable Evaluation and Expert Equivalence Testing of Automated Neonatal Seizure Detection

Abstract page for arXiv paper 2508.04899: Honest and Reliable Evaluation and Expert Equivalence Testing of Automated Neonatal Seizure Det...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2412.20298] An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Abstract page for arXiv paper 2412.20298: An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

arXiv - Machine Learning · 4 min ·
AI Safety

[2603.05226] Learning Optimal Individualized Decision Rules with Conditional Demographic Parity

Abstract page for arXiv paper 2603.05226: Learning Optimal Individualized Decision Rules with Conditional Demographic Parity

arXiv - Machine Learning · 3 min ·
Machine Learning

[2603.05157] The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

Abstract page for arXiv paper 2603.05157: The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

arXiv - Machine Learning · 4 min ·
Machine Learning

[2603.04895] How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

Abstract page for arXiv paper 2603.04895: How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional N...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2603.04807] The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

Abstract page for arXiv paper 2603.04807: The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implic...

arXiv - Machine Learning · 4 min ·