AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min · about 1 hour ago

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 11 hours ago

All Content

Machine Learning

[2603.19308] GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

arXiv - AI · 3 min · 7 days ago

LLMs

[2603.19302] Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning

arXiv - AI · 3 min · 7 days ago

LLMs

[2603.19273] LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages

arXiv - AI · 3 min · 7 days ago

AI Safety

Delve accused of misleading customers with ‘fake compliance’

An anonymous Substack post accuses compliance startup Delve of “falsely” convincing “hundreds of customers they were compliant” with priv...

TechCrunch - AI · 7 min · 7 days ago

AI Safety

I built a self-evolving AI that rewrites its own rules after every session. After 62 sessions, it's most accurate when it thinks it's wrong.

NEXUS is an open-source market analysis AI that runs 3 automated sessions per day. It analyzes 45 financial instruments, generates trade ...

Reddit - Artificial Intelligence · 1 min · 8 days ago

Machine Learning

[P] Benchmark: Using XGBoost vs. DistilBERT for detecting "Month 2 Tanking" in cold email infrastructure?

I have been experimenting with Heuristic-based Deliverability Intelligence to solve the "Month 2 Tanking" problem. The Data Science Chall...

Reddit - Machine Learning · 1 min · 8 days ago

AI Safety

Artificial intelligence in pharmaceutical manufacturing – navigating innovation and regulation

AI News - General · 2 min · 23 days ago

AI Safety

Ethics in the Age of AI: Highlights from MSU Ethics Week 2026

AI News - General · 4 min · 23 days ago

Machine Learning

[2510.18120] Generalization Below the Edge of Stability: The Role of Data Geometry

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2509.05609] New Insights into Optimal Alignment of Acoustic and Linguistic Representations for Knowledge Transfer in ASR

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2508.18088] How Quantization Shapes Bias in Large Language Models

arXiv - Machine Learning · 3 min · 24 days ago

LLMs

[2510.17276] Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

arXiv - Machine Learning · 4 min · 24 days ago

LLMs

[2509.25762] OPPO: Accelerating PPO-based RLHF via Pipeline Overlap

arXiv - Machine Learning · 3 min · 24 days ago

Machine Learning

[2508.04899] Honest and Reliable Evaluation and Expert Equivalence Testing of Automated Neonatal Seizure Detection

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2412.20298] An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

arXiv - Machine Learning · 4 min · 24 days ago

AI Safety

[2603.05226] Learning Optimal Individualized Decision Rules with Conditional Demographic Parity

arXiv - Machine Learning · 3 min · 24 days ago

Machine Learning

[2603.05157] The Impact of Preprocessing Methods on Racial Encoding and Model Robustness in CXR Diagnosis

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.04895] How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

arXiv - Machine Learning · 4 min · 24 days ago

Machine Learning

[2603.04807] The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

arXiv - Machine Learning · 4 min · 24 days ago
