AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[R] I trained a 3k parameter model on XOR sequences of length 20. It extrapolates perfectly to length 1,000,000. Here's why I think that's architecturally significant.

I've been working on an alternative to attention-based sequence modeling that I'm calling Geometric Flow Networks (GFN). The core idea: i...

Reddit - Machine Learning · 1 min · about 2 hours ago

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min · about 5 hours ago

AI Safety

I’ve come up with a new thought experiment to approach ASI, and it challenges the very notions of alignment and containment

I’ve written an essay exploring what I’m calling the Super-Intelligent Octopus Problem—a thought experiment designed to surface a paradox...

Reddit - Artificial Intelligence · 1 min · about 8 hours ago

All Content

AI Safety

‘Uncanny Valley’: Iran War in the AI Era, Prediction Market Ethics, and Paramount Beats Netflix | WIRED

In this episode, our hosts unpack the ongoing conflict in the Middle East, particularly as the AI industry has been entrenching itself wi...

Wired - AI · 35 min · 24 days ago

AI Safety

Anthropic’s AI safety stance clashes with Pentagon – and reshapes spending on primaries

AI Tools & Products · 25 days ago

LLMs

[2512.15792] A Systematic Analysis of Biases in Large Language Models

arXiv - AI · 3 min · 25 days ago

AI Safety

[2511.14827] Implicit Bias of the JKO Scheme

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2506.18703] Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2510.10889] Topological Alignment of Shared Vision-Language Embedding Space

arXiv - AI · 3 min · 25 days ago

Machine Learning

[2510.08580] LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection

arXiv - AI · 4 min · 25 days ago

LLMs

[2412.19436] Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2505.23783] Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2601.17204] SpecBridge: Bridging Mass Spectrometry and Molecular Representations via Cross-Modal Alignment

arXiv - Machine Learning · 4 min · 25 days ago

LLMs

[2503.07885] Safety Guardrails for LLM-Enabled Robots

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2511.16849] Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks

arXiv - Machine Learning · 4 min · 25 days ago

Machine Learning

[2510.26303] Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

arXiv - AI · 4 min · 25 days ago

LLMs

[2510.15982] AMiD: Knowledge Distillation for LLMs with $α$-mixture Assistant Distribution

arXiv - AI · 4 min · 25 days ago

LLMs

[2509.22263] Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2510.26905] Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

arXiv - AI · 4 min · 25 days ago

LLMs

[2505.20065] SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

arXiv - AI · 4 min · 25 days ago

LLMs

[2507.15796] From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2411.15272] Curriculum-enhanced GroupDRO: Challenging the Norm of Avoiding Curriculum Learning in Subpopulation Shift Setups

arXiv - AI · 3 min · 25 days ago

AI Safety

[2603.04383] Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

arXiv - Machine Learning · 4 min · 25 days ago
