AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Implementing advanced AI technologies in finance | MIT Technology Review

In finance departments that have long been defined by precision and control, AI has arrived less as a neatly managed upgrade than as a qu...

MIT Technology Review - AI · 4 min · about 1 hour ago

LLMs

[2602.07026] Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

arXiv - AI · 4 min · about 9 hours ago

Machine Learning

[2511.22893] Switching-time bioprocess control with pulse-width-modulated optogenetics

arXiv - AI · 4 min · about 9 hours ago

All Content

NLP

[2602.00474] Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients

arXiv - Machine Learning · 4 min · about 10 hours ago

LLMs

[2512.23927] Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

arXiv - Machine Learning · 4 min · about 10 hours ago

Machine Learning

[2512.23032] Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2410.21438] UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

arXiv - Machine Learning · 4 min · about 10 hours ago

AI Safety

[2602.10512] Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers

arXiv - Machine Learning · 4 min · about 10 hours ago

Machine Learning

[2602.01642] The Effect of Mini-Batch Noise on the Implicit Bias of Adam

arXiv - AI · 4 min · about 10 hours ago

Machine Learning

[2512.23770] SB-TRPO: Towards Safe Reinforcement Learning with Hard Constraints

arXiv - AI · 3 min · about 10 hours ago

Machine Learning

[2510.00253] DReS: Dual Reconstruction Smoothing for Functional Regularization

arXiv - Machine Learning · 3 min · about 10 hours ago

LLMs

[2408.15339] UNA: A Unified Supervised Framework for Efficient LLM Alignment Across Feedback Types

arXiv - Machine Learning · 4 min · about 10 hours ago

Machine Learning

[2605.07970] Linear Response Estimators for Singular Statistical Models

arXiv - Machine Learning · 3 min · about 10 hours ago

Machine Learning

[2605.07665] Debiased Counterfactual Generation via Flow Matching from Observations

arXiv - Machine Learning · 3 min · about 10 hours ago

LLMs

[2605.07632] Post-training makes large language models less human-like

arXiv - AI · 4 min · about 10 hours ago

NLP

[2605.07409] The Proxy Presumption: From Semantic Embeddings to Valid Social Measures

arXiv - Machine Learning · 3 min · about 10 hours ago

LLMs

[2605.07324] Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

arXiv - AI · 4 min · about 10 hours ago

AI Safety

[2605.07263] Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

arXiv - AI · 4 min · about 10 hours ago

Machine Learning

[2605.07100] TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

arXiv - Machine Learning · 3 min · about 10 hours ago

Machine Learning

[2605.07065] Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks

arXiv - AI · 3 min · about 10 hours ago

Computer Vision

[2605.06891] Towards Fairness under Label Bias in Image Segmentation: Impact, Measurement and Mitigation

arXiv - Machine Learning · 3 min · about 10 hours ago

AI Safety

[2605.06696] Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

arXiv - AI · 4 min · about 10 hours ago

Machine Learning

[2605.06672] More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

arXiv - AI · 4 min · about 10 hours ago
