AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min · about 3 hours ago

LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min · about 9 hours ago

Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can descri...

Reddit - Artificial Intelligence · 1 min · 1 day ago

All Content

LLMs

[2602.11549] Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

Abstract page for arXiv paper 2602.11549: Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2601.01279] Collusive Pricing Under LLM

Abstract page for arXiv paper 2601.01279: Collusive Pricing Under LLM

arXiv - AI · 4 min · 5 days ago

LLMs

[2512.03903] BERnaT: Basque Encoders for Representing Natural Textual Diversity

Abstract page for arXiv paper 2512.03903: BERnaT: Basque Encoders for Representing Natural Textual Diversity

arXiv - AI · 3 min · 5 days ago

Machine Learning

[2512.08713] Automatic Essay Scoring and Feedback Generation in Basque Language Learning

Abstract page for arXiv paper 2512.08713: Automatic Essay Scoring and Feedback Generation in Basque Language Learning

arXiv - AI · 4 min · 5 days ago

LLMs

[2511.17561] LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

Abstract page for arXiv paper 2511.17561: LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

arXiv - AI · 3 min · 5 days ago

LLMs

[2510.13232] What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

Abstract page for arXiv paper 2510.13232: What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

arXiv - AI · 4 min · 5 days ago

LLMs

[2510.02249] Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

Abstract page for arXiv paper 2510.02249: Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

arXiv - Machine Learning · 4 min · 5 days ago

Machine Learning

[2507.19116] Graph Structure Learning with Privacy Guarantees for Open Graph Data

Abstract page for arXiv paper 2507.19116: Graph Structure Learning with Privacy Guarantees for Open Graph Data

arXiv - AI · 4 min · 5 days ago

LLMs

[2506.13925] Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation

Abstract page for arXiv paper 2506.13925: Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation

arXiv - AI · 4 min · 5 days ago

Machine Learning

[2504.09396] Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes

Abstract page for arXiv paper 2504.09396: Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic R...

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2502.11026] RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

Abstract page for arXiv paper 2502.11026: RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

arXiv - Machine Learning · 4 min · 5 days ago

LLMs

[2603.18908] Secure Linear Alignment of Large Language Models

Abstract page for arXiv paper 2603.18908: Secure Linear Alignment of Large Language Models

arXiv - AI · 3 min · 5 days ago

LLMs

[2603.08388] A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation

Abstract page for arXiv paper 2603.08388: A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Gen...

arXiv - AI · 4 min · 5 days ago

Machine Learning

[2603.08291] Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm

Abstract page for arXiv paper 2603.08291: Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reason...

arXiv - AI · 4 min · 5 days ago

Machine Learning

[2510.08713] Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

Abstract page for arXiv paper 2510.08713: Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

arXiv - AI · 4 min · 5 days ago

LLMs

[2412.02868] PrecLLM: A Privacy-Preserving Framework for Efficient Clinical Annotation Extraction from Unstructured EHRs using Small-Scale LLMs

Abstract page for arXiv paper 2412.02868: PrecLLM: A Privacy-Preserving Framework for Efficient Clinical Annotation Extraction from Unstr...

arXiv - AI · 4 min · 5 days ago

LLMs

[2603.22281] ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Abstract page for arXiv paper 2603.22281: ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

arXiv - Machine Learning · 4 min · 5 days ago

Machine Learning

[2603.22228] SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Abstract page for arXiv paper 2603.22228: SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-...

arXiv - AI · 4 min · 5 days ago

LLMs

[2603.22042] Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models

Abstract page for arXiv paper 2603.22042: Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hy...

arXiv - AI · 4 min · 5 days ago

LLMs

[2603.21975] SecureBreak -- A dataset towards safe and secure models

Abstract page for arXiv paper 2603.21975: SecureBreak -- A dataset towards safe and secure models

arXiv - Machine Learning · 4 min · 5 days ago

