AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Bias in AI: Examples and 6 Ways to Fix it in 2026

AI bias is an anomaly in the output of ML algorithms due to prejudiced assumptions. Explore types of AI bias, examples, how to reduce bia...

AI Events · 36 min

LLMs

[R] I built a benchmark that catches LLMs breaking physics laws

I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and gra...

Reddit - Machine Learning · 1 min

Machine Learning

We need to teach AI the essence of being human to reduce the risk of misalignment

One part of the alignment problem is that AI does not genuinely understand what it's like to live in the world, even though it can descri...

Reddit - Artificial Intelligence · 1 min

All Content

LLMs

[2602.11549] Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

Abstract page for arXiv paper 2602.11549: Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

arXiv - Machine Learning · 4 min

LLMs

[2601.01279] Collusive Pricing Under LLM

Abstract page for arXiv paper 2601.01279: Collusive Pricing Under LLM

arXiv - AI · 4 min

LLMs

[2512.03903] BERnaT: Basque Encoders for Representing Natural Textual Diversity

Abstract page for arXiv paper 2512.03903: BERnaT: Basque Encoders for Representing Natural Textual Diversity

arXiv - AI · 3 min

Machine Learning

[2512.08713] Automatic Essay Scoring and Feedback Generation in Basque Language Learning

Abstract page for arXiv paper 2512.08713: Automatic Essay Scoring and Feedback Generation in Basque Language Learning

arXiv - AI · 4 min

LLMs

[2511.17561] LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

Abstract page for arXiv paper 2511.17561: LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

arXiv - AI · 3 min

LLMs

[2510.13232] What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

Abstract page for arXiv paper 2510.13232: What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

arXiv - AI · 4 min

LLMs

[2510.02249] Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

Abstract page for arXiv paper 2510.02249: Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

arXiv - Machine Learning · 4 min

Machine Learning

[2507.19116] Graph Structure Learning with Privacy Guarantees for Open Graph Data

Abstract page for arXiv paper 2507.19116: Graph Structure Learning with Privacy Guarantees for Open Graph Data

arXiv - AI · 4 min

LLMs

[2506.13925] Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation

Abstract page for arXiv paper 2506.13925: Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation

arXiv - AI · 4 min

Machine Learning

[2504.09396] Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes

Abstract page for arXiv paper 2504.09396: Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic R...

arXiv - Machine Learning · 4 min

LLMs

[2502.11026] RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

Abstract page for arXiv paper 2502.11026: RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

arXiv - Machine Learning · 4 min

LLMs

[2603.18908] Secure Linear Alignment of Large Language Models

Abstract page for arXiv paper 2603.18908: Secure Linear Alignment of Large Language Models

arXiv - AI · 3 min

LLMs

[2603.08388] A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Generation

Abstract page for arXiv paper 2603.08388: A Hierarchical Error-Corrective Graph Framework for Autonomous Agents with LLM-Based Action Gen...

arXiv - AI · 4 min

Machine Learning

[2603.08291] Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm

Abstract page for arXiv paper 2603.08291: Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reason...

arXiv - AI · 4 min

Machine Learning

[2510.08713] Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

Abstract page for arXiv paper 2510.08713: Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight

arXiv - AI · 4 min

LLMs

[2412.02868] PrecLLM: A Privacy-Preserving Framework for Efficient Clinical Annotation Extraction from Unstructured EHRs using Small-Scale LLMs

Abstract page for arXiv paper 2412.02868: PrecLLM: A Privacy-Preserving Framework for Efficient Clinical Annotation Extraction from Unstr...

arXiv - AI · 4 min

LLMs

[2603.22281] ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Abstract page for arXiv paper 2603.22281: ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

arXiv - Machine Learning · 4 min

Machine Learning

[2603.22228] SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Abstract page for arXiv paper 2603.22228: SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-...

arXiv - AI · 4 min

LLMs

[2603.22042] Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models

Abstract page for arXiv paper 2603.22042: Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hy...

arXiv - AI · 4 min

LLMs

[2603.21975] SecureBreak -- A dataset towards safe and secure models

Abstract page for arXiv paper 2603.21975: SecureBreak -- A dataset towards safe and secure models

arXiv - Machine Learning · 4 min