AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

The state of AI safety in four fake graphs

submitted by /u/tekz

Reddit - Artificial Intelligence · 1 min · 1 minute ago

Machine Learning

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

arXiv - AI · 4 min · about 13 hours ago

LLMs

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

arXiv - AI · 4 min · about 13 hours ago

All Content

AI Safety

[2603.00078] Alignment Is Not Enough: A Relational Framework for Moral Standing in Human-AI Interaction

arXiv - AI · 4 min · 27 days ago

Machine Learning

[2603.00068] The Global Landscape of Environmental AI Regulation: From the Cost of Reasoning to a Right to Green AI

arXiv - AI · 4 min · 27 days ago

AI Safety

[2603.00066] Contesting Artificial Moral Agents

arXiv - AI · 3 min · 27 days ago

Generative AI

[2603.00057] "Bespoke Bots": Diverse Instructor Needs for Customizing Generative AI Classroom Chatbots

arXiv - AI · 3 min · 27 days ago

AI Safety

[2603.00047] What Is the Geometry of the Alignment Tax?

arXiv - Machine Learning · 3 min · 27 days ago

LLMs

[2603.00042] Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment

arXiv - Machine Learning · 3 min · 27 days ago

LLMs

[2603.00024] Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs

arXiv - AI · 4 min · 27 days ago

Machine Learning

[2603.02203] Tool Verification for Test-Time Reinforcement Learning

arXiv - AI · 3 min · 27 days ago

Machine Learning

[2603.01630] SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing

arXiv - AI · 4 min · 27 days ago

Machine Learning

[2603.01620] ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents

arXiv - AI · 3 min · 27 days ago

LLMs

[2603.01562] RubricBench: Aligning Model-Generated Rubrics with Human Standards

arXiv - AI · 3 min · 27 days ago

LLMs

[2603.01396] HarmonyCell: Automating Single-Cell Perturbation Modeling under Semantic and Distribution Shifts

arXiv - AI · 3 min · 27 days ago

Machine Learning

[2603.01290] Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

arXiv - Machine Learning · 4 min · 27 days ago

LLMs

[2603.00993] CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

arXiv - AI · 3 min · 27 days ago

LLMs

[2603.00590] Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs

arXiv - AI · 4 min · 27 days ago

Machine Learning

[R] Are neurons the wrong primitive for modeling decision systems?

A recent ICLR paper proposes Behavior Learning — replacing neural layers with learnable constrained optimization blocks. It models it as:...

Reddit - Machine Learning · 1 min · 28 days ago

LLMs

[2602.03775] An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents

arXiv - AI · 4 min · 28 days ago

Machine Learning

[2502.01383] InfoBridge: Mutual Information estimation via Bridge Matching

arXiv - Machine Learning · 3 min · 28 days ago

LLMs

[2509.23371] Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

arXiv - Machine Learning · 4 min · 28 days ago

Machine Learning

[2505.19441] Fairness-in-the-Workflow: How Machine Learning Practitioners at Big Tech Companies Approach Fairness in Recommender Systems

arXiv - Machine Learning · 3 min · 28 days ago
