AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What happens if, while pre-training an AI, we make it so that when it exhibits "misaligned behaviour" we just reduce, like ...

Reddit - Machine Learning · 1 min ·
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.21231] ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
LLMs

The paper presents ACAR, a framework for adaptive complexity routing in multi-model ensembles, demonstrating improved task routing accura...

arXiv - Machine Learning · 4 min ·
[2602.21226] IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions
LLMs

The paper introduces IslamicLegalBench, a benchmark for evaluating LLMs' reasoning on Islamic law, revealing significant limitations in c...

arXiv - AI · 4 min ·
[2602.21218] EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
LLMs

The paper introduces EPSVec, a novel method for generating synthetic data using dataset vectors, enhancing privacy and efficiency in mach...

arXiv - Machine Learning · 4 min ·
[2602.21217] Applied Sociolinguistic AI for Community Development (ASA-CD): A New Scientific Paradigm for Linguistically-Grounded Social Intervention
NLP

The paper introduces Applied Sociolinguistic AI for Community Development (ASA-CD), a paradigm that leverages AI and linguistics to addre...

arXiv - AI · 3 min ·
[2602.21215] Inference-time Alignment via Sparse Junction Steering
LLMs

This paper presents Sparse Inference-time Alignment (SIA), a novel approach to enhance alignment in large language models by intervening ...

arXiv - AI · 4 min ·
[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
LLMs

This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision...

arXiv - AI · 4 min ·
[2602.22094] Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
AI Agents

This paper presents a novel approach using Petri nets to identify infeasibilities in sequential task planning, enhancing robustness and e...

arXiv - AI · 3 min ·
[2602.21889] 2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Machine Learning

The paper presents the 2-Step Agent framework, which models the interaction between decision makers and AI decision support systems, high...

arXiv - Machine Learning · 3 min ·
[2602.21857] Distill and Align Decomposition for Enhanced Claim Verification
AI Safety

This paper presents a novel reinforcement learning approach to enhance claim verification by optimizing decomposition quality and verifie...

arXiv - Machine Learning · 3 min ·
[2602.21746] fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
Machine Learning

The paper presents fEDM+, an enhanced fuzzy ethical decision-making framework that improves explainability and validation by integrating ...

arXiv - AI · 4 min ·
[2602.21745] The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Machine Learning

The ASIR Courage Model presents a phase-dynamic framework for understanding truth transitions in both human and AI systems, emphasizing t...

arXiv - AI · 4 min ·
[2602.21556] Power and Limitations of Aggregation in Compound AI Systems
Machine Learning

The paper explores the effectiveness of aggregating outputs from multiple AI models in compound AI systems, examining its potential to en...

arXiv - AI · 4 min ·
[2602.21534] ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Machine Learning

The paper presents ARLArena, a framework designed to enhance stability in agentic reinforcement learning (ARL) by providing a systematic ...

arXiv - AI · 4 min ·
[2602.21496] Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information
LLMs

The paper explores the limitations of self-correction in Large Language Models (LLMs) regarding semantic sensitive information, introduci...

arXiv - AI · 3 min ·
[2602.21268] A Dynamic Survey of Soft Set Theory and Its Extensions
Machine Learning

This article provides a comprehensive overview of soft set theory and its various extensions, highlighting key definitions, constructions...

arXiv - AI · 3 min ·
Machine Learning

[D] where can I find more information about NTK wrt Lazy and Rich learning?

The Reddit discussion seeks insights on Neural Tangent Kernel (NTK) in relation to lazy and rich learning regimes, focusing on practical ...

Reddit - Machine Learning · 1 min ·
AI Agents

How Quickly Will A.I. Agents Rip Through the Economy?

The article features an in-depth interview with an Anthropic co-founder discussing the potential impact of AI agents on the economy, explori...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

We built a cryptographic authorization gateway for AI agents and planning to run limited red-team sessions

Sentinel Gateway addresses the challenge of instruction provenance in AI agents by ensuring only user-signed prompts are treated as execu...

Reddit - Artificial Intelligence · 1 min ·
The White House wants AI companies to cover rate hikes. Most have already said they would. | TechCrunch
AI Infrastructure

The White House is urging major AI companies to absorb rising electricity costs linked to their data centers. Most firms, including Micro...

TechCrunch - AI · 5 min ·
Machine Learning

[D] Is ICLR not giving Spotlights this year?

Discussion on whether ICLR is suspending Spotlights this year, with concerns over communication and potential impacts from OpenReview leaks.

Reddit - Machine Learning · 1 min ·