AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What happens if, while pre-training an AI, we make it so that when it exhibits "misaligned behaviour" we just reduce, like ...

Reddit - Machine Learning · 1 min ·
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.21231] ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces
LLMs

The paper presents ACAR, a framework for adaptive complexity routing in multi-model ensembles, demonstrating improved task routing accura...

arXiv - Machine Learning · 4 min ·
[2602.21226] IslamicLegalBench: Evaluating LLMs Knowledge and Reasoning of Islamic Law Across 1,200 Years of Islamic Pluralist Legal Traditions
LLMs

The paper introduces IslamicLegalBench, a benchmark for evaluating LLMs' reasoning on Islamic law, revealing significant limitations in c...

arXiv - AI · 4 min ·
[2602.21218] EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
LLMs

The paper introduces EPSVec, a novel method for generating synthetic data using dataset vectors, enhancing privacy and efficiency in mach...

arXiv - Machine Learning · 4 min ·
[2602.21217] Applied Sociolinguistic AI for Community Development (ASA-CD): A New Scientific Paradigm for Linguistically-Grounded Social Intervention
NLP

The paper introduces Applied Sociolinguistic AI for Community Development (ASA-CD), a paradigm that leverages AI and linguistics to addre...

arXiv - AI · 3 min ·
[2602.21215] Inference-time Alignment via Sparse Junction Steering
LLMs

This paper presents Sparse Inference-time Alignment (SIA), a novel approach to enhance alignment in large language models by intervening ...

arXiv - AI · 4 min ·
[2602.22070] Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts
LLMs

This study explores how large language models (LLMs) exhibit inconsistent biases towards algorithmic agents and human experts in decision...

arXiv - AI · 4 min ·
[2602.22094] Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
AI Agents

This paper presents a novel approach using Petri nets to identify infeasibilities in sequential task planning, enhancing robustness and e...

arXiv - AI · 3 min ·
[2602.21889] 2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
Machine Learning

The paper presents the 2-Step Agent framework, which models the interaction between decision makers and AI decision support systems, high...

arXiv - Machine Learning · 3 min ·
[2602.21857] Distill and Align Decomposition for Enhanced Claim Verification
AI Safety

This paper presents a novel reinforcement learning approach to enhance claim verification by optimizing decomposition quality and verifie...

arXiv - Machine Learning · 3 min ·
[2602.21746] fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
Machine Learning

The paper presents fEDM+, an enhanced fuzzy ethical decision-making framework that improves explainability and validation by integrating ...

arXiv - AI · 4 min ·
[2602.21745] The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
Machine Learning

The ASIR Courage Model presents a phase-dynamic framework for understanding truth transitions in both human and AI systems, emphasizing t...

arXiv - AI · 4 min ·
[2602.21556] Power and Limitations of Aggregation in Compound AI Systems
Machine Learning

The paper explores the effectiveness of aggregating outputs from multiple AI models in compound AI systems, examining its potential to en...

arXiv - AI · 4 min ·
[2602.21534] ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
Machine Learning

The paper presents ARLArena, a framework designed to enhance stability in agentic reinforcement learning (ARL) by providing a systematic ...

arXiv - AI · 4 min ·
[2602.21496] Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information
LLMs

The paper explores the limitations of self-correction in Large Language Models (LLMs) regarding semantic sensitive information, introduci...

arXiv - AI · 3 min ·
[2602.21268] A Dynamic Survey of Soft Set Theory and Its Extensions
Machine Learning

This article provides a comprehensive overview of soft set theory and its various extensions, highlighting key definitions, constructions...

arXiv - AI · 3 min ·
Machine Learning

[D] where can I find more information about NTK wrt Lazy and Rich learning?

The Reddit discussion seeks insights on Neural Tangent Kernel (NTK) in relation to lazy and rich learning regimes, focusing on practical ...

Reddit - Machine Learning · 1 min ·
AI Agents

How Quickly Will A.I. Agents Rip Through the Economy?

The article features an in-depth interview with an Anthropic co-founder discussing the potential impact of AI agents on the economy, explori...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

We built a cryptographic authorization gateway for AI agents and planning to run limited red-team sessions

Sentinel Gateway addresses the challenge of instruction provenance in AI agents by ensuring only user-signed prompts are treated as execu...

Reddit - Artificial Intelligence · 1 min ·
The White House wants AI companies to cover rate hikes. Most have already said they would. | TechCrunch
AI Infrastructure

The White House is urging major AI companies to absorb rising electricity costs linked to their data centers. Most firms, including Micro...

TechCrunch - AI · 5 min ·
Machine Learning

[D] Is ICLR not giving Spotlights this year?

Discussion on whether ICLR is suspending Spotlights this year, with concerns over communication and potential impacts from OpenReview leaks.

Reddit - Machine Learning · 1 min ·