AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What happens if, while training an AI during pre-training, we make it so that when it shows "misaligned behaviour" we just reduce like ...

Reddit - Machine Learning · 1 min · about 8 hours ago

Machine Learning

I had an idea, would love your thoughts

What happens if, while training an AI during pre-training, we make it so that when it shows "misaligned behaviour" we just reduce like ...

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

submitted by /u/Fcking_Chuck

Reddit - Artificial Intelligence · 1 min · about 13 hours ago

All Content

Machine Learning

[2602.21773] Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

This paper discusses the challenges of machine unlearning in the presence of biased data, introducing a novel framework called CUPID to e...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21693] TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts

The paper introduces TiMi, a novel approach that enhances time series forecasting by integrating multimodal data through a Mixture of Exp...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21648] Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction

This article presents a multimodal machine learning framework for predicting 5-year breast cancer survival, integrating clinical and geno...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21593] Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection

The paper introduces a novel attack method, Coherence-Preserving Semantic Injection (CSI), that exploits vulnerabilities in semantic-awar...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21508] WaterVIB: Learning Minimal Sufficient Watermark Representations via Variational Information Bottleneck

The paper introduces WaterVIB, a framework for robust watermarking that utilizes the Variational Information Bottleneck to enhance resili...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.21467] Geometric Priors for Generalizable World Models via Vector Symbolic Architecture

This article presents a novel approach to world modeling in AI using Vector Symbolic Architecture (VSA) to enhance generalization and int...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21426] Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators

The paper introduces Proximal-IMH, a novel sampling method for Bayesian inverse problems that enhances the efficiency of the Independent ...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.21390] Defensive Generation

The paper 'Defensive Generation' presents a novel approach to creating generative models that are unfalsifiable based on observed data, e...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.21297] Robust AI Evaluation through Maximal Lotteries

The paper proposes a new method for evaluating AI models using maximal lotteries, addressing limitations of traditional pairwise compariso...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.10359] Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT

This study evaluates the performance of foundation models in detecting abdominal trauma, revealing that specificity deficits are influenc...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.09929] Monocular Normal Estimation via Shading Sequence Estimation

This paper presents a novel approach to monocular normal estimation by reformulating the problem as shading sequence estimation, enhancin...

arXiv - AI · 4 min · about 1 month ago

AI Safety

[2602.05066] Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks

The paper discusses vulnerabilities in AI control protocols, specifically how Agent-as-a-Proxy attacks can bypass existing monitoring def...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2601.17064] Between Search and Platform: ChatGPT Under the DSA

This article analyzes the classification of ChatGPT under the Digital Services Act (DSA), proposing it as a hybrid of search engine and p...

arXiv - AI · 3 min · about 1 month ago

AI Safety

[2512.17989] The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective

This article explores the gaps in understanding superintelligence misalignment, emphasizing the absence of the human subject and the impl...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2511.12033] EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation

The paper presents EARL, an Entropy-Aware Reinforcement Learning framework designed to enhance the reliability of RTL code generation by ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2509.25184] Incentive-Aligned Multi-Source LLM Summaries

The paper presents an innovative framework called Truthful Text Summarization (TTS) aimed at enhancing the factual accuracy of multi-sour...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2508.08337] Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants

This paper argues for a shift in machine learning fairness research to focus on structural injustice through social determinants, rather ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2507.08017] Mechanistic Indicators of Understanding in Large Language Models

This paper explores mechanistic indicators of understanding in large language models (LLMs), proposing a tiered framework to assess their...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2507.14206] A Comprehensive Benchmark for Electrocardiogram Time-Series

This article presents a comprehensive benchmark for electrocardiogram (ECG) time-series analysis, highlighting its unique characteristics...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2507.02376] On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack

This paper introduces a novel attack and auditing framework for Vertical Federated Learning (VFL), addressing vulnerabilities in inferenc...

arXiv - AI · 4 min · about 1 month ago
