AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[D] I had an idea, would love your thoughts

What if, while training an AI during pre-training, we set things up so that whenever it shows "misaligned behaviour" we just reduce like ...

Reddit - Machine Learning · 1 min ·
Machine Learning

I had an idea, would love your thoughts

What if, while training an AI during pre-training, we set things up so that whenever it shows "misaligned behaviour" we just reduce like ...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails

submitted by /u/Fcking_Chuck

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.21773] Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias
Machine Learning

This paper discusses the challenges of machine unlearning in the presence of biased data, introducing a novel framework called CUPID to e...

arXiv - Machine Learning · 4 min ·
[2602.21693] TiMi: Empower Time Series Transformers with Multimodal Mixture of Experts
Machine Learning

The paper introduces TiMi, a novel approach that enhances time series forecasting by integrating multimodal data through a Mixture of Exp...

arXiv - Machine Learning · 4 min ·
[2602.21648] Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction
Machine Learning

This article presents a multimodal machine learning framework for predicting 5-year breast cancer survival, integrating clinical and geno...

arXiv - Machine Learning · 4 min ·
[2602.21593] Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
LLMs

The paper introduces a novel attack method, Coherence-Preserving Semantic Injection (CSI), that exploits vulnerabilities in semantic-awar...

arXiv - Machine Learning · 4 min ·
[2602.21508] WaterVIB: Learning Minimal Sufficient Watermark Representations via Variational Information Bottleneck
Machine Learning

The paper introduces WaterVIB, a framework for robust watermarking that utilizes the Variational Information Bottleneck to enhance resili...

arXiv - Machine Learning · 3 min ·
[2602.21467] Geometric Priors for Generalizable World Models via Vector Symbolic Architecture
Machine Learning

This article presents a novel approach to world modeling in AI using Vector Symbolic Architecture (VSA) to enhance generalization and int...

arXiv - Machine Learning · 4 min ·
[2602.21426] Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators
Machine Learning

The paper introduces Proximal-IMH, a novel sampling method for Bayesian inverse problems that enhances the efficiency of the Independent ...

arXiv - Machine Learning · 3 min ·
[2602.21390] Defensive Generation
Machine Learning

The paper 'Defensive Generation' presents a novel approach to creating generative models that are unfalsifiable based on observed data, e...

arXiv - Machine Learning · 3 min ·
[2602.21297] Robust AI Evaluation through Maximal Lotteries
LLMs

The paper proposes a new method for evaluating AI models using maximal lotteries, addressing limitations of traditional pairwise compariso...

arXiv - Machine Learning · 3 min ·
[2602.10359] Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT
LLMs

This study evaluates the performance of foundation models in detecting abdominal trauma, revealing that specificity deficits are influenc...

arXiv - AI · 4 min ·
[2602.09929] Monocular Normal Estimation via Shading Sequence Estimation
Machine Learning

This paper presents a novel approach to monocular normal estimation by reformulating the problem as shading sequence estimation, enhancin...

arXiv - AI · 4 min ·
[2602.05066] Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
AI Safety

The paper discusses vulnerabilities in AI control protocols, specifically how Agent-as-a-Proxy attacks can bypass existing monitoring def...

arXiv - AI · 3 min ·
[2601.17064] Between Search and Platform: ChatGPT Under the DSA
LLMs

This article analyzes the classification of ChatGPT under the Digital Services Act (DSA), proposing it as a hybrid of search engine and p...

arXiv - AI · 3 min ·
[2512.17989] The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective
AI Safety

This article explores the gaps in understanding superintelligence misalignment, emphasizing the absence of the human subject and the impl...

arXiv - AI · 4 min ·
[2511.12033] EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
LLMs

The paper presents EARL, an Entropy-Aware Reinforcement Learning framework designed to enhance the reliability of RTL code generation by ...

arXiv - Machine Learning · 4 min ·
[2509.25184] Incentive-Aligned Multi-Source LLM Summaries
LLMs

The paper presents an innovative framework called Truthful Text Summarization (TTS) aimed at enhancing the factual accuracy of multi-sour...

arXiv - AI · 3 min ·
[2508.08337] Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants
Machine Learning

This paper argues for a shift in machine learning fairness research to focus on structural injustice through social determinants, rather ...

arXiv - Machine Learning · 4 min ·
[2507.08017] Mechanistic Indicators of Understanding in Large Language Models
LLMs

This paper explores mechanistic indicators of understanding in large language models (LLMs), proposing a tiered framework to assess their...

arXiv - AI · 4 min ·
[2507.14206] A Comprehensive Benchmark for Electrocardiogram Time-Series
Machine Learning

This article presents a comprehensive benchmark for electrocardiogram (ECG) time-series analysis, highlighting its unique characteristics...

arXiv - Machine Learning · 4 min ·
[2507.02376] On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack
Machine Learning

This paper introduces a novel attack and auditing framework for Vertical Federated Learning (VFL), addressing vulnerabilities in inferenc...

arXiv - AI · 4 min ·