AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

arXiv - AI · 4 min · about 8 hours ago

LLMs

[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?

arXiv - AI · 4 min · about 8 hours ago

Machine Learning

[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

arXiv - AI · 4 min · about 8 hours ago

All Content

LLMs

[2509.24803] TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models

The paper introduces TimeOmni-1, a model designed to enhance complex reasoning with time series data in large language models, addressing...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2502.01160] Scalable Precise Computation of Shannon Entropy

This paper presents a scalable tool, PSE, for precise computation of Shannon entropy, optimizing the process to enhance efficiency in qua...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2411.06624] A Review of Fairness and A Practical Guide to Selecting Context-Appropriate Fairness Metrics in Machine Learning

This article reviews fairness in machine learning, emphasizing the need for context-appropriate fairness metrics and providing a flowchar...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16708] Policy Compiler for Secure Agentic Systems

The article presents PCAS, a Policy Compiler designed to enforce complex authorization policies in LLM-based agents, improving compliance...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16703] Measuring Mid-2025 LLM-Assistance on Novice Performance in Biology

This study evaluates the impact of large language models (LLMs) on novice performance in biology laboratory tasks, revealing modest benef...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16660] Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

The paper presents a method for enhancing multilingual safety alignment in large language models (LLMs) using a resource-efficient Multi-...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.16601] Error Propagation and Model Collapse in Diffusion Models: A Theoretical Study

This theoretical study examines error propagation and model collapse in diffusion models, highlighting how recursive training on syntheti...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.16610] Who can we trust? LLM-as-a-jury for Comparative Assessment

The paper explores the reliability of large language models (LLMs) as evaluators in natural language generation tasks, proposing a new mo...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.16608] Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

The paper presents the Context-Aware Layer-Wise Integrated Gradients (CA-LIG) framework, enhancing explainability in Transformer models b...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.16520] Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents

The paper presents RLM-JB, a framework utilizing Recursive Language Models for detecting jailbreak prompts in large language models, enha...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.16343] How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection

This article explores the dual role of neural audio codecs in labeling resynthesized audio for deepfake detection, highlighting the impac...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.16346] Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

This article presents STING, a framework for evaluating illicit assistance in multi-turn, multilingual LLM agents, highlighting the chall...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.16467] IndicEval: A Bilingual Indian Educational Evaluation Framework for Large Language Models

IndicEval introduces a bilingual evaluation framework for large language models, assessing their performance on real examination question...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.16444] RoboGene: Boosting VLA Pre-training via Diversity-Driven Agentic Framework for Real-World Task Generation

RoboGene introduces a framework for automating the generation of diverse, physically plausible robotic manipulation tasks, addressing the...

arXiv - AI · 4 min · about 2 months ago

NLP

[2602.16144] Missing-by-Design: Certifiable Modality Deletion for Revocable Multimodal Sentiment Analysis

The paper presents Missing-by-Design (MBD), a framework for revocable multimodal sentiment analysis that enhances privacy compliance by a...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2602.16309] The Weight of a Bit: EMFI Sensitivity Analysis of Embedded Deep Learning Models

This article investigates the impact of different number representations on the vulnerability of embedded deep learning models to electro...

arXiv - AI · 3 min · about 2 months ago

Generative AI

[2602.16307] Generative AI Usage of University Students: Navigating Between Education and Business

This study explores the use of generative AI by university students balancing education and work, highlighting its benefits and challenges.

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.16241] Are LLMs Ready to Replace Bangla Annotators?

This article evaluates the effectiveness of Large Language Models (LLMs) as annotators for Bangla hate speech, revealing significant bias...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.16098] Collaborative Zone-Adaptive Zero-Day Intrusion Detection for IoBT

The paper presents a novel Zone-Adaptive Intrusion Detection framework for the Internet of Battlefield Things (IoBT), addressing the chal...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.16090] Examining Fast Radiative Feedbacks Using Machine-Learning Weather Emulators

This article explores the use of machine-learning weather emulators to analyze fast radiative feedbacks in the climate system, focusing o...

arXiv - Machine Learning · 4 min · about 2 months ago
