AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

Machine Learning

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

arXiv - AI · 4 min · about 17 hours ago

LLMs

[2509.22367] What Is The Political Content in LLMs' Pre- and Post-Training Data?

arXiv - AI · 4 min · about 17 hours ago

Machine Learning

[2507.22264] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

arXiv - AI · 4 min · about 17 hours ago

All Content

Machine Learning

[2602.11897] Agentic AI for Cybersecurity: A Meta-Cognitive Architecture for Governable Autonomy

This paper presents a novel meta-cognitive architecture for AI in cybersecurity, advocating for a shift from traditional model-centric sy...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2601.03100] Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs

The paper presents TGIF, a novel approach to mitigate hallucinations in multimodal large language models (MLLMs) by leveraging a text-gui...

arXiv - AI · 4 min · about 2 months ago

Robotics

[2511.14624] Active Matter as a framework for living systems-inspired Robophysics

This article explores the intersection of active matter physics and robotics, focusing on the challenges faced by bio-inspired robotic sy...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2510.22389] Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?

This article evaluates the ability of small and reasoning large language models (LLMs) to assess journal article quality, revealing that ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2510.10509] MARS-Sep: Multimodal-Aligned Reinforced Sound Separation

The paper presents MARS-Sep, a novel reinforcement learning framework for sound separation that enhances semantic consistency by aligning...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2504.06438] Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning

The paper presents a novel framework for premise verification in large language models (LLMs) to reduce hallucinations by using retrieval...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2508.02766] The Generative Reasonable Person

The article introduces the 'generative reasonable person,' a tool for assessing how ordinary people judge reasonableness in various legal...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2502.18545] PII-Bench: Evaluating Query-Aware Privacy Protection Systems

The paper introduces PII-Bench, a novel framework for evaluating privacy protection systems in Large Language Models (LLMs), highlighting...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2407.03646] Differentiating Between Human-Written and AI-Generated Texts Using Automatically Extracted Linguistic Features

This article explores the differences between human-written and AI-generated texts by analyzing linguistic features, revealing significan...

arXiv - AI · 4 min · about 2 months ago

AI Infrastructure

[2309.08615] Energy Concerns with HPC Systems and Applications

The paper discusses the critical energy concerns associated with High-Performance Computing (HPC) systems and applications, emphasizing t...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.12249] "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most

This paper examines the shortcomings of speech recognition models in accurately transcribing high-stakes utterances, particularly U.S. st...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.08968] stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation

The paper introduces stable-worldmodel (SWM), a modular ecosystem for world modeling research that enhances reproducibility and standardi...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2601.15812] ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models

This article introduces ErrorMap and ErrorAtlas, innovative tools designed to analyze and categorize the failure patterns of large langua...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2510.18631] Comparative Expressivity for Structured Argumentation Frameworks with Uncertain Rules and Premises

This paper explores the expressivity of structured argumentation frameworks that incorporate uncertainty, presenting both theoretical and...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2508.00576] MultiSHAP: A Shapley-Based Framework for Explaining Cross-Modal Interactions in Multimodal AI Models

MultiSHAP introduces a Shapley-based framework for explaining interactions in multimodal AI models, enhancing interpretability and trustw...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.11325] Amortised and provably-robust simulation-based inference

This paper presents a novel method for simulation-based inference that is robust to outliers and simplifies computation by eliminating th...

arXiv - Machine Learning · 3 min · about 2 months ago

AI Infrastructure

[2507.06134] OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

OpenAgentSafety introduces a modular framework for evaluating AI agent safety in real-world tasks, addressing critical vulnerabilities in...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2506.02649] From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV

This article explores the integration of Large Language Models (LLMs) with Uncrewed Aerial Vehicles (UAVs) for enhanced public safety, fo...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2601.18608] PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression

The paper introduces PolySHAP, an extension of KernelSHAP that uses interaction-informed polynomial regression to improve the accuracy of...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.15811] Task-Agnostic Continual Learning for Chest Radiograph Classification

This article presents CARL-XRay, a novel continual learning framework for chest radiograph classification that adapts to new datasets wit...

arXiv - AI · 4 min · about 2 months ago
