AI Safety & Ethics

Alignment, bias, regulation, and responsible AI


Top This Week

LLMs

This Is Not Hacking. This Is Structured Intelligence.

Watch me demonstrate everything I've been talking about—live, in real time. The Setup: Maestro University AI enrollment system Standard c...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

AI Safety

When Agentic AI Browsers Outrun Governance

Agentic AI browsers introduce new enterprise risk. Learn how AI governance helps leaders assess exposure, oversight gaps, and safe adopti...

AI Tools & Products · 14 min · about 4 hours ago

LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

All Content

LLMs

[2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA

The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.22621] CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection

The paper presents CGSA, a novel framework for Source-Free Domain Adaptive Object Detection that integrates object-centric learning to en...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22699] DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule

DPSQL+ is a new SQL library designed to enhance data privacy by enforcing differential privacy and a minimum frequency rule, ensuring sen...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22631] TorchLean: Formalizing Neural Networks in Lean

TorchLean is a framework that formalizes neural networks within the Lean 4 theorem prover, enabling precise semantics for execution and v...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22564] Addressing Climate Action Misperceptions with Generative AI

This study explores how a personalized large language model (LLM) can correct climate action misperceptions among climate-concerned indiv...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22609] EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

EvolveGen introduces a novel framework for generating hardware model checking benchmarks using reinforcement learning, addressing the ben...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.22488] Explainability-Aware Evaluation of Transfer Learning Models for IoT DDoS Detection Under Resource Constraints

This article evaluates transfer learning models for IoT DDoS detection, focusing on explainability and resource constraints. It analyzes ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.22481] Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs

This article explores the relationship between AI and humans through the lens of large language models (LLMs), focusing on the Sydney per...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.22450] Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

The paper discusses the security risks posed by implicit prompt injection in large language model (LLM) agents, demonstrating how adversa...

arXiv - AI · 4 min · about 1 month ago

NLP

[2602.22449] A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection

This paper presents a novel framework combining BanglaBERT and a two-layer stacked LSTM for effective multi-label cyberbullying detection...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

The paper presents HubScan, a tool designed to detect hubness poisoning in Retrieval-Augmented Generation systems, addressing a critical ...

arXiv - AI · 4 min · about 1 month ago

NLP

[2602.22282] Differentially Private Truncation of Unbounded Data via Public Second Moments

This paper presents a novel approach to differentially private data truncation using public second moments, enhancing privacy without com...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22246] Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models

This article presents a framework called DiSP (Diffusion Self-Purification) to mitigate backdoor attacks in Multimodal Diffusion Language...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22347] Enabling clinical use of foundation models in histopathology

This article discusses the application of foundation models in histopathology, highlighting a novel approach that improves robustness and...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.22236] CrossLLM-Mamba: Multimodal State Space Fusion of LLMs for RNA Interaction Prediction

The article presents CrossLLM-Mamba, a novel framework for RNA interaction prediction that utilizes multimodal state space fusion of larg...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.23353] SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

The paper introduces SOTAlign, a semi-supervised framework for aligning unimodal vision and language models using minimal paired data and...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22258] Poisoned Acoustics

The paper 'Poisoned Acoustics' explores training-data poisoning attacks on deep neural networks, demonstrating significant vulnerabilitie...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.23336] Differentiable Zero-One Loss via Hypersimplex Projections

This paper presents a novel differentiable approximation to the zero-one loss, enhancing gradient-based optimization in machine learning ...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.23296] Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

This article presents FedWQ-CP, a novel approach to federated uncertainty quantification that addresses dual heterogeneity in data and mo...

arXiv - Machine Learning · 4 min · about 1 month ago

