AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails


Reddit - Artificial Intelligence · 1 min ·
AI Safety

[2511.16417] Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report

Abstract page for arXiv paper 2511.16417: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling...

arXiv - AI · 4 min ·
LLMs

[2510.08847] What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

Abstract page for arXiv paper 2510.08847: What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

arXiv - AI · 4 min ·

All Content

Machine Learning

[2602.22488] Explainability-Aware Evaluation of Transfer Learning Models for IoT DDoS Detection Under Resource Constraints

This article evaluates transfer learning models for IoT DDoS detection, focusing on explainability and resource constraints. It analyzes ...

arXiv - AI · 3 min ·
LLMs

[2602.22481] Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs

This article explores the relationship between AI and humans through the lens of large language models (LLMs), focusing on the Sydney persona...

arXiv - AI · 4 min ·
LLMs

[2602.22450] Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

The paper discusses the security risks posed by implicit prompt injection in large language model (LLM) agents, demonstrating how adversaries...

arXiv - AI · 4 min ·
NLP

[2602.22449] A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection

This paper presents a novel framework combining BanglaBERT and a two-layer stacked LSTM for effective multi-label cyberbullying detection...

arXiv - Machine Learning · 4 min ·
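As a rough sketch of the kind of architecture the entry above names, and not the paper's actual model, a contextual encoder feeding a two-layer stacked LSTM with a sigmoid multi-label head could look like this in PyTorch (all dimensions and the label count are illustrative assumptions):

```python
import torch
import torch.nn as nn

class StackedLSTMMultiLabel(nn.Module):
    """Illustrative fusion: contextual embeddings -> 2-layer LSTM -> multi-label head.
    embed_dim matches a BERT-style encoder (e.g., BanglaBERT); hidden_dim and
    num_labels are placeholder values, not taken from the paper."""
    def __init__(self, embed_dim=768, hidden_dim=256, num_labels=5):
        super().__init__()
        # Two stacked bidirectional LSTM layers over token-level embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_embeddings):           # (batch, seq_len, embed_dim)
        out, _ = self.lstm(token_embeddings)       # (batch, seq_len, 2*hidden_dim)
        pooled = out.mean(dim=1)                   # simple mean pooling
        return torch.sigmoid(self.head(pooled))    # independent per-label probabilities
```

Sigmoid outputs (rather than a softmax) let each cyberbullying label fire independently, which is what makes the task multi-label; training would pair this with a binary cross-entropy loss.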
LLMs

[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

The paper presents HubScan, a tool designed to detect hubness poisoning in Retrieval-Augmented Generation systems, addressing a critical ...

arXiv - AI · 4 min ·
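HubScan's detector isn't described in the teaser; as background, the standard way to quantify hubness in an embedding space is the k-occurrence count N_k. A minimal sketch, assuming cosine-similarity retrieval and NumPy arrays of document and query embeddings (the flagging threshold is an assumption, not HubScan's rule):

```python
import numpy as np

def k_occurrence(doc_embs, query_embs, k=10):
    """Hubness score N_k: how often each document appears in the top-k
    nearest-neighbor lists of the queries (cosine similarity).
    A document with an abnormally high N_k is a candidate 'hub'."""
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sims = q @ d.T                                # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]       # top-k doc indices per query
    return np.bincount(topk.ravel(), minlength=len(doc_embs))

# An illustrative flagging rule: documents whose N_k sits far above the median.
# counts = k_occurrence(docs, queries); flags = counts > np.median(counts) + 5 * counts.std()
```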
NLP

[2602.22282] Differentially Private Truncation of Unbounded Data via Public Second Moments

This paper presents a novel approach to differentially private data truncation using public second moments, enhancing privacy without compromising...

arXiv - Machine Learning · 4 min ·
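The teaser suggests the familiar clip-then-noise recipe, with the clipping bound derived from a public statistic so it costs no privacy budget. A minimal sketch of that generic recipe (the function name, the k multiplier, and the Laplace mechanism are assumptions, not the paper's algorithm):

```python
import numpy as np

def dp_mean_with_public_truncation(x, public_second_moment, eps, k=3.0):
    """Illustrative clip-then-noise recipe: derive a clipping bound from a
    PUBLIC second moment, truncate the private data to it, then release a
    Laplace-noised mean calibrated to the clipped sensitivity."""
    bound = k * np.sqrt(public_second_moment)  # public statistic -> no privacy cost
    clipped = np.clip(x, -bound, bound)        # truncation step
    sensitivity = 2 * bound / len(x)           # replace-one sensitivity of the clipped mean
    noise = np.random.laplace(scale=sensitivity / eps)
    return clipped.mean() + noise
```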
LLMs

[2602.22246] Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models

This article presents a framework called DiSP (Diffusion Self-Purification) to mitigate backdoor attacks in Multimodal Diffusion Language Models...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.22347] Enabling clinical use of foundation models in histopathology

This article discusses the application of foundation models in histopathology, highlighting a novel approach that improves robustness and...

arXiv - AI · 4 min ·
LLMs

[2602.22236] CrossLLM-Mamba: Multimodal State Space Fusion of LLMs for RNA Interaction Prediction

The article presents CrossLLM-Mamba, a novel framework for RNA interaction prediction that utilizes multimodal state space fusion of large language models...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.23353] SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

The paper introduces SOTAlign, a semi-supervised framework for aligning unimodal vision and language models using minimal paired data and...

arXiv - AI · 4 min ·
Machine Learning

[2602.22258] Poisoned Acoustics

The paper 'Poisoned Acoustics' explores training-data poisoning attacks on deep neural networks, demonstrating significant vulnerabilities...

arXiv - AI · 3 min ·
Machine Learning

[2602.23336] Differentiable Zero-One Loss via Hypersimplex Projections

This paper presents a novel differentiable approximation to the zero-one loss, enhancing gradient-based optimization in machine learning ...

arXiv - Machine Learning · 3 min ·
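To see what "differentiable zero-one loss" means in practice, here is a generic smooth surrogate, a temperature-scaled sigmoid of the classification margin, which recovers the step function as the temperature goes to 0. This is only a common baseline; the paper's hypersimplex-projection construction is not reproduced here:

```python
import torch

def smooth_zero_one(logit_margin, temperature=0.1):
    """Generic smooth surrogate for the 0-1 loss: sigmoid(-margin / T).
    Correct predictions (margin > 0) give loss near 0, incorrect ones
    near 1, and the surrogate tends to the true step as T -> 0."""
    return torch.sigmoid(-logit_margin / temperature)

margin = torch.tensor([2.0, -0.5, 0.1], requires_grad=True)
loss = smooth_zero_one(margin).mean()
loss.backward()      # gradients exist everywhere, unlike the true 0-1 loss
print(margin.grad)
```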
Machine Learning

[2602.23296] Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

This article presents FedWQ-CP, a novel approach to federated uncertainty quantification that addresses dual heterogeneity in data and models...

arXiv - Machine Learning · 4 min ·
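As background for the entry above, the non-federated building block is split conformal prediction: calibrate a quantile of nonconformity scores on held-out data, then pad test predictions with it. A minimal sketch (FedWQ-CP's federated weighted-quantile machinery is not shown):

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Plain split conformal prediction: returns an interval with
    ~(1 - alpha) marginal coverage under exchangeability."""
    scores = np.abs(cal_targets - cal_preds)       # nonconformity scores
    n = len(scores)
    # finite-sample-corrected quantile level
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_pred - qhat, test_pred + qhat
```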
LLMs

[2602.22242] Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

This paper analyzes the vulnerabilities of Large Language Models (LLMs) to prompt injection and jailbreak attacks, evaluating various defenses...

arXiv - AI · 3 min ·
Machine Learning

[2602.22238] TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI

The paper presents TT-SEAL, a selective encryption framework designed for Tensor-Train Decomposed (TTD) networks, enhancing security and ...

arXiv - AI · 3 min ·
Machine Learning

[2602.22235] Unsupervised Denoising of Diffusion-Weighted Images with Bias and Variance Corrected Noise Modeling

This article presents a novel approach for unsupervised denoising of diffusion-weighted images (dMRI) by addressing noise bias and variance...

arXiv - AI · 4 min ·
LLMs

[2602.22221] Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews

This article evaluates misinformation exposure on the Chinese web by comparing traditional search engines, LLMs, and AI-generated overviews...

arXiv - AI · 3 min ·
LLMs

[2602.23164] MetaOthello: A Controlled Study of Multiple World Models in Transformers

The paper presents MetaOthello, a study exploring how transformers manage multiple world models through a controlled suite of Othello variants...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.23128] Bound to Disagree: Generalization Bounds via Certifiable Surrogates

The paper presents new disagreement-based certificates for generalization bounds in deep learning models, addressing limitations of existing...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.23116] Regularized Online RLHF with Generalized Bilinear Preferences

This paper explores contextual online Reinforcement Learning with Human Feedback (RLHF) using a Generalized Bilinear Preference Model to ...

arXiv - Machine Learning · 3 min ·
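A bilinear preference model in the Bradley-Terry style scores action pairs through a reward r(x, a) = xᵀWa and converts the reward gap into a preference probability with a sigmoid. A minimal sketch under those assumptions (the paper's generalized model and its regularized online updates are not reproduced):

```python
import numpy as np

def bilinear_preference_prob(x, a1, a2, W):
    """P(a1 preferred over a2 | context x) = sigmoid(r(x, a1) - r(x, a2))
    with the bilinear reward r(x, a) = x^T W a."""
    r1 = x @ W @ a1
    r2 = x @ W @ a2
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

# Toy usage with random context/action features and a random parameter matrix.
rng = np.random.default_rng(0)
x, a1, a2 = rng.normal(size=4), rng.normal(size=3), rng.normal(size=3)
W = rng.normal(size=(4, 3))
print(bilinear_preference_prob(x, a1, a2, W))
```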