AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

Newsom signs executive order requiring AI companies to have safety, privacy guardrails


Reddit - Artificial Intelligence · 1 min ·
AI Safety

[2511.16417] Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report

Abstract page for arXiv paper 2511.16417: Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling...

arXiv - AI · 4 min ·
LLMs

[2510.08847] What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

Abstract page for arXiv paper 2510.08847: What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

arXiv - AI · 4 min ·

All Content

Machine Learning

[2602.22488] Explainability-Aware Evaluation of Transfer Learning Models for IoT DDoS Detection Under Resource Constraints

This article evaluates transfer learning models for IoT DDoS detection, focusing on explainability and resource constraints. It analyzes ...

arXiv - AI · 3 min ·
LLMs

[2602.22481] Sydney Telling Fables on AI and Humans: A Corpus Tracing Memetic Transfer of Persona between LLMs

This article explores the relationship between AI and humans through the lens of large language models (LLMs), focusing on the Sydney persona...

arXiv - AI · 4 min ·
LLMs

[2602.22450] Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace

The paper discusses the security risks posed by implicit prompt injection in large language model (LLM) agents, demonstrating how adversaries...

arXiv - AI · 4 min ·
NLP

[2602.22449] A Fusion of context-aware based BanglaBERT and Two-Layer Stacked LSTM Framework for Multi-Label Cyberbullying Detection

This paper presents a novel framework combining BanglaBERT and a two-layer stacked LSTM for effective multi-label cyberbullying detection...

arXiv - Machine Learning · 4 min ·
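As a rough sketch of the kind of architecture the entry above names, and not the paper's actual model, a contextual encoder feeding a two-layer stacked LSTM with a sigmoid multi-label head could look like this in PyTorch (all dimensions and the label count are illustrative assumptions):

```python
import torch
import torch.nn as nn

class StackedLSTMMultiLabel(nn.Module):
    """Illustrative fusion: contextual embeddings -> 2-layer LSTM -> multi-label head.
    embed_dim matches a BERT-style encoder (e.g., BanglaBERT); hidden_dim and
    num_labels are placeholder values, not taken from the paper."""
    def __init__(self, embed_dim=768, hidden_dim=256, num_labels=5):
        super().__init__()
        # Two stacked bidirectional LSTM layers over token-level embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_embeddings):           # (batch, seq_len, embed_dim)
        out, _ = self.lstm(token_embeddings)       # (batch, seq_len, 2*hidden_dim)
        pooled = out.mean(dim=1)                   # simple mean pooling
        return torch.sigmoid(self.head(pooled))    # independent per-label probabilities
```

Sigmoid outputs (rather than a softmax) let each cyberbullying label fire independently, which is what makes the task multi-label; training would pair this with a binary cross-entropy loss.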
LLMs

[2602.22427] HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems

The paper presents HubScan, a tool designed to detect hubness poisoning in Retrieval-Augmented Generation systems, addressing a critical ...

arXiv - AI · 4 min ·
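HubScan's detector isn't described in the teaser; as background, the standard way to quantify hubness in an embedding space is the k-occurrence count N_k. A minimal sketch, assuming cosine-similarity retrieval and NumPy arrays of document and query embeddings (the flagging threshold is an assumption, not HubScan's rule):

```python
import numpy as np

def k_occurrence(doc_embs, query_embs, k=10):
    """Hubness score N_k: how often each document appears in the top-k
    nearest-neighbor lists of the queries (cosine similarity).
    A document with an abnormally high N_k is a candidate 'hub'."""
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sims = q @ d.T                                # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]       # top-k doc indices per query
    return np.bincount(topk.ravel(), minlength=len(doc_embs))

# An illustrative flagging rule: documents whose N_k sits far above the median.
# counts = k_occurrence(docs, queries); flags = counts > np.median(counts) + 5 * counts.std()
```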
NLP

[2602.22282] Differentially Private Truncation of Unbounded Data via Public Second Moments

This paper presents a novel approach to differentially private data truncation using public second moments, enhancing privacy without compromising...

arXiv - Machine Learning · 4 min ·
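The teaser suggests the familiar clip-then-noise recipe, with the clipping bound derived from a public statistic so it costs no privacy budget. A minimal sketch of that generic recipe (the function name, the k multiplier, and the Laplace mechanism are assumptions, not the paper's algorithm):

```python
import numpy as np

def dp_mean_with_public_truncation(x, public_second_moment, eps, k=3.0):
    """Illustrative clip-then-noise recipe: derive a clipping bound from a
    PUBLIC second moment, truncate the private data to it, then release a
    Laplace-noised mean calibrated to the clipped sensitivity."""
    bound = k * np.sqrt(public_second_moment)  # public statistic -> no privacy cost
    clipped = np.clip(x, -bound, bound)        # truncation step
    sensitivity = 2 * bound / len(x)           # replace-one sensitivity of the clipped mean
    noise = np.random.laplace(scale=sensitivity / eps)
    return clipped.mean() + noise
```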
LLMs

[2602.22246] Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models

This article presents a framework called DiSP (Diffusion Self-Purification) to mitigate backdoor attacks in Multimodal Diffusion Language Models...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.22347] Enabling clinical use of foundation models in histopathology

This article discusses the application of foundation models in histopathology, highlighting a novel approach that improves robustness and...

arXiv - AI · 4 min ·
LLMs

[2602.22236] CrossLLM-Mamba: Multimodal State Space Fusion of LLMs for RNA Interaction Prediction

The article presents CrossLLM-Mamba, a novel framework for RNA interaction prediction that utilizes multimodal state space fusion of large language models...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.23353] SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

The paper introduces SOTAlign, a semi-supervised framework for aligning unimodal vision and language models using minimal paired data and...

arXiv - AI · 4 min ·
Machine Learning

[2602.22258] Poisoned Acoustics

The paper 'Poisoned Acoustics' explores training-data poisoning attacks on deep neural networks, demonstrating significant vulnerabilities...

arXiv - AI · 3 min ·
Machine Learning

[2602.23336] Differentiable Zero-One Loss via Hypersimplex Projections

This paper presents a novel differentiable approximation to the zero-one loss, enhancing gradient-based optimization in machine learning ...

arXiv - Machine Learning · 3 min ·
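To see what "differentiable zero-one loss" means in practice, here is a generic smooth surrogate, a temperature-scaled sigmoid of the classification margin, which recovers the step function as the temperature goes to 0. This is only a common baseline; the paper's hypersimplex-projection construction is not reproduced here:

```python
import torch

def smooth_zero_one(logit_margin, temperature=0.1):
    """Generic smooth surrogate for the 0-1 loss: sigmoid(-margin / T).
    Correct predictions (margin > 0) give loss near 0, incorrect ones
    near 1, and the surrogate tends to the true step as T -> 0."""
    return torch.sigmoid(-logit_margin / temperature)

margin = torch.tensor([2.0, -0.5, 0.1], requires_grad=True)
loss = smooth_zero_one(margin).mean()
loss.backward()      # gradients exist everywhere, unlike the true 0-1 loss
print(margin.grad)
```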
Machine Learning

[2602.23296] Conformalized Neural Networks for Federated Uncertainty Quantification under Dual Heterogeneity

This article presents FedWQ-CP, a novel approach to federated uncertainty quantification that addresses dual heterogeneity in data and models...

arXiv - Machine Learning · 4 min ·
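As background for the entry above, the non-federated building block is split conformal prediction: calibrate a quantile of nonconformity scores on held-out data, then pad test predictions with it. A minimal sketch (FedWQ-CP's federated weighted-quantile machinery is not shown):

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_targets, test_pred, alpha=0.1):
    """Plain split conformal prediction: returns an interval with
    ~(1 - alpha) marginal coverage under exchangeability."""
    scores = np.abs(cal_targets - cal_preds)       # nonconformity scores
    n = len(scores)
    # finite-sample-corrected quantile level
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, q_level, method="higher")
    return test_pred - qhat, test_pred + qhat
```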
LLMs

[2602.22242] Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

This paper analyzes the vulnerabilities of Large Language Models (LLMs) to prompt injection and jailbreak attacks, evaluating various defenses...

arXiv - AI · 3 min ·
Machine Learning

[2602.22238] TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI

The paper presents TT-SEAL, a selective encryption framework designed for Tensor-Train Decomposed (TTD) networks, enhancing security and ...

arXiv - AI · 3 min ·
Machine Learning

[2602.22235] Unsupervised Denoising of Diffusion-Weighted Images with Bias and Variance Corrected Noise Modeling

This article presents a novel approach for unsupervised denoising of diffusion-weighted images (dMRI) by addressing noise bias and variance...

arXiv - AI · 4 min ·
LLMs

[2602.22221] Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews

This article evaluates misinformation exposure on the Chinese web by comparing traditional search engines, LLMs, and AI-generated overviews...

arXiv - AI · 3 min ·
LLMs

[2602.23164] MetaOthello: A Controlled Study of Multiple World Models in Transformers

The paper presents MetaOthello, a study exploring how transformers manage multiple world models through a controlled suite of Othello variants...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.23128] Bound to Disagree: Generalization Bounds via Certifiable Surrogates

The paper presents new disagreement-based certificates for generalization bounds in deep learning models, addressing limitations of existing...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.23116] Regularized Online RLHF with Generalized Bilinear Preferences

This paper explores contextual online Reinforcement Learning with Human Feedback (RLHF) using a Generalized Bilinear Preference Model to ...

arXiv - Machine Learning · 3 min ·
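A bilinear preference model in the Bradley-Terry style scores action pairs through a reward r(x, a) = xᵀWa and converts the reward gap into a preference probability with a sigmoid. A minimal sketch under those assumptions (the paper's generalized model and its regularized online updates are not reproduced):

```python
import numpy as np

def bilinear_preference_prob(x, a1, a2, W):
    """P(a1 preferred over a2 | context x) = sigmoid(r(x, a1) - r(x, a2))
    with the bilinear reward r(x, a) = x^T W a."""
    r1 = x @ W @ a1
    r2 = x @ W @ a2
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

# Toy usage with random context/action features and a random parameter matrix.
rng = np.random.default_rng(0)
x, a1, a2 = rng.normal(size=4), rng.normal(size=3), rng.normal(size=3)
W = rng.normal(size=(4, 3))
print(bilinear_preference_prob(x, a1, a2, W))
```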