AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much value


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

AI Infrastructure

Sam Altman defends AI resource usage: Water concerns 'fake,' and 'humans use energy too'

Sam Altman, CEO of OpenAI, defends AI's resource usage, dismissing water consumption concerns as unfounded and comparing AI energy use to...

AI Tools & Products · 4 min ·
LLMs

[R] I forced an LLM to design a Zero-Hallucination architecture

The article explores an experiment where an LLM was tasked with designing a Zero-Hallucination architecture, focusing on internal problem...

Reddit - Machine Learning · 1 min ·
Generative AI

Gradient Descent into Hell

The article discusses the implications of AI's self-assessment capabilities and the potential risks associated with its development, part...

Reddit - Artificial Intelligence · 1 min ·
AI Infrastructure

Sam Altman defends AI's resource consumption and ridicules Musk's plan to put data centers in space

Sam Altman addresses concerns over AI's resource consumption, arguing it is comparable to human energy use, while dismissing Musk's space...

AI Tools & Products · 6 min ·
Machine Learning

[2510.00463] On the Adversarial Robustness of Learning-based Conformal Novelty Detection

This paper investigates the adversarial robustness of learning-based conformal novelty detection methods, revealing significant vulnerabi...

arXiv - Machine Learning · 4 min ·
Data Science

[2504.21035] A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

This article evaluates the effectiveness of textual data sanitization methods, revealing that current techniques may provide a false sens...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2503.07313] The influence of missing data mechanisms and simple missing data handling techniques on fairness

This article explores how different missing data mechanisms and handling techniques affect the fairness of machine learning algorithms, r...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2312.12715] Learning Performance Maximizing Ensembles with Explainability Guarantees

This paper presents a method for optimizing the allocation of observations between explainable and black box models, aiming to maximize e...

arXiv - Machine Learning · 3 min ·
LLMs

[2601.10160] Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

The paper explores how AI discourse influences the alignment of large language models (LLMs), revealing that negative narratives can lead...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2601.20198] DeRaDiff: Denoising Time Realignment of Diffusion Models

The paper presents DeRaDiff, a novel method for denoising time realignment in diffusion models, enabling efficient adjustment of regulari...

arXiv - Machine Learning · 4 min ·
AI Safety

[2512.07805] Group Representational Position Encoding

The paper introduces GRAPE (Group Representational Position Encoding), a framework for positional encoding that integrates multiplicative...

arXiv - Machine Learning · 4 min ·
AI Safety

[2510.13887] Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

This paper presents a novel framework for incomplete multi-view clustering using Hierarchical Semantic Alignment and Cooperative Completi...

arXiv - Machine Learning · 4 min ·
LLMs

[2511.18721] Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM

This paper introduces a probabilistic framework for certifying defenses against jailbreaking attacks on LLMs, addressing limitations of t...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2510.18322] Uncertainty Estimation by Flexible Evidential Deep Learning

This paper introduces Flexible Evidential Deep Learning (F-EDL), enhancing uncertainty quantification in machine learning by extending th...

arXiv - Machine Learning · 3 min ·
LLMs

[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, e...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2509.23592] Toward a Holistic Approach to Continual Model Merging

The paper presents a holistic framework for Continual Model Merging (CMM) that addresses scalability and performance issues in continual ...

arXiv - Machine Learning · 4 min ·
LLMs

[2507.10587] Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing

The paper discusses the concept of anthropomimetic uncertainty in language models, emphasizing the need for these models to express confi...

arXiv - AI · 4 min ·
Machine Learning

[2505.17064] Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models

This article evaluates how Text-to-Image diffusion models represent historical contexts, introducing a benchmark to assess their accuracy...

arXiv - Machine Learning · 4 min ·
NLP

[2504.21022] ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees

The paper presents ConformalNL2LTL, a novel method for translating natural language instructions into Linear Temporal Logic (LTL) formula...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2508.11936] M3OOD: Automatic Selection of Multimodal OOD Detectors

The paper presents M3OOD, a meta-learning framework designed for the automatic selection of out-of-distribution (OOD) detectors in multim...

arXiv - Machine Learning · 4 min ·
