AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

AI Safety

NHS staff resist using Palantir software. Staff reportedly cite ethics concerns, privacy worries, and doubts that the platform adds much value


Reddit - Artificial Intelligence · 1 min ·
Machine Learning

AI assistants are optimized to seem helpful. That is not the same thing as being helpful.

RLHF trains models on human feedback. Humans rate responses they like. And it turns out humans consistently rate confident, fluent, agree...

Reddit - Artificial Intelligence · 1 min ·
Computer Vision

House Democrat Questions Anthropic on AI Safety After Source Code Leak

Rep. Josh Gottheimer, who is generally tough on China, just sent a letter to Anthropic questioning their decision to reduce certain safet...

Reddit - Artificial Intelligence · 1 min ·

All Content

AI Infrastructure

Sam Altman defends AI resource usage: Water concerns 'fake,' and 'humans use energy too'

Sam Altman, CEO of OpenAI, defends AI's resource usage, dismissing water consumption concerns as unfounded and comparing AI energy use to...

AI Tools & Products · 4 min ·
LLMs

[R] I forced an LLM to design a Zero-Hallucination architecture

The article explores an experiment where an LLM was tasked with designing a Zero-Hallucination architecture, focusing on internal problem...

Reddit - Machine Learning · 1 min ·
Generative AI

Gradient Descent into Hell

The article discusses the implications of AI's self-assessment capabilities and the potential risks associated with its development, part...

Reddit - Artificial Intelligence · 1 min ·
AI Infrastructure

Sam Altman defends AI's resource consumption and ridicules Musk's plan to put data centers in space

Sam Altman addresses concerns over AI's resource consumption, arguing it is comparable to human energy use, while dismissing Musk's space...

AI Tools & Products · 6 min ·
Machine Learning

[2510.00463] On the Adversarial Robustness of Learning-based Conformal Novelty Detection

This paper investigates the adversarial robustness of learning-based conformal novelty detection methods, revealing significant vulnerabi...

arXiv - Machine Learning · 4 min ·
Data Science

[2504.21035] A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

This article evaluates the effectiveness of textual data sanitization methods, revealing that current techniques may provide a false sens...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2503.07313] The influence of missing data mechanisms and simple missing data handling techniques on fairness

This article explores how different missing data mechanisms and handling techniques affect the fairness of machine learning algorithms, r...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2312.12715] Learning Performance Maximizing Ensembles with Explainability Guarantees

This paper presents a method for optimizing the allocation of observations between explainable and black box models, aiming to maximize e...

arXiv - Machine Learning · 3 min ·
LLMs

[2601.10160] Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

The paper explores how AI discourse influences the alignment of large language models (LLMs), revealing that negative narratives can lead...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2601.20198] DeRaDiff: Denoising Time Realignment of Diffusion Models

The paper presents DeRaDiff, a novel method for denoising time realignment in diffusion models, enabling efficient adjustment of regulari...

arXiv - Machine Learning · 4 min ·
AI Safety

[2512.07805] Group Representational Position Encoding

The paper introduces GRAPE (Group Representational Position Encoding), a framework for positional encoding that integrates multiplicative...

arXiv - Machine Learning · 4 min ·
AI Safety

[2510.13887] Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

This paper presents a novel framework for incomplete multi-view clustering using Hierarchical Semantic Alignment and Cooperative Completi...

arXiv - Machine Learning · 4 min ·
LLMs

[2511.18721] Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM

This paper introduces a probabilistic framework for certifying defenses against jailbreaking attacks on LLMs, addressing limitations of t...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2510.18322] Uncertainty Estimation by Flexible Evidential Deep Learning

This paper introduces Flexible Evidential Deep Learning (F-EDL), enhancing uncertainty quantification in machine learning by extending th...

arXiv - Machine Learning · 3 min ·
LLMs

[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, e...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2509.23592] Toward a Holistic Approach to Continual Model Merging

The paper presents a holistic framework for Continual Model Merging (CMM) that addresses scalability and performance issues in continual ...

arXiv - Machine Learning · 4 min ·
LLMs

[2507.10587] Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing

The paper discusses the concept of anthropomimetic uncertainty in language models, emphasizing the need for these models to express confi...

arXiv - AI · 4 min ·
Machine Learning

[2505.17064] Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models

This article evaluates how Text-to-Image diffusion models represent historical contexts, introducing a benchmark to assess their accuracy...

arXiv - Machine Learning · 4 min ·
NLP

[2504.21022] ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees

The paper presents ConformalNL2LTL, a novel method for translating natural language instructions into Linear Temporal Logic (LTL) formula...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2508.11936] M3OOD: Automatic Selection of Multimodal OOD Detectors

The paper presents M3OOD, a meta-learning framework designed for the automatic selection of out-of-distribution (OOD) detectors in multim...

arXiv - Machine Learning · 4 min ·
