AI Safety & Ethics

Alignment, bias, regulation, and responsible AI

Top This Week

LLMs

"Authoritarian Parents In Rationalist Clothes": a piece I wrote in December about alignment

Posted today in light of the Claude Mythos model card release. Originally I wrote this for r/ControlProblem but realized it was getting o...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Conversations with Women in STEAM: The Ethics of AI with Dr. Nita Farahany

AI Tools & Products ·
LLMs

The public needs to control AI-run infrastructure, labor, education, and governance, NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition
Machine Learning

The paper presents LRD-MPC, a method that enhances the efficiency of secure multi-party computation (MPC) in machine learning by utilizin...

arXiv - Machine Learning · 4 min ·
[2602.14345] AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports
NLP

The paper presents AXE, an innovative framework for validating zero-day vulnerabilities using minimal metadata, achieving a significant i...

arXiv - AI · 4 min ·
[2602.14299] Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
LLMs

This article explores whether socialization occurs in AI agent societies, using Moltbook as a case study. It presents a framework for ana...

arXiv - AI · 4 min ·
[2602.14285] FMMD: A multimodal open peer review dataset based on F1000Research
Data Science

The paper introduces FMMD, a multimodal open peer review dataset from F1000Research, addressing limitations in current datasets by integr...

arXiv - AI · 4 min ·
[2602.14270] A Rational Analysis of the Effects of Sycophantic AI
LLMs

This article analyzes the impact of sycophantic AI on human belief systems, revealing how overly agreeable AI can distort reality and inf...

arXiv - AI · 3 min ·
[2602.14216] Reasoning Language Models for complex assessment tasks: Evaluating parental cooperation from child protection case reports
LLMs

This article explores the effectiveness of reasoning language models (RLMs) in assessing parental cooperation during child protection int...

arXiv - AI · 4 min ·
[2602.14211] SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
AI Agents

The paper presents SkillJect, an automated framework for stealthy skill-based prompt injection in coding agents, addressing security vuln...

arXiv - AI · 4 min ·
[2602.14189] Knowing When Not to Answer: Abstention-Aware Scientific Reasoning
LLMs

The paper discusses an abstention-aware framework for scientific reasoning, emphasizing the importance of knowing when to abstain from an...

arXiv - AI · 4 min ·
[2602.14030] MC$^2$Mark: Distortion-Free Multi-Bit Watermarking for Long Messages
LLMs

MC$^2$Mark introduces a novel watermarking framework that ensures reliable embedding of long messages in generated text while maintaining...

arXiv - Machine Learning · 3 min ·
[2602.14158] A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing
LLMs

This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...

arXiv - AI · 4 min ·
[2602.14106] Anticipating Adversary Behavior in DevSecOps Scenarios through Large Language Models
LLMs

This paper explores the integration of Large Language Models (LLMs) in anticipating adversary behavior within DevSecOps environments, pro...

arXiv - AI · 4 min ·
[2602.14080] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
LLMs

The paper explores the limitations of factuality evaluations in large language models (LLMs), identifying recall as a key bottleneck in a...

arXiv - AI · 4 min ·
[2602.13864] Evolving Multi-Channel Confidence-Aware Activation Functions for Missing Data with Channel Propagation
Machine Learning

This paper presents a novel approach to activation functions in neural networks that incorporates missing data and confidence scores, enh...

arXiv - Machine Learning · 4 min ·
[2602.14012] From SFT to RL: Demystifying the Post-Training Pipeline for LLM-based Vulnerability Detection
LLMs

This article explores the post-training pipeline for LLM-based vulnerability detection, detailing methods from supervised fine-tuning (SF...

arXiv - AI · 4 min ·
[2602.13672] LEAD-Drift: Real-time and Explainable Intent Drift Detection by Learning a Data-Driven Risk Score
Machine Learning

The LEAD-Drift framework offers a real-time solution for detecting intent drift in Intent-Based Networking (IBN), enhancing proactive net...

arXiv - Machine Learning · 4 min ·
[2602.13619] Locally Private Parametric Methods for Change-Point Detection
AI Startups

This paper presents novel locally private parametric methods for change-point detection, focusing on maintaining privacy while identifyin...

arXiv - Machine Learning · 3 min ·
[2602.13914] Common Knowledge Always, Forever
Machine Learning

The paper discusses a polytopological PDL framework for expressing common knowledge and its implications in epistemic logic, highlighting...

arXiv - AI · 3 min ·
[2602.13891] GSRM: Generative Speech Reward Model for Speech RLHF
LLMs

The paper introduces the Generative Speech Reward Model (GSRM), a novel approach to evaluating speech naturalness in AI-generated audio, ...

arXiv - AI · 4 min ·
[2602.13784] Comparables XAI: Faithful Example-based AI Explanations with Counterfactual Trace Adjustments
AI Startups

The paper introduces Comparables XAI, a method for providing faithful, example-based AI explanations using counterfactual trace adjustmen...

arXiv - AI · 3 min ·
[2602.13675] Transferable XAI: Relating Understanding Across Domains with Explanation Transfer
AI Safety

The paper presents Transferable XAI, a framework that enables users to apply understanding from one AI domain to another, enhancing decis...

arXiv - AI · 4 min ·