"Authoritarian Parents In Rationalist Clothes": a piece I wrote in December about alignment
Posted today in light of the Claude Mythos model card release. Originally I wrote this for r/ControlProblem but realized it was getting o...
Alignment, bias, regulation, and responsible AI
A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...