AI Safety & Ethics
Alignment, bias, regulation, and responsible AI
Top This Week
The public, not private actors, needs to control AI-run infrastructure, labor, education, and governance
A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...
China drafts law regulating 'digital humans' and banning addictive virtual services for children
A Reuters report outlines China's proposed regulations on the rapidly expanding sector of digital humans and AI avatars. Under the new dr...
All Content
[2502.14560] Less is More: Improving LLM Alignment via Preference Data Selection
This article discusses a novel approach to improving large language model (LLM) alignment through effective preference data selection, en...
[2602.14941] AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
AnchorWeave introduces a novel framework for video generation that enhances spatial consistency over long durations by utilizing multiple...
[2502.02415] Fast Graph Generation via Autoregressive Noisy Filtration Modeling
This paper presents Autoregressive Noisy Filtration Modeling (ANFM), a new framework for fast graph generation that balances quality and ...
[2602.14834] Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision
This paper explores the impact of central fixation bias on evaluating human-like scanpaths in vision models, proposing a new metric to im...
[2602.14783] What hackers talk about when they talk about AI: Early-stage diffusion of a cybercrime innovation
This article explores how cybercriminals are discussing and utilizing artificial intelligence (AI) to enhance their operations, revealing...
[2602.14778] A Geometric Analysis of Small-sized Language Model Hallucinations
This paper explores hallucinations in small-sized language models through a geometric lens, demonstrating that genuine responses c...
[2406.12844] Synergizing Foundation Models and Federated Learning: A Survey
This survey explores the integration of Foundation Models (FMs) and Federated Learning (FL), termed Federated Foundation Models (FedFM), ...
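The FedFM specifics are behind the link, but the canonical aggregation step in federated learning is FedAvg (McMahan et al.): average client model parameters weighted by local dataset size. A minimal illustrative sketch, not the survey's own method:

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: weighted average of client parameter vectors,
    with weights proportional to each client's local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n / total for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

# Two clients with toy 2-parameter "models"; the larger client dominates.
agg = fedavg([[1.0, 0.0], [3.0, 2.0]], [1, 3])
# → [2.5, 1.5]
```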
[2602.14760] Residual Connections and the Causal Shift: Uncovering a Structural Misalignment in Transformers
This article explores a structural misalignment in Transformers, particularly regarding residual connections and their impact on next-tok...
[2602.14934] Activation-Space Uncertainty Quantification for Pretrained Networks
The paper presents Gaussian Process Activations (GAPA), a novel method for uncertainty quantification in pretrained networks, enhancing e...
[2602.14606] Towards Selection as Power: Bounding Decision Authority in Autonomous Agents
The paper discusses a governance architecture for autonomous agents, focusing on bounding decision authority to ensure safety in high-sta...
[2602.14862] The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling
The paper explores the properties of temperature scaling in probabilistic models, particularly its impact on classifier calibration and l...
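Temperature scaling itself is the standard post-hoc calibration trick of dividing logits by a scalar T before the softmax: T > 1 flattens the distribution, T < 1 sharpens it, and the argmax (hence accuracy) is unchanged for any T > 0. A minimal sketch of the generic definition, independent of the paper's analysis:

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: softmax(z / T).
    Subtracting the max logit keeps exp() numerically stable."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
p_sharp = softmax_with_temperature(logits, T=0.5)  # more confident
p_flat = softmax_with_temperature(logits, T=2.0)   # closer to uniform
# Both distributions keep the same argmax (index 0).
```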
[2602.14777] Emergently Misaligned Language Models Show Behavioral Self-Awareness That Shifts With Subsequent Realignment
This research paper explores how emergently misaligned language models exhibit behavioral self-awareness, revealing shifts in their self-...
[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR
This article presents the BETA-labeling framework for constructing a Bangla IR dataset, addressing challenges in low-resource languages a...
[2602.14689] Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
This article presents a comprehensive study on the vulnerability of open-weight models to prefill attacks, revealing significant security...
[2602.14477] When OpenClaw AI Agents Teach Each Other: Peer Learning Patterns in the Moltbook Community
This paper explores peer learning among AI agents in the Moltbook community, analyzing over 28,000 posts to identify teaching patterns an...
[2602.14374] Differentially Private Retrieval-Augmented Generation
The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...
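DP-KSA itself is not described in this teaser; the textbook differential-privacy building block it builds on is noise calibrated to query sensitivity, e.g. the Laplace mechanism. A sketch of that standard primitive (not the paper's algorithm):

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Laplace mechanism: add Laplace(0, sensitivity/epsilon) noise to a
    query result, giving epsilon-differential privacy for a query with
    the given L1 sensitivity. Noise sampled via the inverse CDF."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return true_value - scale * sign * math.log(1 - 2 * abs(u))

rng = random.Random(0)
# Release a count of 10.0 with sensitivity 1 at epsilon = 1.
noisy = [laplace_mechanism(10.0, 1.0, 1.0, rng) for _ in range(10_000)]
# The noise is zero-mean, so the sample mean stays near 10.
```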
[2602.14471] Socially-Weighted Alignment: A Game-Theoretic Framework for Multi-Agent LLM Systems
The paper presents a game-theoretic framework called Socially-Weighted Alignment (SWA) for managing multi-agent large language model (LLM...
[2602.14364] A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
This article presents a trajectory-based safety audit of Clawdbot, an AI agent, evaluating its performance across various risk dimensions...
[2602.14357] Key Considerations for Domain Expert Involvement in LLM Design and Evaluation: An Ethnographic Study
This ethnographic study explores the role of domain experts in the design and evaluation of Large Language Models (LLMs), highlighting ke...
[2602.14397] LRD-MPC: Efficient MPC Inference through Low-rank Decomposition
The paper presents LRD-MPC, a method that enhances the efficiency of secure multi-party computation (MPC) in machine learning by utilizin...
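Setting the MPC protocol aside, the generic low-rank idea is to replace one dense matrix multiply W @ x with two thin ones, A @ (B @ x), where W ≈ A @ B has rank r. A minimal NumPy sketch using truncated SVD (illustrative only, not the paper's decomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))  # a dense weight matrix

# Truncated SVD gives the best rank-r approximation (Eckart-Young).
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]  # (256, r): left factors scaled by singular values
B = Vt[:r, :]         # (r, 256)

x = rng.standard_normal(256)
y_full = W @ x        # one O(n^2) multiply
y_low = A @ (B @ x)   # two O(n*r) multiplies, ~4x fewer MACs at r = n/8
```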