Built a demo where an agent can provision 2 GPUs, then gets hard-blocked on the 3rd call
Policy:
- budget = 1000
- each `provision_gpu(a100)` call = 500

Result:
- call 1 -> ALLOW
- call 2 -> ALLOW
- call 3 -> DENY (`B...
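The policy above can be sketched as a pre-call budget gate. This is a minimal illustration, not the demo's actual code; the names `PolicyEngine` and `BudgetExceeded` are assumptions for the sketch.

```python
# Minimal sketch of a hard-block budget policy for agent tool calls.
# PolicyEngine and its API are hypothetical, illustrating the ALLOW/DENY
# behavior described above, not the demo's real implementation.

class PolicyEngine:
    def __init__(self, budget: int):
        self.budget = budget
        self.spent = 0

    def check(self, cost: int) -> str:
        # Hard-block: the verdict is computed *before* the tool call runs,
        # so the third call never executes.
        if self.spent + cost > self.budget:
            return "DENY"
        self.spent += cost
        return "ALLOW"


policy = PolicyEngine(budget=1000)
GPU_COST = 500  # cost of one provision_gpu("a100") call

for call in range(1, 4):
    print(f"call {call} -> {policy.check(GPU_COST)}")
# call 1 -> ALLOW
# call 2 -> ALLOW
# call 3 -> DENY
```

The key design choice is that the check mutates the spend counter only on ALLOW, so a denied call leaves the budget untouched and later, cheaper calls could still succeed.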
We're releasing a paper on a new framework for reading and interpreting the internal cognitive states of large language models: "The Lyra...
Hi all, I made a small tool that I've been using for my own literature reviews and figured I'd share in case it's useful to anyone else. ...
The MCIF benchmark introduces a novel framework for evaluating multimodal crosslingual instruction-following capabilities in large langua...
The paper presents ReplaceMe, a novel method for network simplification that utilizes depth pruning and transformer block linearization, ...
This paper presents a knowledge distillation approach for Multi-View 3D reconstruction, utilizing a teacher-student model framework to en...
This paper presents an adaptive differentially private federated learning framework that addresses challenges in model efficiency and sta...
The paper presents AUTOBUS, an Autonomous Business System that integrates LLM-based AI agents with predicate-logic programming to enhance...
This paper explores causal explanations in image classification, demonstrating their formal properties and computability, while introduci...
The paper presents $\texttt{SPECS}$, a novel method for latency-aware test-time scaling in large language models, achieving improved accur...
The paper discusses a method for embodied AI agents to infer user goals from open-ended dialogues using Large Language Models (LLMs), emp...
The paper presents Sink-Aware Pruning, a novel method for optimizing Diffusion Language Models (DLMs) by identifying and removing unstabl...
The paper discusses the balance between weak and strong verification methods in reasoning with large language models (LLMs), emphasizing ...
This paper outlines a vision for fully autonomous, AI-native particle accelerators, emphasizing AI co-design for optimal performance and ...
The paper presents LORA-CRAFT, a novel parameter-efficient fine-tuning method that utilizes Tucker tensor decomposition on pre-trained at...
Jolt Atlas introduces a zero-knowledge machine learning framework that enhances inference verification through lookup arguments, optimizi...
This paper explores the evolution of web research through generative-retrieval architectures, highlighting the transformative impact of l...
This study presents a taxonomy for fine-grained uncertainty quantification in long-form language model outputs, highlighting effective me...
This paper explores the convergence of two-layer neural networks trained with Gaussian masked inputs, demonstrating linear convergence th...
This paper explores vulnerabilities in embodied AI systems, highlighting the inadequacy of existing analyses focused solely on LLMs or CP...
The paper presents SubQuad, an innovative pipeline for analyzing adaptive immune repertoires, addressing challenges of high computational...
WebFAQ 2.0 introduces a multilingual QA dataset with 198 million FAQ-based question-answer pairs across 108 languages, enhancing multilin...
This paper presents a novel approach to crystal structure prediction by utilizing large language models for fine-grained symmetry inferen...