Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)
TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...
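As a rough illustration of the idea in the TL;DR (dropping whole layers rather than narrowing every layer), the sketch below picks which layers to keep from a list of per-layer importance scores. The scores and the function name `select_layers_to_keep` are hypothetical; the post excerpt does not say how the "right" layers are chosen, so this is an assumption-laden toy, not the author's method.

```python
from typing import List, Sequence


def select_layers_to_keep(scores: Sequence[float], keep: int) -> List[int]:
    """Return indices of the `keep` highest-scoring layers, in original order.

    `scores` is a hypothetical per-layer importance estimate (higher means
    more important). How such scores are obtained is outside this sketch.
    """
    if not 0 <= keep <= len(scores):
        raise ValueError("keep must be between 0 and the number of layers")
    # Rank layer indices from most to least important.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top `keep` layers and restore their original ordering,
    # since the surviving layers still run in sequence.
    return sorted(ranked[:keep])


def prune_depth(layers: Sequence[object], scores: Sequence[float], keep: int) -> List[object]:
    """Drop whole layers (depth pruning) instead of shrinking each layer."""
    if len(layers) != len(scores):
        raise ValueError("need exactly one score per layer")
    return [layers[i] for i in select_layers_to_keep(scores, keep)]


if __name__ == "__main__":
    # Toy stand-ins for transformer blocks; the scores are made up.
    blocks = ["block0", "block1", "block2", "block3", "block4"]
    toy_scores = [0.9, 0.1, 0.5, 0.05, 0.7]
    print(prune_depth(blocks, toy_scores, keep=3))
```

The kept layers are untouched, so each surviving block has its original width; only the stack gets shorter.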
GPT, Claude, Gemini, and other LLMs
Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage
Abstract page for arXiv paper 2603.16790: InCoder-32B: Code Foundation Model for Industrial Scenarios
Abstract page for arXiv paper 2603.22367: Reasoner-Executor-Synthesizer: Scalable Agentic Architecture with Static O(1) Context Window
Abstract page for arXiv paper 2603.23268: SafeSeek: Universal Attribution of Safety Circuits in Language Models
Abstract page for arXiv paper 2603.22363: Early Discoveries of Algorithmist I: Promise of Provable Algorithm Synthesis at Scale
Abstract page for arXiv paper 2603.22341: T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
Abstract page for arXiv paper 2603.23198: Sparser, Faster, Lighter Transformer Language Models
Abstract page for arXiv paper 2603.22335: Causal Direct Preference Optimization for Distributionally Robust Generative Recommendation
Abstract page for arXiv paper 2603.23173: A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
Abstract page for arXiv paper 2603.23140: DAK-UCB: Diversity-Aware Prompt Routing for LLMs and Generative Models
Abstract page for arXiv paper 2603.23129: Polaris: A Gödel Agent Framework for Small Language Models through Experience-Abstracted Policy...
Abstract page for arXiv paper 2603.22327: AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI
Abstract page for arXiv paper 2603.23043: Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts
Abstract page for arXiv paper 2603.22984: Can Graph Foundation Models Generalize Over Architecture?
Abstract page for arXiv paper 2603.22321: From Instructions to Assistance: a Dataset Aligning Instruction Manuals with Assembly Videos fo...
Abstract page for arXiv paper 2603.22892: VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents
Abstract page for arXiv paper 2603.22882: TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Explora...
Abstract page for arXiv paper 2603.22784: Caterpillar of Thoughts: The Optimal Test-Time Algorithm for Large Language Models
Abstract page for arXiv paper 2603.22295: Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emoti...
Abstract page for arXiv paper 2603.22293: TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
Abstract page for arXiv paper 2603.22289: MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing
Abstract page for arXiv paper 2603.22713: Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Con...