[R] Depth-first pruning transfers: GPT-2 → TinyLlama with stable gains and minimal loss
TL;DR: Removing the right layers (instead of shrinking all layers) makes transformer models ~8–12% smaller with only ~6–8% quality loss, ...
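The snippet does not say how "the right layers" are chosen. Below is a minimal sketch of depth pruning under one assumed criterion. Each layer is scored by how much it changes its input hidden states, computed as 1 minus the mean cosine similarity between the vectors entering and leaving the layer. The n_drop lowest-scoring layers are then removed whole, which is the depth-wise alternative to shrinking every layer's width. The criterion and the names `layer_influence` and `depth_prune` are hypothetical illustrations, not the post's method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def layer_influence(h_in: List[Vector], h_out: List[Vector]) -> float:
    """Hypothetical importance score: 1 - mean cosine(input, output) over tokens.
    A layer that barely changes its input scores close to 0."""
    sims = [cosine(a, b) for a, b in zip(h_in, h_out)]
    return 1.0 - sum(sims) / len(sims)


def depth_prune(hidden_states: List[List[Vector]], n_drop: int) -> List[int]:
    """hidden_states[i] holds the token vectors entering layer i, and
    hidden_states[i + 1] holds the vectors leaving it (L + 1 entries for L layers).
    Returns the indices of the layers to keep, in their original order,
    after dropping the n_drop least influential layers."""
    n_layers = len(hidden_states) - 1
    if not 0 <= n_drop <= n_layers:
        raise ValueError("n_drop must be between 0 and the number of layers")
    scores = [layer_influence(hidden_states[i], hidden_states[i + 1])
              for i in range(n_layers)]
    # Rank layers from least to most influential and drop the first n_drop.
    ranked = sorted(range(n_layers), key=lambda i: scores[i])
    dropped = set(ranked[:n_drop])
    return [i for i in range(n_layers) if i not in dropped]


if __name__ == "__main__":
    # Toy hidden states: 3 layers, 2 tokens, 2 dimensions (made-up values).
    h0 = [[1.0, 0.0], [0.0, 1.0]]
    h1 = [[0.0, 1.0], [1.0, 0.0]]  # layer 0 changes its input a lot
    h2 = [[0.0, 1.0], [1.0, 0.0]]  # layer 1 is an identity map
    h3 = [[1.0, 1.0], [1.0, 1.0]]  # layer 2 changes its input partially
    print(depth_prune([h0, h1, h2, h3], n_drop=1))  # keeps layers 0 and 2
```

In a real model the hidden states would come from running a calibration set through the network. The kept indices would then be used to rebuild the layer stack, typically followed by some fine-tuning to recover quality.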
Been working on a weight divergence trajectory curvature approach to detecting neural network training instability. Treats weight updates...
UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...
arXiv 2603.23974: Machine vision with small numbers of detected photons per inference
arXiv 2603.23971: The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
arXiv 2603.23943: ChargeFlow: Flow-Matching Refinement of Charge-Conditioned Electron Densities
arXiv 2603.23937: Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development
arXiv 2603.23911: Self-Distillation for Multi-Token Prediction
arXiv 2603.23933: ORACLE: Orchestrate NPC Daily Activities using Contrastive Learning with Transformer-CVAE
arXiv 2603.23873: The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions...
arXiv 2603.23914: Attention-aware Inference Optimizations for Large Vision-Language Models with Memory-efficient ...
arXiv 2603.23835: Beyond Consistency: Inference for the Relative risk functional in Deep Nonparametric Cox Models
arXiv 2603.23822: How Vulnerable Are Edge LLMs?
arXiv 2603.23821: Perturbation: A simple and efficient adversarial tracer for representation learning in language...
arXiv 2603.23800: Object Search in Partially-Known Environments via LLM-informed Model-based Planning and Prompt ...
arXiv 2603.23794: Sparse Autoencoders for Interpretable Medical Image Representation Learning
arXiv 2603.23785: Retinal Disease Classification from Fundus Images using CNN Transfer Learning
arXiv 2603.23722: Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL
arXiv 2603.23736: Wasserstein Parallel Transport for Predicting the Dynamics of Statistical Systems
arXiv 2603.23685: The Economics of Builder Saturation in Digital Markets
arXiv 2603.23668: Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language...
arXiv 2603.23640: LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustain...
arXiv 2603.23611: LLMORPH: Automated Metamorphic Testing of Large Language Models