HydraLM: 22× faster decoding and 16× smaller state memory in long-context inference experiments [P]
I’ve been experimenting with HydraLM, a long-context model for inference, and the numbers are getting a bit wild: the repo’s benchmark su...
I’m currently a first-year Master’s student in Data Science & AI, and I’m trying to figure out whether a research-oriented career is ...
We maintain an open-source catalog of cloud GPU offerings (skypilot-catalog, Apache 2.0). It auto-fetches pricing from 20+ cloud APIs eve...
Abstract page for arXiv paper 2604.01472: The Newton-Muon Optimizer
Abstract page for arXiv paper 2604.01466: Efficient Equivariant Transformer for Self-Driving Agent Modeling
Abstract page for arXiv paper 2604.01455: Infeasibility Aware Large Language Models for Combinatorial Optimization
Abstract page for arXiv paper 2604.01330: Evolutionary Multi-Objective Fusion of Deepfake Speech Detectors
Abstract page for arXiv paper 2604.01365: VIANA: character Value-enhanced Intensity Assessment via domain-informed Neural Architecture
Abstract page for arXiv paper 2604.01346: Safety, Security, and Cognitive Risks in World Models
Abstract page for arXiv paper 2604.01339: Regularizing Attention Scores with Bootstrapping
Abstract page for arXiv paper 2604.01264: OkanNet: A Lightweight Deep Learning Architecture for Classification of Brain Tumor from MRI Im...
Abstract page for arXiv paper 2604.01229: Interpretable Battery Aging without Extra Tests via Neural-Assisted Physics-based Modelling
Abstract page for arXiv paper 2604.01231: Experimental Design for Missing Physics
Abstract page for arXiv paper 2604.02322: Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
Abstract page for arXiv paper 2604.02292: Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
Abstract page for arXiv paper 2604.02288: Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
Abstract page for arXiv paper 2604.02270: Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Abstract page for arXiv paper 2604.02206: LEO: Graph Attention Network based Hybrid Multi Sensor Extended Object Fusion and Tracking for ...
Abstract page for arXiv paper 2604.02268: SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Abstract page for arXiv paper 2604.02260: Model-Based Reinforcement Learning for Control under Time-Varying Dynamics
Abstract page for arXiv paper 2604.02250: Smoothing the Landscape: Causal Structure Learning via Diffusion Denoising Objectives
Abstract page for arXiv paper 2604.02215: Universal Hypernetworks for Arbitrary Models
Abstract page for arXiv paper 2604.02201: On the Role of Depth in the Expressivity of RNNs