[P] Fused MoE Dispatch in Pure Triton: Beating CUDA-Optimized Megablocks at Inference Batch Sizes
I built a fused MoE dispatch kernel in pure Triton that handles the full forward pass for Mixture-of-Experts models. No CUDA, no vendor-s...
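For readers new to the term, the work such a kernel fuses can be sketched as an unfused reference. This is not the author's Triton kernel. It is a plain-Python sketch that assumes softmax top-k gating with renormalized gate weights, which is one common choice; other MoE models gate differently. The names `moe_forward`, `top_k`, and the batch-in/batch-out expert interface are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical expert interface: takes a batch of token vectors, returns a batch of outputs.
Expert = Callable[[List[Vector]], List[Vector]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(tokens: List[Vector], router_logits: List[Vector],
                experts: List[Expert], top_k: int = 2) -> List[Vector]:
    """Unfused MoE forward: route, dispatch (group by expert), compute, combine."""
    # 1. Routing: choose top_k experts per token and renormalize their gate weights.
    assignments: Dict[int, List[Tuple[int, float]]] = {e: [] for e in range(len(experts))}
    for t, logits in enumerate(router_logits):
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in top)
        for e in top:
            assignments[e].append((t, probs[e] / norm))

    # 2. Dispatch + expert compute: each expert runs once on its gathered token group.
    dim = len(tokens[0])
    out = [[0.0] * dim for _ in tokens]
    for e, pairs in assignments.items():
        if not pairs:
            continue  # no tokens routed to this expert
        batch = [tokens[t] for t, _ in pairs]
        results = experts[e](batch)
        # 3. Combine: scatter gate-weighted outputs back to the original token positions.
        for (t, w), y in zip(pairs, results):
            for i in range(dim):
                out[t][i] += w * y[i]
    return out


if __name__ == "__main__":
    double = lambda batch: [[2 * v for v in x] for x in batch]
    negate = lambda batch: [[-v for v in x] for x in batch]
    print(moe_forward([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [5.0, 0.0]],
                      [double, negate], top_k=1))
```

A fused kernel's goal is to do the gather, per-expert matmuls, and weighted scatter in far fewer launches and memory round-trips than these separate steps. How the posted kernel actually partitions that work is not described in the snippet above.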
I am currently working on my response to the rebuttal acknowledgments for ICML, and I am unsure how to handle the strawman argument of that...
Hey, I am an AI researcher currently working at a deep-tech company as a data scientist. Prior to this, I was doing my PhD. My current ro...
arXiv 2603.17112: Cascade-Aware Multi-Agent Routing: Spatio-Temporal Sidecars and Geometry-Switching
arXiv 2603.18596: Elastic Weight Consolidation Done Right for Continual Learning
arXiv 2603.14824: Planning as Goal Recognition: Deriving Heuristics from Intention Models -- Extended Version
arXiv 2603.15033: Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion
arXiv 2603.07990: MJ1: Multimodal Judgment via Grounded Verification
arXiv 2601.12138: DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants
arXiv 2511.22076: Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in ...
arXiv 2511.07719: Operational machine learning for remote spectroscopic detection of CH$_{4}$ point sources
arXiv 2602.01976: FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Co...
arXiv 2601.18858: Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer...
arXiv 2510.05318: BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynami...
arXiv 2601.14026: Universal Approximation Theorem for Input-Connected Multilayer Perceptrons
arXiv 2510.00415: Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration und...
arXiv 2601.13698: Does Privacy Always Harm Fairness? Data-Dependent Trade-offs via Chernoff Information Neural Es...
arXiv 2601.09220: From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences
arXiv 2410.22492: RealCQA-V2: A Diagnostic Benchmark for Structured Visual Entailment over Scientific Charts
arXiv 2302.10426: An Accurate and Interpretable Framework for Trustworthy Process Monitoring
arXiv 2601.09166: DP-FedSOFIM: Differentially Private Federated Stochastic Optimization using Regularized Fisher ...
arXiv 2603.23501: MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage
arXiv 2512.06737: Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Awa...