MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
https://arxiv.org/abs/2604.05091 Abstract: "We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large l...
GPT, Claude, Gemini, and other LLMs
Are we still stuck in the "feature engineering" era of optimization? We trust neural networks to learn unimaginably complex patterns from...
Been writing code professionally for 8+ years. I’m now spending more time describing features in plain English than writing actual c...
Abstract page for arXiv paper 2601.08393: Controlled LLM Training on Spectral Sphere
Abstract page for arXiv paper 2601.04548: Identifying Good and Bad Neurons for Task-Level Controllable LLMs
Abstract page for arXiv paper 2601.02663: When Do Tools and Planning Help Large Language Models Think? A Cost- and Latency-Aware Benchmark
Abstract page for arXiv paper 2512.15163: MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP...
Abstract page for arXiv paper 2512.14391: RePo: Language Models with Context Re-Positioning
Abstract page for arXiv paper 2512.13586: ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Abstract page for arXiv paper 2511.21399: Steering Awareness: Models Can Be Trained to Detect Activation Steering
Abstract page for arXiv paper 2511.16786: Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach
Abstract page for arXiv paper 2511.03153: RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring
Abstract page for arXiv paper 2511.01870: CytoNet: A Foundation Model for the Human Cerebral Cortex at Cellular Resolution
Abstract page for arXiv paper 2510.27173: FMint-SDE: A Multimodal Foundation Model for Accelerating Numerical Simulation of SDEs via Erro...
Abstract page for arXiv paper 2510.22503: LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery
Abstract page for arXiv paper 2510.20333: GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Envi...
Abstract page for arXiv paper 2510.18876: Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Abstract page for arXiv paper 2510.16714: SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
Abstract page for arXiv paper 2510.16688: Pursuing Minimal Sufficiency in Spatial Reasoning
Abstract page for arXiv paper 2510.00507: Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
Abstract page for arXiv paper 2509.25149: Pretraining Large Language Models with NVFP4
Abstract page for arXiv paper 2510.00177: PrefDisco: Benchmarking Proactive Personalized Reasoning
Abstract page for arXiv paper 2509.24210: BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models