Finally Abliterated Sarvam 30B and 105B!
I abliterated Sarvam-30B and 105B - India's first multilingual MoE reasoning models - and found something interesting along the way! Reas...
ML algorithms, training, and inference
Hi everyone, just wanted to share a small but hard-won milestone. After a long plateau at 94.48%, we’ve pushed the official BANKING77-77 ...
We built a Label Quality Score (LQS) system for our dataset marketplace and opened it up as a free standalone tool. Upload a dataset → ge...
Relatively light at just 2 billion parameters, the model is meant for use with consumer-grade GPUs for those who want to self-host it. It...
Google TurboQuant: This is a new compression algorithm. Every time a model answers a question, it stores a massive amount of intermediate ...
Mistral's new speech model can run on a smartwatch or a smartphone.
The best snow-forecasting app for skiers and snowboarders isn’t from any of the federally funded weather services. Nor from any of the bi...
## THE ARCHITECT’S STORY: FROM THE 1985 ROOT TO THE "AI WASH" To those who believe in the truth of a human life, I am writing to you not ...
I've been experimenting with real-time pipelines that combine OCR + TTS + voice conversion, and I ended up building a desktop app that ca...
Recognized across 7 categories by Clutch, Excellent Webworld reinforces its position as a trusted AI and software partner delivering cons...
Abstract page for arXiv paper 2603.18865: RadioDiff-FS: Physics-Informed Manifold Alignment in Few-Shot Diffusion Models for High-Fidelit...
Abstract page for arXiv paper 2603.18853: Learn for Variation: Variationally Guided AAV Trajectory Learning in Differentiable Environments
Abstract page for arXiv paper 2603.14831: Neural Networks as Local-to-Global Computations
Abstract page for arXiv paper 2603.11804: OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
Abstract page for arXiv paper 2602.07058: SPARE: Self-distillation for PARameter-Efficient Removal
Abstract page for arXiv paper 2602.00381: Modeling Image-Caption Rating from Comparative Judgments
Abstract page for arXiv paper 2512.23138: Why Machine Learning Models Systematically Underestimate Extreme Values II: How to Fix It with ...
Abstract page for arXiv paper 2512.16917: Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Abstract page for arXiv paper 2512.04000: Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
Abstract page for arXiv paper 2511.21542: E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion
Abstract page for arXiv paper 2511.20888: Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets
Abstract page for arXiv paper 2510.12728: Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior
Abstract page for arXiv paper 2510.10223: You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs