[2602.14041] BitDance: Scaling Autoregressive Generative Models with Binary Tokens


arXiv - AI · 4 min read

Summary

BitDance is an autoregressive image generator that predicts binary visual tokens instead of codebook indices, improving both efficiency and quality in high-resolution image generation.

Why It Matters

This research addresses the challenge of scaling autoregressive generative models, offering a more efficient route to image generation. By pairing binary tokens with a diffusion-based decoding head and a parallel decoding scheme, BitDance achieves state-of-the-art results with far fewer parameters, making it a notable contribution to generative AI.

Key Takeaways

  • BitDance represents each image token as 256 binary bits, so a single token can take up to $2^{256}$ states, a compact yet highly expressive discrete representation.
  • Its next-patch diffusion decoder predicts multiple tokens in parallel with high accuracy, greatly speeding up inference over token-by-token decoding.
  • On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models, using 5.4x fewer parameters (260M) and running 8.7x faster than 1.4B-parameter parallel AR baselines.
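The first takeaway can be made concrete with a small sketch. This is a generic sign-thresholding binarization in NumPy, not BitDance's actual learned tokenizer: it only illustrates how a 256-dimensional continuous latent maps to one 256-bit binary token, and why such a token spans $2^{256}$ possible states.

```python
import numpy as np

def binarize_latent(z: np.ndarray) -> np.ndarray:
    """Quantize a continuous latent vector to {0, 1} bits by sign thresholding.

    A generic illustration only; BitDance's tokenizer is a learned model.
    """
    return (z >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
z = rng.standard_normal(256)      # one 256-dim continuous latent
bits = binarize_latent(z)         # one binary token of 256 bits

# Each 256-bit token can take 2**256 distinct values.
num_states = 2 ** bits.size
```

Compare this to a conventional VQ codebook: even a large 16,384-entry codebook gives each token only 14 bits of capacity, versus 256 bits here.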

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14041 (cs) · Submitted on 15 Feb 2026

Title: BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Authors: Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen

Abstract: We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to $2^{256}$ states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters, while using 5.4x fewer parameters (260M) and achieving 8.7x speedup. For text-to-image generation, BitDance trains on large-scale multimoda...
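The abstract's binary diffusion head, which generates binary tokens through continuous-space diffusion rather than a softmax over indices, can be sketched as follows. This is a heavily simplified toy: `denoise_fn` is a hypothetical stand-in for the paper's learned diffusion model, and the final thresholding step is only one plausible way to discretize the continuous output.

```python
import numpy as np

def sample_binary_token(denoise_fn, dim=256, steps=8, seed=0):
    """Toy sketch of a diffusion-style binary head.

    Start from Gaussian noise, repeatedly refine it with a denoising
    function in continuous space, then threshold the result to {0, 1}
    bits. The real BitDance head is a learned diffusion model;
    `denoise_fn` here is a hypothetical stand-in.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    for t in range(steps, 0, -1):
        x = denoise_fn(x, t)          # one continuous refinement step
    return (x >= 0).astype(np.uint8)  # quantize to a binary token

# Hypothetical denoiser that shrinks toward a fixed alternating pattern.
target = np.tile([1.0, -1.0], 128)
bits = sample_binary_token(lambda x, t: 0.5 * x + 0.5 * target)
```

The point of sampling in continuous space is that a softmax over $2^{256}$ classes is intractable, whereas a diffusion process over a 256-dimensional real vector is not.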

Related Articles

Machine Learning

AI Has Flooded All the Weather Apps | WIRED

Weather forecasting has gotten a big boost from machine learning. How that translates into what users see can vary.

Wired - AI · 8 min ·
LLMs

What I learned about multi-agent coordination running 9 specialized Claude agents

I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully ope...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

The AI Chip War is Just Getting Started

Everyone talks about AI models, but the real bottleneck might be hardware. According to a recent study by Roots Analysis: AI chip market ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Exclusive: Runway launches $10M fund, Builders program to support early stage AI startups | TechCrunch

Runway is launching a $10 million fund and startup program to back companies building with its AI video models, as it pushes toward inter...

TechCrunch - AI · 7 min ·

