[2601.09566] Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling



Computer Science > Computer Vision and Pattern Recognition
arXiv:2601.09566 (cs)
[Submitted on 14 Jan 2026 (v1), last revised 3 Mar 2026 (this version, v3)]

Title: Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling
Authors: Shuyang Xiang, Hao Guan

Abstract: Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic and phonetic information, which may aid prediction. We investigate whether low-resolution visual inputs can serve as an alternative for character-level modeling. Instead of token IDs, our decoder receives grayscale images of individual characters, with resolutions as low as 8 x 8 pixels. Remarkably, these inputs achieve 39.2% accuracy, comparable to the index-based baseline of 39.1%. Such low-resource settings also exhibit a pronounced hot-start effect: by 0.4% of total training, accuracy reaches above 12%, while index-based models lag at below 6%. Overall, our results demonstrate that minimal visual structure can provide a robust and efficient signal for Chinese language modeling, offering an alternative perspective on character representation that complements traditional index-based approaches.
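The abstract does not detail the model architecture, but the core substitution it describes can be pictured as swapping an embedding-table lookup for a linear projection of a flattened low-resolution glyph. The sketch below is purely illustrative; all names, dimensions, and the random glyph stand-in are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 32      # assumed model width, for illustration only
vocab_size = 100  # assumed toy vocabulary

# Index-based baseline: a character id selects a row of an embedding table.
embed_table = rng.normal(size=(vocab_size, d_model))

def embed_index(char_id: int) -> np.ndarray:
    return embed_table[char_id]

# Pixel-based alternative: an 8x8 grayscale glyph is flattened to 64 values
# and linearly projected into the same model dimension.
pixel_proj = rng.normal(size=(64, d_model))

def embed_pixels(glyph: np.ndarray) -> np.ndarray:
    assert glyph.shape == (8, 8)
    return glyph.reshape(-1) @ pixel_proj

# Random stand-in for a rendered character image (values in [0, 1]).
glyph = rng.uniform(0.0, 1.0, size=(8, 8))

# Both paths produce a vector the decoder can consume in the same way.
print(embed_index(5).shape)    # (32,)
print(embed_pixels(glyph).shape)  # (32,)
```

The point of the sketch is that the downstream decoder is unchanged: only the input representation differs, which is what makes the 8x8 pixel result directly comparable to the index-based baseline.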

Originally published on March 04, 2026. Curated by AI News.
