[2603.19427] Vocabulary shapes cross-lingual variation of word-order learnability in language models
Computer Science > Computation and Language
arXiv:2603.19427 (cs) [Submitted on 19 Mar 2026]

Title: Vocabulary shapes cross-lingual variation of word-order learnability in language models
Authors: Jonas Mayer Martins, Jaap Jumelet, Viola Priesemann, Lisa Beinborn

Abstract: Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages. We observe that greater word-order irregularity consistently raises model surprisal, indicating reduced learnability. Sentence reversal, however, affects learnability only weakly. A coarse distinction between free-word-order languages (e.g., Czech and Finnish) and fixed-word-order languages (e.g., English and French) does not explain the cross-lingual variation. Instead, the structure of the word and subword vocabulary strongly predicts model surprisal. Overall, vocabulary structure emerges as a key driver of computational word-order learnability across languages.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2603.19427 [cs.CL] (or arXiv:2603.19427v1 [cs.CL] for this version)
https://doi.org/10.48550/...
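The abstract uses model surprisal as its measure of learnability: the less probable a model finds each token, the higher the surprisal and the harder the language variant is to learn. As a minimal illustration (not the authors' code; the probability values below are invented for the example), surprisal is the negative log-probability a model assigns to a token:

```python
import math

def surprisal_bits(prob: float) -> float:
    """Surprisal of an event in bits: -log2(p)."""
    return -math.log2(prob)

# Hypothetical next-token probabilities a language model might assign
# to the same token under a regular vs. a shuffled word order.
p_regular = 0.25   # regular word order: token is easier to predict
p_shuffled = 0.05  # irregular word order: token is harder to predict

print(surprisal_bits(p_regular))   # 2.0 bits
print(surprisal_bits(p_shuffled))  # ~4.32 bits
```

Higher average surprisal over a corpus indicates reduced learnability, which is how the paper quantifies the effect of word-order irregularity.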