[2603.22287] Founder effects shape the evolutionary dynamics of multimodality in open LLM families
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.22287 (cs) [Submitted on 27 Jan 2026]
Title: Founder effects shape the evolutionary dynamics of multimodality in open LLM families
Authors: Manuel Cebrian

Abstract: Large language model (LLM) families are improving rapidly, yet it remains unclear how quickly multimodal capabilities emerge and propagate within open families. Using the ModelBiome AI Ecosystem dataset of Hugging Face model metadata and recorded lineage fields (more than 1.8 × 10^6 model entries), we quantify multimodality over time and along recorded parent-to-child relations. Cross-modal tasks are widespread in the broader ecosystem well before they become common within major open LLM families. Within these families, multimodality remains rare through 2023 and most of 2024, then increases sharply in 2024-2025, and it is dominated by image-text vision-language tasks. Across major families, the first vision-language model (VLM) variants typically appear months after the first text-generation releases, with lags ranging from ~1 month (Gemma) to more than a year for several families and ~26 months for GLM. Lineage-conditioned transition rates show weak cross-type transfer: among fine-tuning edges from text-generation parents, only 0.218% yield VLM descendants. Instead, multimodality expands primaril...
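The two headline quantities in the abstract, the lineage-conditioned transition rate and the text-to-VLM release lag, are simple ratios and date differences. The sketch below shows one way to compute them. It is not the author's pipeline: the edge tuple layout (parent task, child task, relation), the relation label "finetune", and the choice of "image-text-to-text" as the VLM task label are illustrative assumptions rather than the ModelBiome schema. Read this way, a rate of 0.218% means roughly 2 VLM children per 1,000 fine-tuning edges from text-generation parents.

```python
from datetime import date

# Hypothetical label set marking a child model as a VLM (assumption,
# not the paper's exact task taxonomy).
VLM_TASKS = {"image-text-to-text"}


def transition_rate(edges, parent_task="text-generation", relation="finetune"):
    """Fraction of `relation` edges from `parent_task` parents whose child is a VLM.

    Each edge is a hypothetical (parent_task, child_task, relation) tuple.
    """
    eligible = [e for e in edges if e[0] == parent_task and e[2] == relation]
    if not eligible:
        return 0.0
    hits = sum(1 for e in eligible if e[1] in VLM_TASKS)
    return hits / len(eligible)


def release_lag_months(first_text: date, first_vlm: date) -> int:
    """Whole calendar months between a family's first text-generation
    release and its first VLM release (day of month is ignored)."""
    return (first_vlm.year - first_text.year) * 12 + (first_vlm.month - first_text.month)


# Toy data only; these counts and dates are invented for illustration.
edges = (
    [("text-generation", "text-generation", "finetune")] * 999
    + [("text-generation", "image-text-to-text", "finetune")]
    + [("image-text-to-text", "image-text-to-text", "finetune")]  # not from a text parent
)
rate = transition_rate(edges)
lag = release_lag_months(date(2024, 2, 1), date(2024, 3, 15))
print(f"transition rate: {rate:.3%}, lag: {lag} month(s)")
```

Conditioning on the parent's task type before counting is what makes the rate "lineage-conditioned": edges whose parent is already a VLM are excluded, so the number isolates cross-type transfer from text-generation models.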