[2603.11703] EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering
arXiv:2603.11703 (cs) [Computer Science > Machine Learning]
Submitted on 12 Mar 2026 (v1), last revised 8 Apr 2026 (this version, v2)
Authors: Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, Eli Bixby

Abstract: We introduce EvoFlows, a variable-length protein sequence-to-sequence modeling approach designed for protein engineering. Existing protein language models are poorly suited for optimization tasks: autoregressive models require full sequence generation, masked language and discrete diffusion models rely on pre-specified mutation locations, and no existing methods naturally support insertions and deletions relative to a template sequence. EvoFlows learns mutational trajectories between evolutionarily related protein sequences via edit flows, allowing it to perform a controllable number of mutations (insertions, deletions, and substitutions) on a template sequence, predicting not only _which_ mutation to perform, but also _where_ it should occur. Through extensive _in silico_ evaluation on diverse protein families from UniRef and OAS, we show that EvoFlows generates variants that remain consistent with natural protein families while exploring farther from template sequences than leading baselines.

Comments: Sub...
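The abstract frames protein engineering as editing a template with insertions, deletions and substitutions, where the model chooses both the position and the residue. The following is a minimal illustrative sketch of that edit representation. It assumes a plain Levenshtein alignment as a stand-in for a "mutational trajectory" between two related sequences. It is not the EvoFlows model or its training procedure, which learns such trajectories with edit flows rather than computing them by dynamic programming. The names `Edit`, `apply_edit`, `edit_script` and `trajectory` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional

# Hypothetical edit vocabulary: insertion, deletion, substitution.
EditKind = Literal["ins", "del", "sub"]


@dataclass(frozen=True)
class Edit:
    """One mutation relative to the template (illustrative, not the paper's API)."""
    kind: EditKind
    pos: int                       # position in the template sequence
    residue: Optional[str] = None  # new residue for "ins"/"sub"; None for "del"


def apply_edit(seq: str, edit: Edit) -> str:
    """Apply a single edit at edit.pos in seq."""
    if edit.kind == "ins":
        return seq[: edit.pos] + edit.residue + seq[edit.pos :]
    if edit.kind == "del":
        return seq[: edit.pos] + seq[edit.pos + 1 :]
    if edit.kind == "sub":
        return seq[: edit.pos] + edit.residue + seq[edit.pos + 1 :]
    raise ValueError(f"unknown edit kind: {edit.kind}")


def edit_script(template: str, target: str) -> List[Edit]:
    """Minimal (Levenshtein) edit script from template to target.

    Edits are emitted right to left. Each one leaves seq[:pos] untouched and
    the next edit's position is never larger, so positions read off the
    template stay valid when the edits are applied in list order.
    """
    n, m = len(template), len(target)
    # d[i][j] = edit distance between template[:i] and target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(template[i - 1] != target[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + mismatch)

    # Backtrack from (n, m), emitting one Edit per unit of cost.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(template[i - 1] != target[j - 1]):
            if template[i - 1] != target[j - 1]:
                edits.append(Edit("sub", i - 1, target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            edits.append(Edit("del", i - 1))
            i -= 1
        else:
            edits.append(Edit("ins", i, target[j - 1]))
            j -= 1
    return edits


def trajectory(template: str, target: str) -> List[str]:
    """Intermediate sequences; element k is the template after k edits."""
    seqs = [template]
    for edit in edit_script(template, target):
        seqs.append(apply_edit(seqs[-1], edit))
    return seqs


if __name__ == "__main__":
    for step, seq in enumerate(trajectory("MKTAYIAKQR", "MKTYIAKQRL")):
        print(step, seq)
```

Because the script is minimal, `trajectory(template, target)[k]` is a variant exactly k edits from the template. This loosely mirrors the idea of performing a controllable number of mutations, although here the edits are fixed by the alignment rather than predicted by a model.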