[2509.02060] Morphology-Aware Peptide Discovery via Masked Conditional Generative Modeling
Summary
The paper presents PepMorph, a peptide discovery pipeline that uses masked conditional generative modeling to generate novel sequences whose self-assembly is steered toward fibrillar or spherical morphologies. It reports an 83% success rate in meeting its targeted design specifications.
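Masked conditioning is what lets a single model accept any subset of target properties. One common way to implement it is to pair each descriptor value with a binary mask bit, so that unspecified conditions are zeroed out and flagged, and to hide known conditions at random during training. The sketch below illustrates that idea under stated assumptions: the descriptor names (`net_charge`, `hydrophobic_fraction`, `asphericity`), the zero-fill convention, and the helpers `build_condition` and `randomly_mask` are hypothetical and are not PepMorph's documented encoding.

```python
import random
from typing import Dict, List, Optional

# Hypothetical descriptor order; PepMorph's actual conditioning set is not listed here.
DESCRIPTOR_NAMES = ["net_charge", "hydrophobic_fraction", "asphericity"]


def build_condition(targets: Dict[str, Optional[float]]) -> List[float]:
    """Encode an arbitrary subset of target descriptors as [values..., mask bits...].

    A descriptor that is absent (or None) gets value 0.0 and mask bit 0.0,
    so the decoder can tell "unspecified" apart from "target value is zero".
    """
    values: List[float] = []
    mask: List[float] = []
    for name in DESCRIPTOR_NAMES:
        value = targets.get(name)
        if value is None:
            values.append(0.0)  # placeholder for an unconstrained property
            mask.append(0.0)
        else:
            values.append(float(value))
            mask.append(1.0)
    return values + mask


def randomly_mask(
    targets: Dict[str, float], keep_prob: float, rng: random.Random
) -> Dict[str, Optional[float]]:
    """Training-time augmentation: hide each known descriptor with probability 1 - keep_prob."""
    return {name: (value if rng.random() < keep_prob else None) for name, value in targets.items()}


# Condition only on the (hypothetical) geometric descriptor; leave the others free.
print(build_condition({"asphericity": 0.8}))  # [0.0, 0.0, 0.8, 0.0, 0.0, 1.0]
```

Keeping the mask bits separate from the values is the design choice that matters here: without them, a zero-filled "don't care" slot would be indistinguishable from an explicit target of zero.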
Why It Matters
Peptide self-assembly offers a bottom-up route to biocompatible, low-toxicity materials for biomedical and energy applications, but screening the vast sequence space to categorize aggregate morphology remains intractable. By generating candidates that are already steered toward a target morphology, PepMorph narrows that search, which could benefit materials science and drug design.
Key Takeaways
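- PepMorph uses a Transformer-based Conditional Variational Autoencoder with a masking mechanism, so one model can generate peptides under arbitrary conditioning.
- The model conditions on geometric and physicochemical descriptors of isolated peptides, which serve as proxies for the fibrillar or spherical morphology of the resulting aggregates.
- Generated peptides that passed a design-specification filter were validated with coarse-grained molecular dynamics simulations, reaching an 83% success rate.
- The training dataset was compiled from existing aggregation-propensity datasets, augmented with geometric and physicochemical descriptors extracted for each peptide.
- The approach could inform the design of biocompatible materials for biomedical and energy applications.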
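The Conditional Variational Autoencoder named above is, in its standard form, trained by encoding a sequence together with its condition into a Gaussian latent, sampling with the reparameterization trick, and penalizing the KL divergence from a standard normal prior. The plain-Python sketch below shows only those generic pieces; the Transformer encoder and decoder, the reconstruction term, and any KL weighting PepMorph actually uses are not described in this summary, so `beta` and the helper names are assumptions.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets gradients reach mu and log_var
    when the same expression is used inside an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def cvae_loss(
    reconstruction_nll: float, mu: Sequence[float], log_var: Sequence[float], beta: float = 1.0
) -> float:
    """Negative ELBO: reconstruction negative log-likelihood plus a (hypothetically) weighted KL term."""
    return reconstruction_nll + beta * kl_to_standard_normal(mu, log_var)


# A latent that already matches the prior costs nothing in KL.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
```

At generation time a CVAE skips the encoder: it draws z from the N(0, I) prior and decodes it together with the (masked) condition vector.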
arXiv:2509.02060 (q-bio; Quantitative Biology > Biomolecules) [Submitted on 2 Sep 2025 (v1), last revised 24 Feb 2026 (this version, v3)]
Authors: Nuno Costa, Julija Zavadlav
Abstract: Peptide self-assembly prediction offers a powerful bottom-up strategy for designing biocompatible, low-toxicity materials for large-scale synthesis in a broad range of biomedical and energy applications. However, screening the vast sequence space for categorization of aggregate morphology remains intractable. We introduce PepMorph, an end-to-end peptide discovery pipeline that generates novel sequences that are not only prone to aggregate but whose self-assembly is steered toward fibrillar or spherical morphologies by conditioning on isolated peptide descriptors that serve as morphology proxies. To this end, we compiled a new dataset by leveraging existing aggregation propensity datasets and extracting geometric and physicochemical descriptors. This dataset is then used to train a Transformer-based Conditional Variational Autoencoder with a masking mechanism, which generates novel peptides under arbitrary conditioning. After filtering to ensure design specifications and validation of generated sequences through coarse-grained molecular dynami...
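The abstract describes filtering generated sequences against the design specifications before validating them in simulation. A minimal sketch of such a filter, plus the success-rate arithmetic, is shown below. Everything specific is an assumption for illustration: the two sequence-level descriptors (a crude net charge counting K/R as +1 and D/E as -1, and a hydrophobic fraction over an assumed residue set), the target ranges, the example sequences, and the helper names. None of these are the paper's actual criteria.

```python
from typing import Dict, Tuple

# Assumed residue classes for illustration only; the paper's descriptor set is not listed here.
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFVWY")


def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E (His and termini ignored)."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that fall in the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq) if seq else 0.0


def meets_spec(seq: str, spec: Dict[str, Tuple[float, float]]) -> bool:
    """Check every specified descriptor against an inclusive [low, high] range."""
    descriptors = {"net_charge": net_charge(seq), "hydrophobic_fraction": hydrophobic_fraction(seq)}
    return all(low <= descriptors[name] <= high for name, (low, high) in spec.items())


def success_rate(n_matched: int, n_simulated: int) -> float:
    """Share of simulated candidates whose observed morphology matched the target."""
    return n_matched / n_simulated if n_simulated else 0.0


# Hypothetical candidates and target ranges.
candidates = ["KLVFFAE", "DDDEEE", "FFKK"]
spec = {"net_charge": (0, 2), "hydrophobic_fraction": (0.4, 1.0)}
kept = [s for s in candidates if meets_spec(s, spec)]
print(kept)  # ['KLVFFAE', 'FFKK']
print(success_rate(3, 4))  # 0.75
```

Filtering on cheap sequence-level descriptors first means only candidates that already satisfy the specification reach the far more expensive simulation step, which is where a success rate like the reported 83% is measured.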