[2602.18439] Replication Study: Federated Text-Driven Prompt Generation for Vision-Language Models
Summary
This article presents a replication study of FedTPG, a method that improves vision-language model performance in federated learning by generating prompts dynamically, conditioned on class names.
Why It Matters
The study validates the effectiveness of text-driven prompt generation for improving generalization to unseen classes in federated learning, a significant open challenge when adapting vision-language models to decentralized settings. By confirming the robustness of the original findings, the replication strengthens the field's understanding of federated learning applications in computer vision.
Key Takeaways
- The FedTPG model shows improved generalization to unseen classes.
- Dynamic prompt generation outperforms static methods in federated settings.
- The study's results closely match the original paper's reported accuracies, confirming the validity of its findings.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18439 (cs) [Submitted on 24 Nov 2025]
Title: Replication Study: Federated Text-Driven Prompt Generation for Vision-Language Models
Authors: Suraj Prasad, Anubha Pant
Abstract: Vision-language models like CLIP have demonstrated remarkable zero-shot capabilities, yet their adaptation to federated learning scenarios presents significant challenges, particularly regarding generalization to unseen classes. The original FedTPG paper [Qiu2024] addresses this limitation by introducing a text-driven prompt generation network that dynamically creates prompts conditioned on class names, enabling better cross-class generalization in federated settings. In this work, we present a faithful replication study of FedTPG, evaluating the pre-trained model on six diverse vision datasets: Caltech101, Oxford Flowers, FGVC Aircraft, Oxford Pets, Food-101, and DTD. Our evaluation achieves results within 0.2% of the original paper's reported accuracies, with an average accuracy of 74.58% on seen (base) classes and 76.00% on unseen (new) classes, demonstrating a +1.43 percentage point improvement in generalization. These results validate the original paper's core claims: (1) text-driven prompt generation enables superior generalization to unseen classes...
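To make the mechanism concrete, the sketch below contrasts the two prompting styles the abstract describes: a static learned prompt (shared by all classes, as in CoOp-style methods) versus a dynamic prompt produced by a small network conditioned on a class-name embedding. This is an illustrative toy with made-up dimensions and a hypothetical two-layer MLP, not the FedTPG architecture; the real system conditions on CLIP text embeddings and trains the generator across federated clients.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8   # toy dimension; CLIP text embeddings are much larger (e.g. 512)
N_PROMPT = 4    # number of prompt context vectors produced per class

# Hypothetical prompt-generation network: a tiny 2-layer MLP mapping a
# class-name embedding to N_PROMPT context vectors (illustration only).
W1 = rng.standard_normal((EMBED_DIM, 16)) * 0.1
W2 = rng.standard_normal((16, N_PROMPT * EMBED_DIM)) * 0.1

def generate_prompt(class_embedding: np.ndarray) -> np.ndarray:
    """Dynamic prompting: context vectors depend on the class-name embedding."""
    h = np.tanh(class_embedding @ W1)
    return (h @ W2).reshape(N_PROMPT, EMBED_DIM)

# Static prompting baseline: one learned context, identical for every class.
static_prompt = rng.standard_normal((N_PROMPT, EMBED_DIM)) * 0.1

# Stand-ins for text embeddings of two class names (random here).
cat_emb = rng.standard_normal(EMBED_DIM)
plane_emb = rng.standard_normal(EMBED_DIM)

# Dynamic prompts differ across classes; the static prompt cannot.
assert not np.allclose(generate_prompt(cat_emb), generate_prompt(plane_emb))
print(generate_prompt(cat_emb).shape)  # (4, 8)
```

Because the generator is a function of the class name rather than a fixed lookup per class, it can emit prompts for class names never seen during training, which is the property the replication's unseen-class results exercise.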