[2602.17867] ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization
Summary
The paper presents ADAPT, a hybrid method for optimizing prompts in LLM feature visualization that addresses the challenges posed by local minima and the discrete nature of text input.
Why It Matters
Understanding feature visualization in Large Language Models (LLMs) is crucial for improving model interpretability. ADAPT provides a novel approach to overcoming the limitations of existing methods, enhancing the ability to identify and optimize inputs that strongly activate specific features within LLMs.
Key Takeaways
- ADAPT combines beam search with adaptive gradient-guided mutation for prompt optimization.
- The method is designed to address local minima issues specific to LLMs.
- Evaluation on Sparse Autoencoder latents demonstrates ADAPT's superior performance over existing techniques.
- Feature visualization for LLMs is feasible with tailored design assumptions.
- The study contributes to the understanding of feature encoding in LLM activation spaces.
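The gradient-guided mutation step named in the takeaways can be illustrated with a minimal sketch. The toy linear "activation" model, the embedding table, and every variable name below are illustrative assumptions, not the paper's implementation: candidate token swaps are scored by a first-order (HotFlip-style) estimate of the activation gain, and the best single swap is applied greedily.

```python
import numpy as np

rng = np.random.default_rng(0)

V, D = 50, 8                      # toy vocabulary size and embedding dim (assumed)
E = rng.normal(size=(V, D))       # token embedding table
w = rng.normal(size=D)            # target feature direction (stand-in for an SAE latent)

def activation(token_ids):
    # Toy stand-in for the target activation: mean prompt embedding
    # projected onto the target direction.
    return float(E[token_ids].mean(axis=0) @ w)

def gradient_guided_mutation(token_ids):
    # For this linear toy model, the gradient of the activation w.r.t. each
    # position's embedding is exactly w / n. Score every candidate swap by
    # its predicted first-order gain, (E[v] - E[cur]) @ grad, and apply the
    # single best swap if it improves the estimate.
    n = len(token_ids)
    grad = w / n
    best_gain, best_prompt = 0.0, token_ids
    for pos, cur in enumerate(token_ids):
        gains = (E - E[cur]) @ grad          # predicted gain for every vocab token
        v = int(np.argmax(gains))
        if gains[v] > best_gain:
            cand = list(token_ids)
            cand[pos] = v
            best_gain, best_prompt = float(gains[v]), cand
    return best_prompt

prompt = [3, 17, 42]
for _ in range(5):                           # greedy mutation loop
    prompt = gradient_guided_mutation(prompt)
```

In ADAPT the mutation is adaptive and seeded by beam search over discrete tokens; this sketch only shows the core idea of ranking discrete swaps by a gradient-based linear estimate rather than enumerating full forward passes.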
Computer Science > Machine Learning
arXiv:2602.17867 (cs) [Submitted on 19 Feb 2026]
Title: ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization
Authors: João N. Cardoso, Arlindo L. Oliveira, Bruno Martins
Abstract: Understanding what features are encoded by learned directions in LLM activation space requires identifying inputs that strongly activate them. Feature visualization, which optimizes inputs to maximally activate a target direction, offers an alternative to costly dataset search approaches, but remains underexplored for LLMs due to the discrete nature of text. Furthermore, existing prompt optimization techniques are poorly suited to this domain, which is highly prone to local minima. To overcome these limitations, we introduce ADAPT, a hybrid method combining beam search initialization with adaptive gradient-guided mutation, designed around these failure modes. We evaluate on Sparse Autoencoder latents from Gemma 2 2B, proposing metrics grounded in dataset activation statistics to enable rigorous comparison, and show that ADAPT consistently outperforms prior methods across layers and latent types. Our results establish that feature visualization for LLMs is tractable, but requires design assumptions tailored to the domain.
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)