[2408.03404] Set2Seq Transformer: Temporal and Position-Aware Set Representations for Sequential Multiple-Instance Learning
Computer Science > Computer Vision and Pattern Recognition
arXiv:2408.03404 (cs)
[Submitted on 6 Aug 2024 (v1), last revised 24 Mar 2026 (this version, v3)]

Title: Set2Seq Transformer: Temporal and Position-Aware Set Representations for Sequential Multiple-Instance Learning
Authors: Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Abstract: In many real-world applications, modeling both the internal structure of sets and their temporal relationships is essential for capturing complex underlying patterns. Sequential multiple-instance learning aims to address this challenge by learning permutation-invariant representations of sets distributed across discrete timesteps. However, existing methods either focus on learning set representations at a static level, ignoring temporal dynamics, or treat sequences as ordered lists of individual elements, lacking explicit mechanisms for representing sets. Crucially, effective modeling of such sequences of sets often requires encoding both the positional ordering across timesteps and their absolute temporal values to jointly capture relative progression and temporal context. In this work, we propose Set2Seq Transformer, a novel architecture that jointly models permutation-invariant set structure and temporal d...
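The abstract describes a two-level design: a permutation-invariant encoder summarizes each set, and positional information across timesteps injects temporal order. The following minimal NumPy sketch illustrates that idea only; the function names, mean pooling, and sinusoidal encoding are illustrative assumptions, not the paper's actual learned components.

```python
import numpy as np

def encode_set(instances):
    """Permutation-invariant set encoding via mean pooling
    (a hypothetical stand-in for the paper's learned set encoder)."""
    return np.asarray(instances).mean(axis=0)

def sinusoidal_position(pos, dim):
    """Standard sinusoidal encoding of a timestep index,
    one way to inject positional ordering across timesteps."""
    i = np.arange(dim)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def set2seq_features(sequence_of_sets, dim):
    """Encode each set, then add its timestep's positional encoding;
    in the full model a transformer would process the resulting sequence."""
    return np.stack([
        encode_set(inst_set) + sinusoidal_position(t, dim)
        for t, inst_set in enumerate(sequence_of_sets)
    ])

# Shuffling instances inside a set leaves its encoding unchanged,
# while the same set at different timesteps yields different features.
rng = np.random.default_rng(0)
s = rng.normal(size=(3, 8))           # one set of 3 instances, dim 8
feats = set2seq_features([s, s[::-1]], dim=8)
print(np.allclose(encode_set(s), encode_set(s[::-1])))  # permutation-invariant
print(np.allclose(feats[0], feats[1]))                  # position-sensitive
```

Absolute temporal values (as opposed to ordinal positions) could be injected the same way, by feeding real timestamps rather than indices into the encoding.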