
[2602.16320] RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion

arXiv - Machine Learning, February 19, 2026

Summary

RefineFormer3D is a lightweight hierarchical transformer architecture for 3D medical image segmentation. It reports average Dice scores of 93.44% on ACDC and 85.9% on BraTS with only 2.94M parameters, substantially fewer than contemporary transformer-based methods.

Why It Matters

The study addresses the critical challenge of efficient 3D medical image segmentation, which is essential for clinical workflows. By proposing a model that balances accuracy and computational efficiency, it enhances the feasibility of deploying advanced AI in resource-constrained medical settings.

Key Takeaways

RefineFormer3D achieves 93.44% and 85.9% average Dice scores on ACDC and BraTS benchmarks, respectively.
The model uses only 2.94M parameters, substantially fewer than contemporary transformer-based methods.
Fast inference time of 8.35 ms per volume on GPU supports its use in clinical environments.
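Key components are GhostConv3D-based patch embedding, a MixFFN3D module with low-rank projections and depthwise convolutions, and a cross-attention fusion decoder for adaptive multi-scale skip connection integration.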
The architecture is designed for practical deployment in resource-limited settings.
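
For readers unfamiliar with the metric behind the 93.44% and 85.9% figures, the Dice score for one class is 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and reference voxel sets. The following is a minimal pure-Python sketch, not the paper's evaluation code. The function names `dice_score` and `mean_dice`, the toy labels, and the convention of returning 1.0 when a class is absent from both masks are all assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice coefficient for one class over flattened label volumes."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    pred_count = sum(1 for v in pred if v == label)
    target_count = sum(1 for v in target if v == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        # Assumption: a class missing from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / (pred_count + target_count)


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Average the per-class Dice over the given foreground labels."""
    return sum(dice_score(pred, target, c) for c in labels) / len(labels)


# Toy example: an 8-voxel "volume" flattened to a list, with classes 0 (background), 1, 2.
pred = [0, 1, 1, 2, 2, 0, 1, 0]
target = [0, 1, 1, 2, 0, 0, 1, 1]

print(f"class 1 Dice: {dice_score(pred, target, 1):.4f}")
print(f"class 2 Dice: {dice_score(pred, target, 2):.4f}")
print(f"mean Dice:    {mean_dice(pred, target, [1, 2]):.4f}")
```

A real 3D volume would be flattened to one label per voxel before calling these functions. Benchmarks differ in which classes they average over, so the exact protocol used for ACDC and BraTS should be taken from the paper.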

arXiv:2602.16320 (eess), Electrical Engineering and Systems Science > Image and Video Processing
Submitted on 18 Feb 2026

Authors: Kavyansh Tyagi, Vishwas Rathi, Puneet Goyal

Abstract: Accurate and computationally efficient 3D medical image segmentation remains a critical challenge in clinical workflows. Transformer-based architectures often demonstrate superior global contextual modeling but at the expense of excessive parameter counts and memory demands, restricting their clinical deployment. We propose RefineFormer3D, a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for volumetric medical imaging. The architecture integrates three key components: (i) GhostConv3D-based patch embedding for efficient feature extraction with minimal redundancy, (ii) MixFFN3D module with low-rank projections and depthwise convolutions for parameter-efficient feature extraction, and (iii) a cross-attention fusion decoder enabling adaptive multi-scale skip connection integration. RefineFormer3D contains only 2.94M parameters, substantially fewer than contemporary transformer-based methods. Extensive experim...
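
To give a sense of why a ghost-style convolution can cut parameters, the sketch below counts weights for a standard 3D convolution and for a GhostNet-style replacement. In that design, a primary convolution produces only a fraction of the output channels and a cheap depthwise convolution generates the rest. This assumes that GhostConv3D follows the GhostNet pattern; the excerpt does not give the actual layer configuration. The function names, channel sizes, kernel sizes, and `ratio` value are hypothetical and are not taken from the paper.

```python
def conv3d_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a bias-free 3D convolution with a cubic k x k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k ** 3


def ghost_conv3d_params(c_in: int, c_out: int, k: int = 3,
                        ratio: int = 2, cheap_k: int = 3) -> int:
    """Weight count of a GhostNet-style block (assumed design, see lead-in)."""
    if c_out % ratio:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio      # channels from the ordinary (primary) conv
    ghost = c_out - intrinsic       # channels from the cheap depthwise conv
    primary = conv3d_params(c_in, intrinsic, k)
    # Depthwise: each intrinsic channel is filtered independently (groups=intrinsic).
    cheap = conv3d_params(intrinsic, ghost, cheap_k, groups=intrinsic)
    return primary + cheap


# Illustrative sizes only: 32 input channels, 64 output channels, 3x3x3 kernels.
standard = conv3d_params(32, 64, 3)
ghost = ghost_conv3d_params(32, 64, k=3, ratio=2, cheap_k=3)

print(f"standard conv3d weights: {standard}")
print(f"ghost-style weights:     {ghost}")
print(f"reduction factor:        {standard / ghost:.2f}x")
```

With these illustrative sizes the ghost-style block needs roughly half the weights of the standard layer. This shows only the general mechanism, not how the reported 2.94M total is reached.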
