Pansharpening is the task of fusing panchromatic and low-resolution multi-spectral data to obtain a high-resolution multi-spectral equivalent. In this study, we develop a cross-band transformer (CBT) method for pansharpening, incorporating and adapting successful features from vision transformers. Specifically, introducing cross-attention mechanisms in two new fusion modules enables CBT to leverage information across the panchromatic and spectral input bands, which is shown to improve accuracy. Additionally, we enhance these modules with wavelet transforms that spatially compress the image space, allowing the attention mechanism to cover a broader context within its attention window. This further improves performance while minimizing latency. With computational requirements similar to those of various reference methods, our approaches achieve competitive performance on various benchmark datasets. Finally, we develop the Sev2Mod dataset, using one geostationary and one sun-synchronous polar-orbiting satellite. Sev2Mod presents a more difficult pansharpening task and reflects the performance of pansharpening methods in noisy real-world applications. Code is available at https://github.com/VisionVoyagerX/Wav-CBT.
N. Ntantis, J.S. Wijnands, J.F. Meirink, D. Dibenedetto. A cross-band transformer on wavelets for pansharpening of satellite imagery
Journal: European Journal of Remote Sensing, Volume: 58, Year: 2025, First page: 2495314, doi: 10.1080/22797254.2025.2495314
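
The two ideas named in the abstract, wavelet-based spatial compression and cross-attention between panchromatic and multi-spectral bands, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a one-level orthonormal Haar transform, a single attention head, random projection weights, multi-spectral input already upsampled to the panchromatic grid, and multi-spectral tokens as queries with panchromatic tokens as keys and values. All function names (`haar_dwt2`, `to_tokens`, `cross_attention`, `demo`) are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform of an (H, W, C) array.

    H and W must be even. Returns four sub-bands, each (H/2, W/2, C):
    one approximation band followed by three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (local average)
    lh = (a - b + c - d) / 2.0  # detail: differences between columns
    hl = (a + b - c - d) / 2.0  # detail: differences between rows
    hh = (a - b - c + d) / 2.0  # detail: diagonal differences
    return ll, lh, hl, hh


def to_tokens(subbands):
    """Stack sub-bands along channels and flatten space: (h, w, 4C) -> (h*w, 4C)."""
    stacked = np.concatenate(subbands, axis=-1)
    return stacked.reshape(-1, stacked.shape[-1])


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q_tokens, kv_tokens, wq, wk, wv):
    """Single-head scaled dot-product cross-attention.

    Queries come from one modality, keys and values from the other.
    """
    q = q_tokens @ wq
    k = kv_tokens @ wk
    v = kv_tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def demo(seed=0, size=16, ms_bands=4, dim=8):
    """Fuse random PAN and MS inputs; sizes and weights are illustrative only."""
    rng = np.random.default_rng(seed)
    pan = rng.random((size, size, 1))
    # Assumption: MS data has already been interpolated to the PAN grid.
    ms_up = rng.random((size, size, ms_bands))

    pan_tok = to_tokens(haar_dwt2(pan))   # (size*size/4, 4)
    ms_tok = to_tokens(haar_dwt2(ms_up))  # (size*size/4, 4*ms_bands)

    # Random projections stand in for learned weights.
    wq = rng.standard_normal((ms_tok.shape[1], dim))
    wk = rng.standard_normal((pan_tok.shape[1], dim))
    wv = rng.standard_normal((pan_tok.shape[1], dim))
    return cross_attention(ms_tok, pan_tok, wq, wk, wv)


if __name__ == "__main__":
    fused, weights = demo()
    print(fused.shape, weights.shape)
```

For a 16 × 16 input, the wavelet step leaves 64 tokens instead of 256, so the attention matrix has 64 × 64 entries rather than 256 × 256. Because the detail sub-bands are kept as extra channels, each token still summarises a full 2 × 2 pixel block, which is one way to read the abstract's point about widening the context covered at a given attention cost.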