Given the scale and complexity of forthcoming HSI data, producing labeled datasets at the scale required to improve state-of-the-art performance is impractical and prohibitively costly. Unsupervised pre-training algorithms have revolutionized deep learning for natural language processing and computer vision by tapping into vast troves of unlabeled data, but these advances have seen little adoption in the HSI domain. We present early results from self-supervised pre-training for hyperspectral imagery using masked autoencoders, and compare different pre-training approaches and masking techniques: specifically mask size, masking dimension (spatial, spectral, or both), mask fraction, and mask coherence (spatially independent or spatially consistent). We summarize our lessons learned and highlight the most promising approaches toward building a foundation model for hyperspectral data.
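The masking variants compared above (dimension, fraction, and coherence) can be illustrated with a minimal sketch. This is a hypothetical NumPy implementation, not the paper's code; it assumes an HSI cube of shape (H, W, bands) and that masking is applied in non-overlapping spatial blocks of size `patch`:

```python
import numpy as np

def make_mask(shape, mode="spatial", fraction=0.75, patch=4,
              coherent=True, rng=None):
    """Boolean mask (True = masked) for an HSI cube of shape (H, W, B).

    mode:     "spatial"  -> mask patch x patch spatial blocks,
              "spectral" -> mask entire bands,
              "both"     -> union of the two.
    coherent: if True, the same spatial mask is shared by all bands
              (spatially consistent); if False, each band is masked
              independently.
    """
    rng = np.random.default_rng(rng)
    H, W, B = shape
    mask = np.zeros(shape, dtype=bool)

    if mode in ("spatial", "both"):
        gh, gw = H // patch, W // patch
        if coherent:
            # One coarse grid, broadcast across all bands.
            grid = rng.random((gh, gw)) < fraction
            blocks = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
            mask |= blocks[:, :, None]
        else:
            # Independent coarse grid per band.
            grid = rng.random((gh, gw, B)) < fraction
            mask |= np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)

    if mode in ("spectral", "both"):
        # Mask whole bands along the spectral dimension.
        bands = rng.random(B) < fraction
        mask |= bands[None, None, :]

    return mask
```

In an MAE-style setup, the unmasked entries would be fed to the encoder and the decoder would reconstruct the masked ones; `fraction` controls the expected proportion masked rather than an exact count.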