Presentation + Paper
Towards masked autoencoding pre-training for wide area motion imagery
15 June 2023
Steve Goley, Rohan Pradhan, Austin Welch
Abstract
Transformer models are demonstrating remarkable and emergent capabilities in the natural language processing domain. These models are bounded only by the availability of large training datasets, which can be tractably obtained because natural language models are pre-trained using self-supervision in the form of token masking. He et al. and Cao et al. have recently shown the power of this token masking technique by using masked autoencoders (MAE) as scalable vision learners, in combination with a self-supervised pre-training technique for vision transformer models. Feichtenhofer et al. extended these techniques to video, showing that masked autoencoders are scalable spatiotemporal learners as well. To the best of our knowledge, these techniques have been applied only to ground-level, object-centric imagery and video. Extending them to remote or overhead imagery presents two significant problems. First, objects of interest are small compared to the typical mask patch size. Second, the frames are not object centered. In this study, we explore whether modern self-supervised pre-training techniques like masked autoencoding extend well to overhead wide area motion imagery (WAMI) data. We argue that modern pre-training techniques like MAE are well suited to WAMI data given the typical object size in this domain as well as the ability to leverage strong global spatial contextual information. To this end, we conduct a comprehensive exploration of different patch sizes and masking ratios on the popular WAMI dataset, WPAFB 2009. We find that domain-specific adjustments to these pre-training techniques result in downstream performance improvements on computer vision tasks including object detection.
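The two knobs the abstract refers to, patch size and masking ratio, can be illustrated with a minimal NumPy sketch of MAE-style random patch masking. This is not the authors' implementation: the function names `patchify` and `random_mask`, the 256×256 random frame standing in for a grayscale WAMI crop, and the patch sizes and 0.75 masking ratio are all assumptions chosen for illustration, not values reported in the paper.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W) image into non-overlapping square patches, one flattened patch per row."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, patch, gw, patch) -> (gh, gw, patch, patch) -> (gh * gw, patch * patch)
    patches = image.reshape(gh, patch_size, gw, patch_size).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch_size * patch_size)

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean array where True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    order = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[order[:num_masked]] = True
    return mask

rng = np.random.default_rng(0)
frame = rng.random((256, 256))  # hypothetical stand-in for a grayscale WAMI crop

results = {}
for patch_size in (8, 16, 32):  # illustrative patch sizes (assumption)
    patches = patchify(frame, patch_size)
    mask = random_mask(len(patches), 0.75, rng)  # illustrative masking ratio (assumption)
    visible = patches[~mask]  # only these patches would be fed to the encoder
    results[patch_size] = (patches.shape, visible.shape)
    print(patch_size, patches.shape, visible.shape)
```

With a fixed frame size, halving the patch side quadruples the number of patches, so a vehicle spanning only a few pixels covers a larger fraction of any single patch. That trade-off between patch size and object size is the kind of domain-specific adjustment the abstract describes exploring.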
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Steve Goley, Rohan Pradhan, and Austin Welch "Towards masked autoencoding pre-training for wide area motion imagery", Proc. SPIE 12525, Geospatial Informatics XIII , 1252508 (15 June 2023); https://doi.org/10.1117/12.2665871
KEYWORDS
Object detection
Data modeling
Transformers
Remote sensing
Video
Computer vision technology