We propose SiamGauss, a Siamese region proposal network with a Gaussian head for single-target visual object tracking on aerial benchmarks. Visual tracking in aerial videos faces unique challenges: a large field of view that yields small objects, similar-looking objects (confusers) in close proximity, occlusions, and fast motion caused by simultaneous object and camera motion. In Siamese tracking, a cross-correlation operation is performed in the embedding space to obtain a similarity map of the target within a search frame, which is then used to localize the target. During training, the proposed Gaussian head suppresses the activations that confusers in the search frame produce in the similarity map while boosting the confidence on the target. This activation suppression improves the confuser awareness of our tracker, and the stronger target activation helps maintain tracking consistency under fast motion. The Gaussian head is applied only during training and introduces no additional computational overhead during inference. Thus, SiamGauss achieves fast runtime performance. We evaluate our method on multiple aerial benchmarks, showing that SiamGauss performs favorably against state-of-the-art trackers while running at 96 frames per second.
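One common way to realize this kind of training-time supervision, shown here as a minimal sketch (the function name, map size, and sigma are illustrative assumptions, not details from the paper), is a 2-D Gaussian label centered on the target that the similarity map is trained to match: high response at the target, near-zero response at confuser locations.

```python
import numpy as np

def gaussian_label(size, center, sigma):
    """2-D Gaussian training label for a similarity map: peaks at the
    target center and decays toward zero elsewhere, so activations at
    off-target (confuser) positions are penalized during training."""
    ys, xs = np.mgrid[0:size[0], 0:size[1]]
    cy, cx = center
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: a 17x17 response map with the target at the center.
label = gaussian_label((17, 17), center=(8, 8), sigma=2.0)
```

A loss such as the squared error between the predicted similarity map and this label then drives confuser suppression and target boosting simultaneously.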
Siamese deep-network trackers have received significant attention in recent years due to their real-time speed and state-of-the-art performance. However, Siamese trackers suffer from similar-looking confusers, which are prevalent in aerial imagery and create challenging conditions during prolonged occlusions, after which the tracked object reappears under a different pose and illumination. We propose SiamReID, a novel re-identification framework for Siamese trackers that incorporates confuser rejection during prolonged occlusions and is well suited for aerial tracking. The re-identification feature is trained using both a triplet loss and a class-balanced loss. Our approach achieves state-of-the-art performance on the UAVDT single object tracking benchmark.
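As a minimal sketch of the first of these losses (the function and margin value are illustrative assumptions; the paper's exact formulation may differ), a triplet loss pulls the re-identification embedding of the target toward same-object examples and pushes it away from confuser embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors: encourages the
    anchor (target) to be closer to the positive (same object) than to
    the negative (confuser) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to same object
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to confuser
    return max(d_pos - d_neg + margin, 0.0)
```

With embeddings trained this way, a re-appearing target after occlusion scores closer to its stored template than any confuser does, enabling rejection.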
Kubelka-Munk (K-M) theory has been used successfully to estimate pigment concentrations in the pigment mixtures of modern paintings from spectral imagery. In this study, single-constant K-M theory is applied to the classification of green pigments in the Selden Map of China, a navigational map of the South China Sea likely created in the early seventeenth century. Hyperspectral data of the map were collected at the Bodleian Library, University of Oxford, and can be used to estimate the pigment diversity and spatial distribution within the map. This work assesses the utility of analyzing the data in the K/S space of Kubelka-Munk theory, as opposed to the traditional reflectance domain. We estimate the dimensionality of the data and extract endmembers in the reflectance domain. We then perform linear unmixing to estimate abundances in the K/S space and, following Bai et al. (2017), classify in the abundance space. Finally, because ground truth labels are lacking, classification accuracy is estimated by computing the mean spectrum of each class as its representative signature and calculating the root mean squared error against all pixels in that class to create a spatial representation of the error. This highlights both the magnitude of, and any spatial pattern in, the errors, indicating whether a particular pigment is poorly modeled by this approach.
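In single-constant K-M theory, reflectance R maps to K/S = (1 - R)^2 / (2R), a quantity approximately linear in pigment concentration, which is why unmixing is done in K/S space rather than the reflectance domain. A minimal sketch (function names and the clipping threshold are assumptions; the paper's unmixing method may differ from plain least squares):

```python
import numpy as np

def reflectance_to_ks(R):
    """Single-constant Kubelka-Munk transform: K/S = (1 - R)^2 / (2 R).
    Reflectance is clipped away from zero to avoid division by zero."""
    R = np.clip(R, 1e-6, 1.0)
    return (1.0 - R) ** 2 / (2.0 * R)

def unmix(ks_pixel, ks_endmembers):
    """Least-squares abundance estimate for one pixel: columns of
    ks_endmembers are endmember K/S spectra, ks_pixel is the pixel's
    K/S spectrum; mixing is modeled as linear in K/S space."""
    coeffs, *_ = np.linalg.lstsq(ks_endmembers, ks_pixel, rcond=None)
    return coeffs
```

Because K/S is (approximately) additive in concentration, a pixel synthesized as a linear K/S mixture of two pigments is recovered with its true abundances.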
We present a Fully Convolutional Adaptive Tracker (FCAT) based on a Siamese architecture that operates in real time and is well suited to tracking from aerial platforms. Real-time performance is achieved by using a fully convolutional network to generate a densely sampled response map in a single pass. The network is fine-tuned on the tracked target with an adaptation approach similar to the procedure used to train Discriminative Correlation Filters (DCF). A key difference is that FCAT fine-tunes the template feature directly using stochastic gradient descent, whereas DCF regresses a correlation filter. The effectiveness of the proposed method is illustrated on surveillance-style videos, where FCAT performs competitively with state-of-the-art visual trackers while maintaining real-time tracking speeds of over 30 frames per second.
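The contrast can be sketched with a toy linear model (a simplification of ours, not the paper's network): treat the response at each position as the dot product of that position's feature row with the template, and adapt the template itself by gradient descent on the squared error to a desired response, rather than solving for a separate correlation filter as DCF does.

```python
import numpy as np

def adapt_template(template, X, y, lr=0.05, steps=200):
    """Toy sketch of template adaptation by gradient descent.
    X: (positions, features) matrix of search features; the response map
    is X @ template. We descend the gradient of ||X t - y||^2 so the
    template itself, not an auxiliary filter, fits the desired response y."""
    t = template.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ t - y)  # gradient of the squared error
        t -= lr * grad
    return t
```

In DCF the template features stay fixed and a filter w is regressed so that X @ w matches y; here the same objective is driven into the template directly, which is the spirit of the adaptation described above.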