To incorporate object locations into a multi-target detection model, we assume that close duplicates cannot be learned efficiently by the model. We therefore use a region-based approach that proposes more candidate object locations than there are ground-truth locations in order to localize the targets. The proposed model learns a similarity metric with respect to the ground-truth locations that is robust (low false positives) to varying image conditions, small aerial target sizes, and few training samples. We report preliminary results on how transfer learning of metadata affects small aerial target localization accuracy, ranking the quality of region segmentation models by Intersection-over-Union (IoU) against the aerial ground-truth data, using pre-trained models from ImageNet, AlexNet, and CIFAR-10 and initialization with three aerial datasets, including the XView2 satellite imagery.
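The IoU-based quality ranking mentioned above can be sketched in plain Python. This is a minimal sketch, not the paper's implementation: the `(x1, y1, x2, y2)` box format and the `rank_by_iou` helper are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def rank_by_iou(proposals, ground_truth):
    """Rank proposed regions by their best IoU against any ground-truth box."""
    scores = [max(iou(p, g) for g in ground_truth) for p in proposals]
    order = sorted(range(len(proposals)), key=lambda i: -scores[i])
    return [(proposals[i], scores[i]) for i in order]
```

Ranking proposals this way gives a label-free quality score per region, which is what makes IoU a natural metric for comparing pre-trained initializations.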
This paper studies data fusion in traditional, spatial, and aerial video-stream applications, addressing the processing of data from multiple sources using co-occurrence information and a common semantic metric. Using co-occurrence information to infer semantic relations between measurements avoids the need for external information such as labels. Many current Vector Space Models (VSMs) do not preserve co-occurrence information, which leads to a similarity metric of limited use. We propose a proximity-matrix embedding as part of the learned metric embedding, whose entries capture the co-occurrence frequencies observed in the input sets. First, we show an implicit spatial sensor proximity-matrix calculation using Jaccard similarity over an array of sensor measurements and compare it with state-of-the-art kernel PCA learned from a feature-space proximity representation; it relates to a k-radius ball of nearest neighbors. Finally, we extend our unsupervised model with class co-occurrence boosting using pre-trained multi-modal reuse.
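The Jaccard-based proximity matrix over an array of sensor measurements, and the k-radius ball of neighbors it induces, can be sketched as follows. The set-of-measurements representation and all function names here are illustrative assumptions, not the paper's code.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of discretized sensor measurements."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def proximity_matrix(sensor_readings):
    """Pairwise proximity matrix: entry (i, j) is the Jaccard similarity
    between the measurement sets of sensors i and j, so co-occurrence of
    measurements is preserved without any labels."""
    n = len(sensor_readings)
    return [[jaccard(sensor_readings[i], sensor_readings[j])
             for j in range(n)] for i in range(n)]

def k_radius_neighbors(matrix, i, radius):
    """Sensors within a similarity 'radius' of sensor i (the k-radius ball)."""
    return [j for j, s in enumerate(matrix[i]) if j != i and s >= radius]
```

Because the matrix is built purely from co-occurring measurements, it can serve as the proximity representation fed into a downstream embedding such as kernel PCA.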
Traditional event detection from video frames is based on batch or offline algorithms: it is assumed that a single event is present within each video, and videos are processed, typically via a pre-processing algorithm, at an enormous cost in computation and CPU time. While this can be suitable for tasks with specified training and testing phases where time is not critical, it is entirely unacceptable for real-world applications that require prompt, real-time event interpretation. Motivated by the recent success of multi-model feature learning such as generative adversarial networks (GANs), we propose a two-model approach for real-time detection. Just as a GAN learns a generative model of the dataset and refines it with a discriminator that learns per-sample differences from the generated images, the proposed architecture uses a model pre-trained on a large dataset to boost weakly labeled instances, in parallel with deep layers for the small aerial targets, at a fraction of the computation time for training and detection while maintaining high accuracy. We build on previous work in unsupervised learning because of the overhead of labeling training data in the sensor domain.
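The two-model idea above, a frozen pre-trained extractor paired with a lightweight head boosted by weak labels, can be sketched as follows. Everything here is an illustrative assumption: the "backbone" is a stand-in fixed random projection rather than a real pre-trained network, and the names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 32, 16

# Stand-in for a frozen pre-trained backbone (hypothetical): a fixed
# projection from raw frame vectors to feature space.
W_BACKBONE = rng.standard_normal((D_IN, D_FEAT))

def pretrained_features(frames):
    """Model 1: frozen feature extractor reused from a large dataset."""
    return np.tanh(frames @ W_BACKBONE)

def train_head(feats, weak_labels, epochs=200, lr=0.1):
    """Model 2: lightweight logistic head trained on weakly labeled
    instances; only this small head is optimized, keeping training cheap."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        grad = p - weak_labels          # gradient of the logistic loss
        w -= lr * feats.T @ grad / len(feats)
        b -= lr * grad.mean()
    return w, b

def detect(frames, w, b, threshold=0.5):
    """Real-time scoring: a single cheap forward pass per incoming frame."""
    p = 1.0 / (1.0 + np.exp(-(pretrained_features(frames) @ w + b)))
    return p > threshold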
A general nonparametric technique is proposed for analyzing a multi-resolution, multivariate feature space to isolate faulty sensors. The basic overlap function of the technique is the existing one-dimensional fault-tolerant Brooks-Iyengar algorithm, which uses weighted precision and accuracy for static data. We prove that the dual of the existing overlap function can isolate the measurement intervals in the multi-dimensional feature space for both labelled and unlabelled publicly available datasets. We show that the computational complexity of learning the feature space increases linearly with the size of the input. Experimental results, reported as the mean average precision over all sensors using an ensemble model for dynamic events, show that the proposed algorithm performs well in the presence of noise across many static and dynamic action-recognition datasets.
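The one-dimensional Brooks-Iyengar overlap step can be sketched as follows: given N sensor intervals of which at most f may be faulty, accept the regions where at least N - f intervals agree and fuse their midpoints, weighted by the number of agreeing sensors. This is a minimal sketch of the classical algorithm, not the paper's multi-dimensional dual; `isolate_faulty` is an illustrative helper.

```python
def brooks_iyengar(intervals, f):
    """Fuse N sensor intervals (lo, hi), tolerating up to f faulty sensors.
    Returns the agreement-weighted midpoint of regions where at least
    N - f intervals overlap, or None if no such region exists."""
    n = len(intervals)
    # Sweep consecutive endpoint pairs to enumerate candidate regions.
    events = sorted({e for lo, hi in intervals for e in (lo, hi)})
    regions = []  # (midpoint, weight) for regions with >= n - f agreement
    for lo, hi in zip(events, events[1:]):
        mid = (lo + hi) / 2.0
        count = sum(1 for a, b in intervals if a <= mid <= b)
        if count >= n - f:
            regions.append((mid, count))
    if not regions:
        return None
    total = sum(w for _, w in regions)
    return sum(m * w for m, w in regions) / total

def isolate_faulty(intervals, fused):
    """Sensors whose interval excludes the fused estimate are suspects."""
    return [i for i, (lo, hi) in enumerate(intervals)
            if not (lo <= fused <= hi)]
```

For example, with intervals (1, 4), (2, 5), (3, 6), and (10, 11) and f = 1, only the region (3, 4) reaches three-sensor agreement, so the fused estimate is 3.5 and the outlying sensor is flagged.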