This PDF file contains the front matter associated with SPIE Proceedings Volume 11430, including the Title Page, Copyright information, Table of Contents, Author and Conference Committee lists.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access.* *Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
In this paper, heterogeneous feature extraction is performed with deep learning for classifying drug-related webpages. First, body text and image-label text are extracted through HTML parsing, and informative images are selected by the FOCARSS algorithm. Second, a text-based bag-of-words (BOW) model generates the text representation, an image-based BOW model generates the image representation, and the webpage representation is formed by concatenating the two. Heterogeneous feature extraction is then carried out both by deep learning and by classical methods such as PCA, and feature selection is additionally performed using information-theoretic criteria. Finally, the extracted and selected features are classified. Experimental results demonstrate that features extracted by deep learning yield higher classification accuracy than features extracted or selected by the classical methods, and also higher accuracy than single-modality classification.
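The multimodal concatenation and the classical PCA baseline can be sketched as follows. This is a minimal illustration with synthetic data, not the paper's pipeline; the array sizes and vocabulary dimensions are assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): concatenate text-BOW and
# image-BOW vectors per webpage, then reduce the joint representation
# with PCA via SVD, as one classical baseline the paper compares against.
rng = np.random.default_rng(0)
text_bow = rng.random((100, 500))    # 100 webpages, 500 text visual words
image_bow = rng.random((100, 300))   # 100 webpages, 300 image visual words

# Multimodal webpage representation: simple concatenation of both modalities.
pages = np.hstack([text_bow, image_bow])          # shape (100, 800)

# PCA: center the data, take the top-k right singular vectors, project.
k = 64
centered = pages - pages.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
features = centered @ vt[:k].T                    # shape (100, 64)
```

The deep-learning variant in the paper replaces this linear projection with a learned nonlinear encoder; the concatenation step is the same.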
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
To address vehicle target detection in multi-polarization SAR images over terrain backgrounds, a global CFAR detector operating on dual-polarized 16-bit data is proposed, which effectively reduces the influence of terrain clutter on detection. In addition, by analyzing terrain flatness and screening out built-up areas, the target detection region is reduced, the interference of complex terrain and densely built-up areas with target detection is further suppressed, and the reliability of target detection is greatly improved.
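The global CFAR idea can be illustrated with a toy threshold test on simulated clutter. This sketch is our own simplification: the Rayleigh clutter, the scene, and the threshold multiplier are assumptions, not the paper's dual-polarized 16-bit implementation.

```python
import numpy as np

# Hedged sketch of a *global* CFAR test: estimate clutter statistics over
# the whole scene and flag pixels exceeding mean + alpha * std, where alpha
# is chosen for a desired false-alarm rate. Data below is simulated.
rng = np.random.default_rng(1)
scene = rng.rayleigh(scale=10.0, size=(256, 256))   # simulated terrain clutter
scene[100:104, 100:104] = 200.0                      # bright vehicle-like target

alpha = 5.0                                          # threshold scaling factor
threshold = scene.mean() + alpha * scene.std()
detections = scene > threshold                       # boolean detection mask
```

A per-polarization version would compute such a threshold on each channel and fuse the masks, which is the spirit of the dual-polarized detector described above.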
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
In this paper, we propose a learning method for blind deblurring of Gaussian-blurred images that exploits edge cues via a deep multi-scale generative adversarial network, DeepEdgeGAN. We propose incorporating the edges of the blurred image together with the blurred image itself as the input of DeepEdgeGAN, providing a strong prior constraint for restoration; this helps address the problem that the gradients of images restored by GAN-based methods tend to be overly smooth and insufficiently sharp. Further, we introduce perceptual, edge, and scale losses to train DeepEdgeGAN. With the trained end-to-end model, we directly restore the latent sharp image from a blurred image and avoid estimating per-pixel blur kernels. Qualitative and quantitative experiments demonstrate that the visual quality of the restored images improves significantly.
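The edge-augmented input can be sketched as stacking an edge map with the blurred image. We use a Sobel gradient magnitude here purely as an illustration; the paper does not specify this particular edge operator, and the stand-in image is ours.

```python
import numpy as np

# Sketch (our illustration, not the released model): build an edge-augmented
# network input by stacking a Sobel edge map of the blurred image with the
# blurred image itself as an extra channel.
def sobel_edges(img):
    """Gradient magnitude via 3x3 Sobel filters with edge padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

blurred = np.linspace(0, 1, 64 * 64).reshape(64, 64)   # stand-in blurred image
edges = sobel_edges(blurred)
net_input = np.stack([blurred, edges])                  # channels: image + edges
```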
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
As an effective method for extracting distinctive invariant features from images, SIFT (scale-invariant feature transform) resists affine transformations such as translation and rotation, and in theory is also robust to illumination changes [1]. In practice, however, the performance of SIFT is degraded by the contrast reduction caused by illumination changes. In this paper, the performance of SIFT under different contrast levels is systematically analyzed and evaluated, a reasonable explanation is given for why SIFT performance changes under different illumination conditions, and a fast SIFT matching method based on contrast compression is proposed.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
To address the difficulty and low efficiency of target matching in binocular measurement, this paper proposes a real-time target feature matching algorithm for binocular stereo vision based on absolute-window-error minimization (CAEW, Calculate the Absolute Error Window) to improve both the speed and accuracy of measurement. First, the cameras are calibrated using Zhang's calibration method, and the Bouguet algorithm rectifies the stereo pair from the calibration data. Then, an AdaBoost-trained detector is used for target recognition. The CAEW algorithm is compared with the widely used SURF (Speeded-Up Robust Features) algorithm. Experimental evaluation shows that CAEW achieves scores above 90%, a significant improvement over SURF, and meets the requirements of real-time binocular target matching.
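The absolute-error-window idea can be illustrated with a minimal sum-of-absolute-errors search along one row of a rectified stereo pair. This is a sketch of the general principle only; CAEW's actual window construction and criteria are as described in the paper, and the shifted random images are our own test data.

```python
import numpy as np

# Minimal illustration of absolute-error window matching on a rectified pair:
# for a template window in the left image, slide along the same row of the
# right image and pick the disparity minimizing the sum of absolute errors.
rng = np.random.default_rng(2)
right = rng.random((40, 120))
true_disp = 7
left = np.roll(right, true_disp, axis=1)     # left image shifted by 7 pixels

y, x, w = 20, 60, 5                          # window center and half-size
template = left[y - w:y + w + 1, x - w:x + w + 1]

errors = []
for d in range(0, 20):                       # candidate disparities
    cand = right[y - w:y + w + 1, x - d - w:x - d + w + 1]
    errors.append(np.abs(template - cand).sum())
best_disp = int(np.argmin(errors))
```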
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
We use domestic and foreign meteorological satellite data to study operational regional meteorology for optical imaging terminal guidance. Attacks on cloud-covered areas fall into two scenarios: (1) medium-to-high clouds, whose cloud base is relatively high (generally above 2500 meters) and therefore has little influence on optical imaging terminal guidance; (2) low but incomplete cloud cover, where the clouds can be detected and segmented so that the target is engaged while avoiding the clouds. We train a machine-learning model to classify clouds as multi-layer or single-layer, reaching a classification accuracy of 82.1%. For single-layer clouds, the cloud base height can then be estimated in two ways: (1) using MODIS data from the Aqua meteorological satellite to identify clouds of different attributes for cloud-height estimation; (2) computing the height of single-layer clouds directly from their physical characteristics, with an average estimation error of 16.5%.
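The "physical characteristics" route can be illustrated with the textbook lifted-condensation-level rule of thumb, which estimates convective cloud base from the surface temperature/dew point spread at roughly 125 m per degree Celsius. This formula is a standard illustration only; the paper's own physical method may differ.

```python
# Illustrative only: the classic lifted-condensation-level (LCL) rule of thumb
# estimates the cloud base of a convective single-layer cloud from the spread
# between surface temperature and dew point (~125 m of height per deg C).
def cloud_base_height_m(temp_c, dewpoint_c):
    spread = temp_c - dewpoint_c
    return 125.0 * max(spread, 0.0)

# e.g. 28 degC surface temperature with a 16 degC dew point:
h = cloud_base_height_m(28.0, 16.0)   # 1500.0 m, below the 2500 m threshold
```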
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Accurately and efficiently extracting rural settlements from high-resolution remote sensing imagery is of great significance for rural administration. Owing to the complex environment of rural regions, traditional supervised classification methods can no longer satisfy the requirements of automatic rural settlement extraction, yielding results of low precision and incomplete coverage. In recent years, with the rapid development of deep learning in computer vision, deep learning methods have been widely applied to target extraction from high-resolution remote sensing imagery. This paper therefore proposes a deep-learning-based method for extracting rural settlements from high-resolution remote sensing images. A TensorFlow framework was built to train a Faster Region-based Convolutional Neural Network (Faster R-CNN) model. Image feature maps are first extracted by a convolutional neural network (CNN); a region proposal network (RPN) then proposes regions that may contain rural settlements, and a detection network identifies and classifies each region. The method was tested and verified on a homemade dataset covering a representative study area. The experimental results show that the proposed method extracts rural settlement areas with higher accuracy than traditional extraction approaches.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
This paper addresses the type recognition of aircraft, taking four typical military aircraft as research objects. We establish a database of aircraft types and propose an effective and efficient coarse-to-fine recognition method called Geometric Convolutional Neural Network (G-CNN). We start from target characteristics and build a characteristics database by analyzing acquired geometric and optical characteristics. Next, because datasets of aircraft types are scarce, we build 3D models based on the characteristics database and create an aircraft type dataset through 3D simulation, which is of great value for research on aircraft type recognition. Finally, we extract geometric characteristics of the aircraft, namely affine invariant moments and aspect ratios, to realize fast and efficient region selection, and we improve residual blocks with dilated convolution, applied to type recognition for the first time. Our method achieves 89.0% mAP, and the experiments show that it tackles type recognition with improved performance.
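The geometric features mentioned above can be sketched from a binary silhouette. We use the first two Hu moment invariants as a stand-in for the paper's affine invariant moments; the function name and the toy rectangular mask are our own.

```python
import numpy as np

# Sketch: invariant moments and aspect ratio from a binary aircraft mask.
def geometric_features(mask):
    ys, xs = np.nonzero(mask)
    m00 = len(xs)
    xc, yc = xs.mean(), ys.mean()
    def mu(p, q):                      # central moments
        return (((xs - xc) ** p) * ((ys - yc) ** q)).sum()
    def eta(p, q):                     # scale-normalized central moments
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    hu1 = eta(2, 0) + eta(0, 2)        # first two Hu invariants
    hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    aspect = (xs.max() - xs.min() + 1) / (ys.max() - ys.min() + 1)
    return hu1, hu2, aspect

mask = np.zeros((32, 32), dtype=int)
mask[10:22, 4:28] = 1                  # toy rectangular "fuselage" silhouette
hu1, hu2, aspect = geometric_features(mask)
```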
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Extracting buildings from remote sensing images is a significant task with many applications, such as map drawing, city planning, and population estimation. However, traditional methods that rely on hand-designed features struggle to perform well due to the diverse appearance of buildings and complicated backgrounds. In this paper, we design an end-to-end convolutional neural network that combines semantic segmentation and edge detection for building extraction. In addition, we propose a residual unit combined with spatial pyramid pooling (SPP-RU) that yields representations of receptive fields of different sizes through a multi-branch network. We conduct experiments on the WHU building dataset, and the results demonstrate the effectiveness of our method in quantitative and qualitative comparisons with state-of-the-art methods.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
SSD (Single Shot MultiBox Detector) is among the best object detection algorithms, offering both high accuracy and fast speed. However, SSD's feature pyramid detection scheme only extracts features at different scales without further processing, which loses semantic information. In this paper, we propose Multi-scale Feature Integration SSD, an enhanced SSD with feature integration modules that significantly improve performance over SSD. In the feature integration modules, features from layers at different scales are concatenated after upsampling; the concatenated features are then passed through several convolutional modules, whose outputs feed the multibox detectors that predict the final results. We test our algorithm on the PASCAL VOC 2007 test set at an input size of 300×300 using a single Nvidia 1080Ti GPU. Our network outperforms many state-of-the-art object detection algorithms in both accuracy and speed.
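The upsample-then-concatenate step can be sketched on plain arrays. The channel counts and map sizes below are illustrative SSD-like shapes, not the paper's exact configuration.

```python
import numpy as np

# Sketch of the feature integration step described above (our illustration):
# nearest-neighbor upsample a coarse feature map to the fine map's resolution,
# then concatenate along the channel axis before further convolutions.
def upsample_nn(feat, factor):
    """feat: (C, H, W) -> (C, H*factor, W*factor) by pixel repetition."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

fine = np.ones((256, 38, 38))        # e.g. an SSD conv4-style map
coarse = np.ones((512, 19, 19))      # a deeper, lower-resolution map

integrated = np.concatenate([fine, upsample_nn(coarse, 2)], axis=0)
```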
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
The attitude discrimination of objects in geosynchronous Earth orbit (GEO) is vital for a detailed understanding of the space object population in the Space Situational Awareness (SSA) domain. In this paper, a data-driven method is presented to discriminate the attitude of GEO space objects using a deep learning approach. Convolutional neural networks (CNNs) are designed and trained to validate their ability to discriminate the attitude of GEO objects from collected light-curve measurements. The method exploits the temporal variation in apparent object brightness across observations between attitude-stabilized and rotating space objects. Thousands of light curves of attitude-stabilized and rotating objects are selected and transformed into spectrum figures by the short-time Fourier transform (STFT). These spectrum figures are used to train the deep CNNs and to evaluate performance on a limited training set. Compared with traditional machine learning algorithms, the CNNs achieve better attitude discrimination accuracy on the measured data.
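The light-curve-to-spectrum transformation can be sketched with a direct NumPy STFT (frame, window, real FFT). The synthetic sinusoidal light curve and the frame/hop sizes are illustrative choices, not the paper's parameters.

```python
import numpy as np

# Sketch: turn a light curve into a spectrum figure via a short-time
# Fourier transform implemented directly in NumPy.
def stft_magnitude(signal, frame=64, hop=32):
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq bins, time frames)

t = np.arange(1024) / 100.0                         # 100 Hz sampling (assumed)
light_curve = 1.0 + 0.3 * np.sin(2 * np.pi * 5.0 * t)  # rotating-object tone
spec = stft_magnitude(light_curve)                  # image fed to the CNN
```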
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
The pose of a non-cooperative target represented by a point cloud can be estimated through point cloud registration, which is generally performed by searching for good correspondences. Seeking correspondences for non-cooperative target pose estimation is challenging due to low texture, noise, and occlusion, which produce many outliers in the initial correspondences. To obtain a high-quality set of feature correspondences, we employ a combination of local and global constraints to remove the outliers from the initial correspondences. On a local scale, we use simple, low-level geometric invariants; on a global scale, we apply covariant constraints to find compatible correspondences. In the experiments, we use four groups of different non-cooperative targets to evaluate our algorithm; the results verify that the quality of the correspondence set is greatly improved by our method and that the pose can be accurately estimated.
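A standard low-level geometric invariant of the kind mentioned above is pairwise-distance consistency: a rigid transform preserves distances, so a correspondence can be pruned if its distances to other correspondences disagree across the two clouds. The toy data, tolerance, and majority-vote threshold below are our assumptions.

```python
import numpy as np

# Sketch: prune outlier correspondences by rigid distance consistency.
rng = np.random.default_rng(3)
src = rng.random((30, 3))
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
dst = src @ R.T + np.array([1.0, 2.0, 3.0])      # rigidly moved copy
dst[:5] = rng.random((5, 3)) + 5.0               # first 5 matches are outliers

def pairwise(points):
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# A match is consistent with another if both clouds agree on their distance.
consistency = np.abs(pairwise(src) - pairwise(dst)) < 1e-3
votes = consistency.sum(axis=1)                  # agreeing matches per match
inliers = votes > len(src) // 2                  # keep majority-consistent ones
```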
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
State-of-the-art methods for keyword extraction from news are based on traditional machine learning, and their performance relies heavily on hand-crafted features and domain-specific knowledge. In this paper, we propose a new character-based method for keyword extraction from Chinese sports news, based on a bidirectional Long Short-Term Memory network with a Conditional Random Field layer (BiLSTM-CRF). The experimental results show that BiLSTM-CRF effectively improves keyword extraction performance on Chinese sports news.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
The accuracy of correlation filter trackers has improved greatly through the use of high-dimensional features, but their real-time performance has worsened, and trackers must often run on embedded devices where less computation is available. It is well known that the model update strategy is also important for tracking performance: a fixed learning rate struggles when the object changes either rapidly or slowly. To address this, a new correlation surface quality evaluation metric is proposed in this paper. We also account for occlusion of the object and propose an occlusion judgment algorithm. Finally, the model learning rate is adapted according to how fast the object changes and whether it is occluded. We conduct experiments on the OTB50 dataset. The results show that, after adopting the proposed adaptive learning rate strategy, a correlation tracker with gray features improves tracking accuracy by about 3% over the MOSSE tracker while maintaining high speed on embedded devices.
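One widely used correlation-surface quality measure from the MOSSE literature is the peak-to-sidelobe ratio (PSR), shown here for context; the paper proposes its own metric, and the synthetic response maps below are our illustration.

```python
import numpy as np

# Peak-to-sidelobe ratio: (peak - sidelobe mean) / sidelobe std, where the
# sidelobe is the response map excluding a window around the peak.
def psr(response, exclude=5):
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(py - exclude, 0):py + exclude + 1,
         max(px - exclude, 0):px + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

rng = np.random.default_rng(4)
noisy = rng.normal(0.0, 0.05, size=(64, 64))   # no confident peak
sharp = noisy.copy()
sharp[32, 32] = 1.0                            # well-localized peak

q_sharp, q_noisy = psr(sharp), psr(noisy)
```

A high PSR suggests reliable tracking (update faster); a low PSR suggests occlusion or drift (freeze the model), which is the same intuition behind the adaptive strategy above.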
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Object detection based on deep learning algorithms has been an important yet challenging research field in computer vision. The feature pyramid network has become a dominant architecture in many detection applications because of its powerful feature learning ability for objects of varying scales. To address the challenges of detecting small and densely packed objects, this paper proposes an object detection approach that combines a path aggregation scheme and the feature pyramid network in a unified framework. Specifically, we add a bottom-up branch with lateral connections onto the existing feature pyramid network and apply an adaptive feature fusion strategy, which improves detection performance for small and densely arranged objects in remote sensing images. Experimental results show that the proposed path-aggregated feature pyramid network improves detection performance for diverse objects in aerial images.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Despite the rapid development of text categorization technology, problems remain, such as low classification efficiency, low accuracy, and incomplete extraction of text features when the data volume is large and there are many category attributes. In this paper, a hybrid model combining a CNN (convolutional neural network) and a BiLSTM (bidirectional long short-term memory network) with an attention mechanism is used to classify long text data. The CNN extracts feature information from the text, the BiLSTM then extracts contextual semantic information, attention distributes weights over the text information, and a softmax classifier performs the final classification. The experimental results show that the model's feature extraction is more comprehensive and the classification performance is improved to a certain extent.
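The attention step in such a pipeline can be sketched in a few lines: score each time step, softmax the scores, and form the weighted sum that feeds the classifier. The shapes and the random stand-ins for learned parameters are assumptions.

```python
import numpy as np

# Sketch of attention over BiLSTM outputs (illustrative shapes and names):
rng = np.random.default_rng(5)
hidden = rng.random((50, 128))          # 50 time steps, 128-dim BiLSTM outputs
w = rng.random(128)                     # learned attention vector (stand-in)

scores = hidden @ w                     # one relevance score per time step
weights = np.exp(scores - scores.max())
weights /= weights.sum()                # softmax attention weights
context = weights @ hidden              # weighted text representation
```

The `context` vector is what a softmax classifier would consume in the final step.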
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
To meet the requirements of 3D reconstruction in accuracy, speed, and applicability, this paper proposes a Delaunay growth algorithm based on point cloud curvature smoothing. The 3D discrete point cloud is first projected onto a 2D plane, and a 2D Delaunay triangulation is performed using the empty-circle criterion and the max-min angle criterion. PCA (principal component analysis) is used to estimate the normals of the 3D point cloud and to orient them consistently to the same side, avoiding disordered normals. Combining the normals with the curvature of the corresponding 3D points, invalid normals caused by invalid points are removed while preserving as much of the point cloud as possible. Finally, the set of candidate points is filtered by the Delaunay constraint criterion and an evaluation function to ensure that the reconstructed triangles approximate Delaunay triangles. The experimental results show that the proposed reconstruction algorithm substantially outperforms the traditional greedy projection triangulation and Poisson algorithms, with reconstruction speed improved by 20%.
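The PCA normal estimation step can be sketched directly: the normal at a point is the eigenvector of its neighborhood covariance with the smallest eigenvalue. The nearly planar toy neighborhood below is our own example.

```python
import numpy as np

# Sketch: estimate a surface normal for one point's neighborhood via PCA.
def pca_normal(neighbors):
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    normal = eigvecs[:, 0]                      # smallest-variance direction
    return normal / np.linalg.norm(normal)

rng = np.random.default_rng(6)
patch = rng.random((50, 3))
patch[:, 2] *= 0.01                             # nearly planar in z
n = pca_normal(patch)                           # should be close to +/-z
```

Consistent orientation (flipping normals to the same side) would be a sign check against a viewpoint direction, as the paper describes.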
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Large-scale image categorization is a challenging task. In this paper, we propose a new image categorization approach based on visual saliency and the bag-of-words model. First, a saliency map is generated by a visual saliency method that exploits coarsely localized information, i.e., the salient region's shape and contour. Second, the size of the salient region is obtained by maximizing entropy. Third, local SIFT descriptors extracted in the salient region are combined with the visual saliency information to build visual words. Finally, the bag of visual words is classified by a Support Vector Machine. Compared with standard BOW-model categorization methods, the experimental results show that our method effectively improves image categorization accuracy.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Machine vision is now widely used. The traditional Meanshift algorithm, with its fixed kernel radius, easily loses a target that changes in size and direction, degrading the overall tracking result. To address these problems, this paper introduces a fuzzy control method: an adaptive fuzzy mechanism based on the similarity function value selects an appropriate dynamic kernel radius in real time, improving the tracking effect. Comparing tracking accuracy before and after the improvement on basketball game video streams, the improved tracker reaches an average accuracy of 73.76%, an improvement of 9.28%, so target tracking of athletes on the sports field can be effectively realized.
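The similarity function that drives Meanshift tracking (and, here, the fuzzy radius adaptation) is typically the Bhattacharyya coefficient between normalized color histograms, sketched below with toy histograms of our own choosing.

```python
import numpy as np

# Bhattacharyya coefficient between two histograms: 1.0 means identical
# distributions, lower values mean the candidate drifted from the target.
def bhattacharyya(p, q):
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(p * q).sum()

target = np.array([10.0, 30.0, 40.0, 20.0])
same = np.array([1.0, 3.0, 4.0, 2.0])    # same distribution, different scale
other = np.array([40.0, 10.0, 10.0, 40.0])

s_same = bhattacharyya(target, same)     # equals 1.0 up to rounding
s_other = bhattacharyya(target, other)
```

A fuzzy controller can map this similarity value to a kernel radius adjustment, which is the mechanism the paper builds on.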
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Previous efforts on heterogeneous face recognition typically assume that each subject has multiple training samples. However, this assumption may not hold in special cases such as law enforcement, where only a Single Sample Per Person (SSPP) exists in the training set. Face recognition in the SSPP scenario often suffers from overfitting and singular-matrix problems. To solve this, we propose a novel learning-based algorithm called Coupled Discriminant Mapping (CDM) for heterogeneous face recognition. CDM finds a common space and learns a pair of discriminant projections for the two modalities without depending on intra-class scatter. In the common space, images of the same person are pulled into close proximity even if they come from different modalities, while all images under the same modality are pushed apart, since each image belongs to a distinct class. The performance of CDM is evaluated on two tasks: visual vs. near-infrared face image matching and conventional face recognition. Experiments on two widely studied databases show the effectiveness and consistency of the proposed CDM method.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
To solve the problem of real-time recognition of large quantities of ISAR data from multiple platforms and devices, and in particular adaptive feature extraction from ISAR data, an ISAR target recognition system based on artificial intelligence is proposed. The system consists of three layers: a data layer, a recognition and analysis layer, and a presentation layer. The data layer extracts ISAR data according to different application scenarios; the recognition and analysis layer introduces deep learning algorithms for adaptive feature extraction, model training and optimization, comprehensive identification, and analysis of evaluation results; and the presentation layer establishes a stable and efficient information service based on a Web service framework, realizing fully cross-platform display of recognition results and feature information. In practice, the system significantly improves identification speed and achieves real-time recognition of ISAR data, information push, and cross-platform display, effectively assisting users in decision making and evaluation.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Current work on person recognition in photo albums mainly utilizes pure deep convolutional features to describe a person's image. However, we observe that hand-crafted features often provide complementary information and are more stable for identity recognition under some challenging circumstances. In view of this, we propose a novel hybrid method for person recognition in photo albums. In the proposed method, both hand-crafted features and deep convolutional features are extracted from each person's image. These multi-modality features are then fused by a weighted average and classified by a pre-trained SVM in the recognition procedure. The experimental results demonstrate the effectiveness of the proposed method.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
With the rapid development of machine learning, computer vision, and other artificial intelligence technologies, vehicle identification based on image processing and pattern recognition has attracted increasing attention and research. As an important part of intelligent transportation systems, vehicle type identification plays an important role in traffic management, campus access control, and other scenarios. This paper proposes an HOG-feature-based vehicle model recognition algorithm for recognizing passing road vehicles. First, HOG feature vectors of vehicle samples are extracted with the HOG algorithm. Then, an SVM classifier is trained on the HOG feature vectors of the training samples, and the HOG feature vectors of the test samples are fed into the classifier to obtain their classifications. According to wheelbase and displacement, the vehicles are divided into nine types: micro car, small car, compact car, medium car, medium-large car, luxury car, MPV, SUV, and minivan; training and test sample libraries are established accordingly, and the overall recognition success rate is 93.6893%.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
Owing to the characteristics of fish-eye cameras, such as a large field of view and an ultra-short focal length, traditional camera calibration algorithms based on the pinhole imaging model cannot calibrate them. This paper proposes a fish-eye camera calibration optimization based on the traditional Kannala model. First, the imaging model and distortion types of the fish-eye camera are studied, and on the basis of the traditional Kannala model, a piecewise polynomial approximation model is established to optimize the original model. Then, the intrinsic parameters and distortion coefficients of the camera are obtained from both the traditional Kannala model and the optimized model, and distortion-corrected images are produced from these parameters. Finally, the advantages of the algorithm are analyzed quantitatively and qualitatively using the re-projection error and multi-view stereo 3D reconstruction of the corrected images. The re-projection error analysis and the 3D reconstruction visualization demonstrate that calibration with the optimized model is effective for correcting the original images and supporting multi-view stereo 3D reconstruction.
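The Kannala(-Brandt) radial projection that the calibration builds on is commonly written as an odd polynomial in the incidence angle theta, r(theta) = k1·theta + k2·theta^3 + k3·theta^5 + k4·theta^7 + k5·theta^9. The coefficients below are made-up illustrative values, not calibrated ones.

```python
import numpy as np

# Sketch of one common form of the Kannala-Brandt radial projection:
# image radius as an odd polynomial of the off-axis angle theta.
def kannala_radius(theta, k):
    powers = np.stack([theta, theta**3, theta**5, theta**7, theta**9])
    return k @ powers

k = np.array([1.0, -0.02, 0.001, 0.0, 0.0])   # illustrative coefficients
theta = np.linspace(0.0, np.pi / 2, 5)        # up to 90 degrees off-axis
r = kannala_radius(theta, k)                  # monotone radius mapping
```

The paper's piecewise polynomial variant would replace this single polynomial with different coefficient sets over angular segments.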
Single image dehazing is a challenging ill-posed restoration problem. Most dehazing algorithms follow the classical atmospheric scattering model and adopt the same parameters for areas of different haze density within an image. In this paper, we propose an end-to-end dehazing algorithm, called the Dehazing Network Based on Haze Density (DNBHD). The proposed network consists of a haze density map estimation network and a dehazing network. Using the estimated haze density map, a hazy image is divided into a mist region and a dense fog region, which are fed into the dehazing network separately. Compared with previous dehazing algorithms, DNBHD does not depend on the atmospheric scattering model and does not assume a uniform haze distribution in images. We use different parameters to handle regions of different haze density, avoiding the color distortion and inappropriate brightness caused by uniform whole-image defogging. Experiments show that our algorithm achieves significant improvements over state-of-the-art methods.
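For context, the classical atmospheric scattering model that DNBHD departs from writes a hazy pixel as I = J·t + A·(1 − t) and recovers the scene radiance as J = (I − A)/t + A; a per-pixel sketch with an illustrative lower bound on the transmission:

```python
def recover_scene(I, A, t, t0=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t) per pixel.

    I: hazy intensities in [0, 1] (2D list), A: global atmospheric light,
    t: transmission map. t is clamped at t0 to avoid amplifying noise in
    dense-fog regions, which is exactly where a single global parameter set
    breaks down.
    """
    return [[(I[y][x] - A) / max(t[y][x], t0) + A
             for x in range(len(I[0]))] for y in range(len(I))]

# sanity check: a haze-free pixel (t = 1) is returned unchanged
J = recover_scene([[0.5]], A=0.8, t=[[1.0]])
```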
With the development of technology, especially the rapid development of hand-held devices, video sequences have become more convenient to obtain, but video quality still suffers from issues such as unwanted camera shake and jitter. To address these issues, video stabilization techniques have been developed to obtain high-quality, stable videos. Considering computational complexity and real-time requirements, patch matching has become an important method for motion estimation and video stabilization; it transforms the video stabilization task into a minimization problem. In this paper, we propose a novel patch matching method for motion search that integrates the fireworks algorithm [1], a swarm intelligence optimization algorithm. Inspired by fireworks exploding in the air, its mathematical model can be formulated as a parallel explosive search that introduces random factors and selection strategies, yielding a global probabilistic search method for solving complex optimization problems with excellent performance and high efficiency. Experimental results show that the improved patch matching method based on the fireworks algorithm achieves better results than patch matching with traditional motion search algorithms.
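A minimal one-dimensional sketch of the fireworks-algorithm idea described above (explosion amplitude shrinking with fitness, elitist selection); the paper applies it to 2-D motion search, and all constants here are illustrative:

```python
import random

def fireworks_minimize(f, lo, hi, n_fireworks=5, n_sparks=10, iters=40, seed=0):
    """Toy fireworks algorithm minimizing f over [lo, hi].

    Each firework explodes into random sparks; fitter fireworks get a smaller
    explosion amplitude (finer local search), worse ones a larger one (global
    exploration). The best candidates survive to the next generation.
    """
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(n_fireworks)]
    for _ in range(iters):
        fits = [f(x) for x in pop]
        worst, best = max(fits), min(fits)
        cands = list(pop)                        # elitism: keep parents
        for x, fit in zip(pop, fits):
            # amplitude proportional to relative (lack of) fitness
            amp = (hi - lo) * (fit - best + 1e-12) / (worst - best + 1e-12)
            for _ in range(n_sparks):
                cands.append(min(hi, max(lo, x + rng.uniform(-amp, amp))))
        cands.sort(key=f)
        pop = cands[:n_fireworks]
    return pop[0]

best = fireworks_minimize(lambda x: (x - 3.0) ** 2, -10, 10)
```

In the motion-search setting, f would be a patch dissimilarity (e.g. SSD) evaluated at candidate displacement vectors rather than a scalar test function.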
In this paper, we investigate the degraded face recognition problem. At checkpoints, it is common that a passenger's photo is digitally taken on the spot and compared with archived images scanned from printed photos; the gallery set and the probe set thus come through two different media. The distortions introduced in the printing and scanning processes often lead to unsatisfactory identification performance, necessitating further investigation into degraded face recognition. We therefore propose an improved modality-invariant feature (IMIF) approach, which combines modality-invariant features with a discriminative learning procedure to handle variations in expression, occlusion, and degradation. Experiments on a degraded face database show that the proposed IMIF improves degraded face recognition performance compared with other methods, validating the effectiveness of the proposed method.
The high-speed railway overhead contact system is a transmission line erected along the high-speed railway that supplies power to electric locomotives. Once the overhead contact system loses power, the safe operation of the locomotive is directly affected, with serious consequences. Insulators are key components in the regular inspection of the overhead contact system; their common faults are damage, dirt, and discharge. A single image may contain many types of components whose shapes vary widely, and each component must be classified as normal or as one of multiple fault types, where the differences between fault types of the same component are usually small. A hierarchical coarse-to-fine strategy is therefore proposed to address this issue: as a trade-off between efficiency and accuracy, an efficient network is leveraged to detect the insulator in the image, and an accurate network is then utilized to identify the fault.
In recent years, with the development of wireless sensor networks (WSNs), they have been applied in more and more areas; however, anomaly detection has always been a hot topic in WSNs. To address this problem, this paper proposes an anomaly detection algorithm based on K-means clustering and a BP neural network. The algorithm first employs K-means clustering to classify the collected original sample data and mark them as anomalous or normal. Based on these tagged data, it then uses the BP neural network to train a classification model and realize online detection of anomalous data. Experiments on virtual and real sensor databases show that our algorithm achieves a high outlier detection rate with a low false alarm rate. In addition, because K-means clustering is an unsupervised classification method, our algorithm is suitable for different WSN application scenarios.
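The unsupervised pre-labeling stage above can be sketched with plain 1-D k-means over sensor readings; this is a generic implementation, not the paper's exact procedure, and the resulting labels are what would be fed to the BP network:

```python
def kmeans_1d(data, k=2, iters=20):
    """Plain 1-D k-means used to pre-label sensor readings (e.g. cluster 0 =
    normal, cluster 1 = anomalous) before training a supervised classifier."""
    centers = sorted(data)[::max(1, len(data) // k)][:k]   # spread-out init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]           # recompute means
    labels = [min(range(k), key=lambda i: abs(x - centers[i])) for x in data]
    return centers, labels

# four readings near 1.0 and two outliers near 9.0 split into two clusters
centers, labels = kmeans_1d([1.0, 1.1, 0.9, 1.05, 9.0, 9.2])
```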
We present a two-stage method for ship detection in remote sensing images that detects ships efficiently. First, a light-weight classification network classifies different regions. In the second stage, we design a detection framework to detect ships in the sub-images judged to contain objects in the first stage. To handle scale variation in object detection, our detection network is built on a feature pyramid network, but we explicitly assign each object to the corresponding feature map based on its size. Instead of using anchors, our framework predicts the object center point and the offsets to the bounding box. Experimental results show that the proposed method performs well in terms of both speed and accuracy.
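Assigning objects to pyramid levels by size, as described above, is commonly done with the FPN heuristic k = k0 + ⌊log2(√(wh)/224)⌋ clamped to the available levels; the constants below are the usual FPN defaults and may differ from the paper's rule:

```python
import math

def assign_fpn_level(w, h, k0=4, canonical=224, lo=2, hi=5):
    """Map a box of size w x h to a feature pyramid level: a canonical-size
    (224 px) box goes to level k0, and each halving of box size moves the
    assignment one level down (finer feature map)."""
    k = k0 + math.floor(math.log2(math.sqrt(w * h) / canonical))
    return max(lo, min(hi, k))           # clamp to levels that exist
```

For example, a 224x224 box lands on level 4, a 112x112 box on level 3, and very small ships fall through to the finest level 2.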
The discrimination of interference, especially artificial interference such as decoys, is crucial to improving target detection performance. The differences in kinematic characteristics between the target and decoys, usually represented by their behavior patterns, are the main basis for classifying targets and the various kinds of decoys. In this paper, drawing on human behavior recognition methods, a method for infrared target and decoy recognition based on a behavior recognition network is proposed. Our method combines a detection network (Faster R-CNN), an association algorithm (Deep SORT), and an Inflated 3D convolutional network (I3D) with a long-range attention block to recognize target and decoy behavior. Interactions with surrounding objects contain important information for understanding behavior. By improving the non-local attention mechanism with aggregated channel-wise attention and trajectory attention, our method enables the I3D network to efficiently capture relational features across positions, time, and channels, especially the trajectory behavior features of the target and decoy, which improves the discriminative ability of the anti-interference behavior recognition network. Experiments show that our proposed method outperforms the original non-local attention network, achieving state-of-the-art performance.
Infrared images of various complex environments and targets can be generated realistically by computer simulation. Simulated infrared radiation is affected by many factors during atmospheric transmission, and atmospheric turbulence can significantly reduce imaging quality (causing image distortion, jitter, uneven illumination, and blur). Traditional ways of simulating the influence of atmospheric turbulence on images must consider many factors, and the process is cumbersome. By improving a generative adversarial network, a pixel gray-level loss term is added to reduce infrared image distortion, and the convergence of the GAN is improved by adding a gradient-constrained GAN loss term. Experiments show that the network obtained by this method is stable and the generated images are of high quality. In this paper, the structural similarity (SSIM) between the clear image and the aero-optical effect image corrected with the conditional generative adversarial network is 72.07%, while the SSIM between the original aero-optical effect image and the clear image is 57.02%.
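The SSIM figures quoted above come from the standard structural similarity formula; a single-window sketch is shown below (real implementations average SSIM over local windows across the image, but the core formula is the same):

```python
def ssim_global(x, y, L=1.0):
    """Single-window SSIM between two equal-length intensity lists, with the
    standard stabilizers C1 = (0.01*L)**2 and C2 = (0.03*L)**2, where L is the
    dynamic range of the pixel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # means
    vx = sum((a - mx) ** 2 for a in x) / n                # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

# identical signals score exactly 1.0
score = ssim_global([0.2, 0.5, 0.8], [0.2, 0.5, 0.8])
```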
With the upgrading of industry, robots urgently need to track moving targets at high speed, so the detection algorithms in machine vision need to be improved. Aiming at the problem that the high and low thresholds of the traditional Canny edge detection algorithm must be fixed, an improved dynamic double-threshold Canny algorithm is proposed: the thresholds are increased step by step, the area enclosed by closed image edges is used as the evaluation criterion, and the best thresholds are thereby determined to achieve the best detection effect. Experimental results show that the improved dynamic double-threshold Canny algorithm not only improves the edge detection effect by 9% on average compared with the traditional algorithm, but also detects more complete image information and has stronger adaptability.
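The double-threshold step that the dynamic algorithm tunes is Canny's hysteresis: pixels above the high threshold seed edges, and pixels above the low threshold survive only if connected to a seed. A pure-Python sketch on a gradient-magnitude grid (the dynamic variant would rerun this while varying the thresholds and scoring the closed-edge area):

```python
from collections import deque

def hysteresis(mag, lo, hi):
    """Canny-style double thresholding on a 2D gradient-magnitude grid.

    Pixels >= hi seed the edge map; pixels >= lo are kept only if they are
    8-connected (directly or transitively) to a seed.
    """
    H, W = len(mag), len(mag[0])
    edge = [[False] * W for _ in range(H)]
    q = deque((y, x) for y in range(H) for x in range(W) if mag[y][x] >= hi)
    for y, x in q:
        edge[y][x] = True
    while q:                                   # flood-fill from strong pixels
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and not edge[ny][nx] \
                        and mag[ny][nx] >= lo:
                    edge[ny][nx] = True
                    q.append((ny, nx))
    return edge

# one strong pixel (30) pulls in a connected weak pixel (12); the isolated
# sub-threshold pixel (5) is discarded
edges = hysteresis([[0, 0, 0], [5, 30, 0], [0, 12, 0]], lo=10, hi=20)
```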
This paper proposes a semantic segmentation method for buildings in high-resolution remote sensing images based on conditional random fields. Through extensive comparisons on real data, the U-Net semantic segmentation model is selected from among many deep convolutional neural network models as the base model to improve. To overcome the single-scale nature of its upsampling operation, the U-Net model is modified as follows: the crop-and-copy skip connections are replaced by a pyramid pooling layer, multi-scale feature maps are used, and resampling of the feature maps with fine bilinear interpolation yields the maximum response at each scale. The improved U-Net model extracts more complete image features. The coarse segmentation results are used as the initial input of fully connected conditional random fields (CRFs); the global pixel potential energy is inferred over the fully connected graph, refining the feature maps for target matching. Finally, the image features are fed to a sigmoid classifier. The results show that the CRF-SUNet model, which introduces the conditional random field, achieves high segmentation precision, and the boundaries of the segmented buildings are clear, smooth, and complete.
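The fine bilinear interpolation used in the upsampling path above is the standard fractional-position resampling; a minimal sketch for a single sample point:

```python
def bilinear(img, y, x):
    """Bilinear interpolation of a 2D grid at fractional position (y, x):
    blend the four surrounding grid values by their fractional distances.
    Coordinates are clamped at the bottom/right border for simplicity."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    return (img[y0][x0] * (1 - dy) * (1 - dx) +
            img[y0][x1] * (1 - dy) * dx +
            img[y1][x0] * dy * (1 - dx) +
            img[y1][x1] * dy * dx)

# the center of a 2x2 grid is the average of its four corners
val = bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5)
```

Upsampling a feature map simply evaluates this at every output position mapped back into input coordinates.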
In this paper we present an application solution for real-time augmented reality. We achieve almost 30 frames per second on our device while maintaining good augmentation results. We use a textured planar object as the target. Considering the computational complexity, we use patch features and ZSSD template matching for point matching. Meanwhile, we maintain a database of target templates as semantic information; the semantics include multiple key-frame images of the target object in different poses. With this semantic database, we solve the problems caused by viewpoint change and achieve robust performance.
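The ZSSD (zero-mean sum of squared differences) score mentioned above subtracts each patch's own mean before comparing, which makes the match invariant to a global brightness offset; a sketch over flattened patches (lower score = better match):

```python
def zssd(patch, template):
    """Zero-mean sum of squared differences between two equal-length,
    flattened intensity patches."""
    mp = sum(patch) / len(patch)
    mt = sum(template) / len(template)
    return sum(((a - mp) - (b - mt)) ** 2 for a, b in zip(patch, template))

# a uniform brightness shift of +100 does not change the score at all
shift_score = zssd([10, 20, 30], [110, 120, 130])
# but a structural difference (reversed gradient) does
rev_score = zssd([10, 20, 30], [30, 20, 10])
```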
An image caption generation model with an adaptive attention mechanism is proposed to address the weakness of image description models that rely only on local image features. Under an encoder-decoder framework, the local and global features of images are extracted at the encoder using the Inception V3 and VGG19 network models. Since the proposed adaptive attention mechanism can automatically identify and weigh the importance of local and global image information, the decoder can generate sentences that describe the image more intuitively and accurately. The proposed model is trained and tested on the Microsoft COCO dataset. The experimental results show that, compared with image caption models based on local features, the proposed method extracts richer and more complete information from the image and generates more accurate sentences.
Recognizing transmission tower numbers is an important part of the automatic inspection of high-voltage transmission lines. However, it is infeasible to accomplish this task effectively in one step given the large scene images shot by unmanned aerial vehicles. In this paper, we present a cascaded framework consisting of two CNN components: number plate detection and serial number recognition. The proposed method reduces the difficulty of localizing number characters in large scenes by leveraging a robust background, the number plates. On the one hand, the cascaded coarse-to-fine method reduces the miss rate and improves detection accuracy; on the other hand, recognition complexity is greatly reduced. The experimental results on our collected dataset demonstrate the effectiveness of the proposed method.
This work addresses the misalignment in image stitching caused by a small overlap area. To reduce mismatches between matched feature pairs in two connected images, random sample consensus (RANSAC) [1] is usually adopted, which assumes that the sampling of matched feature points with the largest number of inliers should be used to compute the geometric matrix. However, this assumption does not hold when the overlap area between the connected images is small, because compressing or turning over the image may yield better spatial consistency of the matched feature points. We therefore propose a turnover- and shape-filter-based feature matching method for image stitching: a turnover and shape filter first removes the samplings that result from turnover and compression, and its output is then passed to RANSAC to yield the final inliers. Experimental results on real-world datasets validate the effectiveness of our method.
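One way to realize the turnover-and-compression check described above is to inspect the 2x2 linear part of a candidate transform: a negative determinant means the mapping flips (turns over) the image, and a determinant far from 1 means heavy compression or stretching. The sketch below is an assumption about how such a filter could work, with illustrative thresholds, not the paper's exact criterion:

```python
def plausible_transform(a, b, c, d, min_scale=0.5, max_scale=2.0):
    """Pre-RANSAC sanity filter for the linear part [[a, b], [c, d]] of an
    estimated transform. The determinant equals the signed area scaling:
    negative = reflection (turnover), far from 1 = severe shape change."""
    det = a * d - b * c
    return det > 0 and min_scale ** 2 <= det <= max_scale ** 2
```

Candidate samplings failing this test would be discarded before inlier counting, so a flipped or collapsed fit can never win on inlier count alone.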
To achieve efficient, automatic, and accurate measurement, this paper studies the two-dimensional measurement of industrial glass under experimental conditions. The main contents include: analyzing the structure and hardware performance parameters of the system; building a measuring platform including a computer, a charge-coupled device (CCD) image sensor, and a lens; capturing glass images with a high-precision camera; preprocessing the image data; and acquiring the edge information of the glass. The system uses a second-order filtering method to smooth the image and the Canny operator to extract the edges of the industrial glass, transforms the image coordinate system into the world coordinate system through a coordinate transformation, and finally calculates the two-dimensional size of the glass. The system measures the two-dimensional length and width of polygonal glass; the experimental results show that the measurement method meets the accuracy requirements of general industrial measurement and that the detection system is feasible.
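In the simplest planar case, the pixel-to-world conversion above reduces to a scale factor obtained from a reference object of known size imaged under the same setup; a sketch assuming the glass plane is parallel to the image sensor (the paper's full coordinate transformation would also handle camera pose):

```python
def pixel_to_mm(length_px, ref_px, ref_mm):
    """Convert a measured pixel length to millimetres via a known reference:
    mm_per_pixel = ref_mm / ref_px, assuming a fronto-parallel target plane
    and negligible lens distortion."""
    return length_px * (ref_mm / ref_px)

# if a 10 mm gauge spans 100 px, a 500 px glass edge measures 50 mm
edge_mm = pixel_to_mm(500, ref_px=100, ref_mm=10.0)
```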
In groundwater monitoring and management, groundwater level data are usually analyzed to judge groundwater exploitation, which requires monitoring values of high accuracy. However, because the monitoring data are susceptible to factors such as sensor failure and abnormal signal transmission, readings can be misjudged, causing the system to raise false alarms. A multi-parameter correlation model can therefore be established by analyzing the fluctuation cross-correlation between multiple parameters, improving the efficiency and accuracy of abnormal event analysis. Research shows that the trend of the groundwater level in the monitoring area is basically consistent with the trend of rainfall, so the Pearson correlation analysis method can be used to analyze the correlation between groundwater level changes and rainfall and thereby improve the accuracy of groundwater level anomaly detection. When a groundwater level change deviates markedly from the other data and its correlation with rainfall falls below a set threshold, it can be judged as abnormal.
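The correlation measure referred to above is the Pearson correlation coefficient; a self-contained sketch applied to two series such as groundwater-level changes and rainfall:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series:
    covariance normalized by the product of standard deviations, in [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# level changes that scale with rainfall correlate perfectly
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

An anomaly rule of the kind described would then flag a window whose r against rainfall drops below a chosen threshold.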
As computer performance continues to improve, new deep learning algorithms emerge in an endless stream. Object recognition (target detection) is one of the most influential research directions in computer vision. Traditional object recognition methods have the following problems: (1) generating target proposal boxes is cumbersome and hurts detection speed and accuracy while introducing redundancy; (2) manually extracted image features cannot guarantee feature quality; (3) traditional machine learning methods yield low feature classification accuracy; and (4) detection speed is slow and accuracy is low overall. In this paper, the DenseNet structure is used to improve recognition accuracy, and SSD is used to improve detection speed. Together with the classification detection and skip connection techniques used in the network, experiments show that target detection efficiency is further improved.
Oracle bone inscriptions (OBIs) are invaluable materials for recovering the economic and social forms of the Shang Dynasty, one of the most ancient dynasties in China. It is very important to extract the original OBIs from scanned images of oracle bone rubbings. To this end, researchers have had to employ a very time-consuming method, tracing the inscriptions by hand, pixel by pixel and image by image. In this paper, an image segmentation method based on fully convolutional networks (FCNs) is proposed to overcome this limitation. To speed up training and boost segmentation performance, a simple FCN with only convolutional layers is designed, incorporating batch normalization. The proposed method was tested on a real OBI image set (320 samples). Experimental results show that the proposed method is effective at extracting OBIs from scanned images of oracle bone rubbings.
To enrich the effective feature information of fingerprint templates and improve the matching performance of partial-fingerprint identification systems, this paper proposes a multi-template partial fingerprint recognition strategy based on cross mosaicing. In the registration phase, the fingerprint feature templates extracted from the partial fingerprint images are cross-spliced to enrich the effective feature information of the template, thereby avoiding mosaicing failures caused by different mosaicing orders. To address the problem that a low fingerprint image recombination rate causes registration to fail, this paper builds on multiple stored templates and continuously enriches the effective information contained in the feature templates through a template update strategy in the authentication phase. The experimental results show that the proposed strategy has better matching performance.
Traditional studies on micro-expression feature extraction primarily focus on the global face across all frames. To improve the efficiency of feature extraction, this paper proposes a new framework based on local regions and key frames to represent facial micro-expressions. First, facial feature point detection is used to acquire the coordinates of 68 key points, and regions of interest are delineated using those key point coordinates and the action units. Second, to remove redundant information in the micro-expression video sequence, the structural similarity index (SSIM) is used to select key frames for each local region of interest. Finally, dual-cross patterns (DCP) are extracted from the local regions of interest and concatenated into a feature vector for the final classification. The experimental results show that, compared with traditional micro-expression methods, the proposed method achieves a higher recognition rate and better computational performance.
In recent years, with the rapid development of artificial intelligence technology, human-like auditory perception has received extensive attention. This paper studies human-like auditory intelligent speech separation for robots in complex acoustic environments. Through deep learning of key technologies such as DNN-HMM, a new deep network cluster structure, optimization objectives, and a deep learning algorithm capable of denoising in the complex frequency domain are proposed to improve speech recognition accuracy, solve the problem of human-like auditory speech separation in harsh environments, realize high-quality auditory perception in real environments, and enhance intelligent human-computer interaction performance in far-field and complex acoustic environments.
In this paper, we use a convolutional autoencoder to predict multiple unseen views of an object in the infrared domain. The dataset we use for this purpose, the DSIAC-ATR image database, has never before been used for view prediction in a non-linear feature subspace. Our method exploits the underlying feature subspace, the manifold of the object, to predict an unseen view. We address the more challenging task of view prediction from greyscale images: infrared images collected both during the day and at night. We propose multiple architectures that not only predict how an object (here, a military vehicle) will look at a certain orientation but also learn to predict day or night infrared images and produce either on demand. We train our networks and show via experiments that the weights learn the geometry of the transformation not in Euclidean space but rather in Riemannian space. We explore the underlying feature subspace and observe that the networks learn the manifolds and thereby produce sharp, distinct, and natural-looking images.
We introduce an adaptive-threshold instance segmentation network for point clouds based on the similarity group proposal network (SGPN), named the adaptive threshold similarity group proposal network (ATSGPN). SGPN learns point cloud features to produce a similarity matrix and clusters. In our experiments, we found that a heuristically chosen threshold cannot always divide the points properly, even when the similarity matrix is good. Based on this observation, we introduce a threshold map to learn the segmentation threshold. We also improve feature extraction using edge convolution (EdgeConv): the point cloud first passes through EdgeConv to extract features, and the similarity matrix is learned in the feature space. The semantic label of each point and the segmentation threshold help to generate groups, and a confidence score is then calculated to evaluate group quality for backpropagation. ATSGPN achieves higher accuracy on the Stanford Large-Scale 3D Indoor Spaces dataset (S3DIS) with fewer steps than SGPN, and experiments in the paper demonstrate its good performance.
Currently, the correlation filter is widely used in visual tracking because of its effectiveness and efficiency. To adapt the representation to changing target appearances, linear interpolation is used to update tracking models according to a manually designed learning rate. However, such hand-tuned schemes only apply to certain scenes, because the threshold parameters are sensitive to the varying response maps of complex scenes. In this paper, to overcome this problem, an adaptive increment correlation filter based tracker is proposed. Unlike traditional linear interpolation with a manual learning rate, the increment is learned by linear regression from the historical tracking model and the current training samples. Experimentally, we show that our algorithm outperforms state-of-the-art keypoint-based trackers.
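The contrast between a fixed-rate update and a data-driven one can be sketched as follows. This is a deliberately reduced stand-in: it solves a 1-D least-squares problem for the best interpolation rate on the current samples, whereas the paper's method regresses a full model increment; the function names and shapes are illustrative.

```python
def linear_update(w_old, w_new, lr=0.02):
    """Conventional fixed-rate interpolation used by most CF trackers."""
    return [(1 - lr) * a + lr * b for a, b in zip(w_old, w_new)]

def adaptive_rate(w_old, w_new, X, y):
    """Pick the interpolation rate that best fits the current samples (X, y):
    minimize sum_i ((1-a)*x_i.w_old + a*x_i.w_new - y_i)^2 over a in [0, 1]."""
    p = [sum(a * b for a, b in zip(x, w_old)) for x in X]  # old responses
    q = [sum(a * b for a, b in zip(x, w_new)) for x in X]  # new responses
    num = -sum((pi - yi) * (qi - pi) for pi, qi, yi in zip(p, q, y))
    den = sum((qi - pi) ** 2 for pi, qi in zip(p, q)) or 1e-12
    return max(0.0, min(1.0, num / den))

# If the new model explains the samples perfectly, the rate goes to 1:
print(adaptive_rate([0, 0], [1, 1], [[1, 0], [0, 1]], [1, 1]))  # → 1.0
```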
In this paper, we present a novel approach to face morphing. We use a recent domain transfer technique to generate the target expression and combine it with a robust image deformation technique to obtain highly realistic facial morphs. We address the facial morphing problem in three steps. First, a domain transfer technique is introduced to transfer an image from one domain to another. Second, a face alignment algorithm locates accurate facial landmark points for both the domain-transferred face and the target face, which are then aligned with a global similarity transformation to eliminate their inconsistency in pose, size and position. Finally, we employ the FM2RLS method to deform the domain-transferred face into the target image so that the image pairs align at the pixel level. To validate the effectiveness of the proposed approach, extensive experiments on real images show the accuracy of our method, which is superior to current state-of-the-art methods.
Faster R-CNN is a general-purpose detection algorithm that performs well in most cases. However, it performs poorly at detecting small-scale UAVs. To improve detection performance for small-scale UAVs, this paper proposes a new anchor strategy (TLCS-Anchor) that can be adopted by Faster R-CNN. Firstly, anchor templates suited to the UAV dataset are designed with a clustering method so that the anchor aspect ratios and scales are better targeted to UAVs. Then, a new anchor compensation strategy is proposed to help detect small-scale UAVs, which not only increases the number of anchors matched with each UAV but also alleviates, to some extent, the problem that small-scale UAVs cannot match enough anchors. Experimental results show that TLCS-Anchor improves detection performance for UAVs, especially small-scale UAVs. In principle, TLCS-Anchor can also be used to detect other small-scale targets.
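The clustering step for anchor templates can be sketched with a deterministic 1-D k-means over box sizes. This is a simplified illustration, assuming k >= 2 and scalar sizes; the actual method presumably clusters width/height pairs from the UAV dataset, possibly with an IoU-based distance as in YOLO-style anchor design.

```python
def kmeans_anchors(sizes, k, iters=50):
    """1-D k-means over box sizes to pick anchor scales.
    Initial centers are spread evenly over the sorted sizes (assumes k >= 2)."""
    sizes = sorted(sizes)
    centers = [sizes[i * (len(sizes) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for s in sizes:
            j = min(range(k), key=lambda c: abs(s - centers[c]))
            buckets[j].append(s)
        new = [sum(b) / len(b) if b else centers[j] for j, b in enumerate(buckets)]
        if new == centers:  # converged
            break
        centers = new
    return centers

# Small-UAV boxes around 10 px and larger ones around 51 px:
print(kmeans_anchors([10, 11, 12, 50, 51, 52], 2))  # → [11.0, 51.0]
```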
Infrared images can distinguish targets from their backgrounds based on differences in thermal radiation, which works day and night and under all weather conditions. By contrast, visible images provide texture details with high spatial resolution and definition in a manner consistent with the human visual system. We address the multimodality image fusion problem in three steps. Firstly, a domain transfer technique is introduced to transfer an image from one domain to another, for example from the visible domain to the infrared domain. It can capture the content characteristics of one image collection and figure out how these characteristics translate into the other image collection, all in the absence of any paired training examples. Secondly, we employ a nonrigid transformation method to register the domain-transferred image to the target image so that the image pairs align at the pixel level. Then we fuse the domain-transferred, spatially transformed image with the target image. Through translation and transformation, we simplify the fusion problem into a simple combination.
To address the problem that the VGG network cannot exploit the spatial information in feature maps, this paper proposes a new algorithm that constructs a sum-pooling feature from the feature maps extracted by the convolutional neural network. The algorithm retains the structure of the original feature maps, so spatial information in the original feature map can be used more reasonably. The proposed method is then verified on the DOTA dataset. The results show that, compared with the VGG-16 network, the proposed SPFC algorithm improves accuracy in both coarse aircraft classification and fighter subdivision.
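The core sum-pooling construction can be sketched as follows. This is a minimal illustration of aggregating C feature maps of size H×W into a C-dimensional descriptor; the L2 normalization shown here is a common but assumed choice, not necessarily the paper's.

```python
def sum_pool(feature_maps):
    """Collapse a list of H×W feature maps (one per channel) into one
    descriptor by summing over spatial positions, then L2-normalizing."""
    desc = [sum(sum(row) for row in channel) for channel in feature_maps]
    norm = sum(v * v for v in desc) ** 0.5 or 1.0
    return [v / norm for v in desc]

# Two 2x2 channels: one uniformly active, one silent.
print(sum_pool([[[1, 1], [1, 1]],
                [[0, 0], [0, 0]]]))  # → [1.0, 0.0]
```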
Crop extraction from images captured in the field is a complex task. In this paper, a new crop segmentation method is presented based on a designed lightweight neural network with only five layers. In the proposed method, the lightweight network processes crop color features in the normalized RGB and CIE L*a*b* color spaces to realize accurate segmentation of crop images. To verify performance, 120 rice images are used to compare the proposed method with four other well-known approaches. Experiments demonstrate that our method is robust to illumination variations in the field and performs better than the other approaches, segmenting crops accurately and efficiently.
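The normalized-RGB part of the color features can be sketched per pixel as below. The excess-green index is an assumed addition here, included because it is a common crop/soil discriminant; the paper's exact feature set (including the CIE L*a*b* conversion, omitted here) may differ.

```python
def color_features(r, g, b):
    """Illumination-robust per-pixel color features in normalized RGB:
    each channel is divided by the pixel's total intensity, so uniform
    brightness changes cancel out."""
    s = (r + g + b) or 1            # guard against black pixels
    rn, gn, bn = r / s, g / s, b / s
    exg = 2 * gn - rn - bn          # excess-green index (illustrative)
    return [rn, gn, bn, exg]

# A pure-green pixel is maximally "crop-like" under excess-green:
print(color_features(0, 255, 0))  # → [0.0, 1.0, 0.0, 2.0]
```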
As images are applied ever more widely, people place higher demands on the quality of small objects and details in images. In recent years, deep learning has achieved good results in image super-resolution research. In this paper, we propose EDSRGAN, a single image super-resolution (SISR) algorithm based on an enhanced residual network and an adversarial network. Compared with SRGAN, which is also based on an adversarial network, EDSRGAN greatly reduces the high-frequency noise contained in the super-resolution (SR) image, and it also leads SRGAN in the peak signal-to-noise ratio and structural similarity evaluation indicators. Although EDSRGAN lags behind EDSR in these two indicators, the SR images generated by EDSRGAN are sharper than those of EDSR along object edges and in target details. EDSRGAN achieves good super-resolution results on small targets.
At present, most full-reference laser-disturbing image quality assessment methods need to know the positions of the disturbing spot and the target in advance, so the assessment process is constrained by prior knowledge and preprocessing. To address this problem, this paper proposes a laser-disturbing image quality assessment method based on convolution feature similarity (CNNSIM), which analyzes the features output by a convolutional network for the image before and after laser disturbance. The degree to which key information in the disturbed image is occluded is assessed by exploiting the hierarchy of features and their sensitivity to occlusion, thus avoiding the need for target/spot location information as input. Simulation experiments verify the effectiveness of the new assessment method in different scenarios.
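One plausible realization of a convolution-feature similarity score is the cosine similarity between the feature vectors of the image before and after disturbance, sketched below; the paper's exact formulation and layer weighting may differ.

```python
def feature_similarity(f_before, f_after):
    """Cosine similarity between two feature vectors (e.g. flattened
    activations of the same convolutional layer for the clean and the
    laser-disturbed image). 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(f_before, f_after))
    n1 = sum(a * a for a in f_before) ** 0.5
    n2 = sum(b * b for b in f_after) ** 0.5
    return dot / ((n1 * n2) or 1.0)

print(feature_similarity([1, 0], [1, 0]))  # → 1.0 (undisturbed)
print(feature_similarity([1, 0], [0, 1]))  # → 0.0 (fully disrupted)
```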
In this study we propose a line segment detector that generates accurate results. The proposed algorithm, which runs in linear time in the number of edge pixels, provides highly accurate results and does not break segments at cross points. It starts from a randomly selected pixel and uses an improved least-squares fitting method designed to process incremental data in linear time. The proposed algorithm is highly suitable for vision measurement and camera calibration applications.
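The incremental least-squares idea that enables the linear-time bound can be sketched as follows: by maintaining running sums, each new edge pixel is absorbed in O(1) and the current line parameters are available at any time. This is a textbook formulation, not the paper's specific improved variant.

```python
class IncrementalLineFit:
    """Least-squares line fit y = slope*x + intercept that absorbs points
    one at a time in O(1) by keeping running sums, the property that makes
    linear-time segment growing possible."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0
    def add(self, x, y):
        self.n += 1
        self.sx += x; self.sy += y
        self.sxx += x * x; self.sxy += x * y
    def line(self):
        den = self.n * self.sxx - self.sx ** 2   # assumes >= 2 distinct x
        slope = (self.n * self.sxy - self.sx * self.sy) / den
        return slope, (self.sy - slope * self.sx) / self.n

fit = IncrementalLineFit()
for x, y in [(0, 1), (1, 3), (2, 5)]:   # points on y = 2x + 1
    fit.add(x, y)
print(fit.line())  # → (2.0, 1.0)
```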
Vehicle part recognition aims to determine the subcategory of each vehicle part. Existing algorithms treat the recognition of each part as an independent classification task, ignoring the potential co-occurrence relationships between vehicle parts. In addition, it remains challenging to obtain satisfactory results due to small intra-class differences. In this paper, we propose a part-pair recognition method based on deep learning that exploits the co-occurrence relationship. Specifically, we construct a deep neural network for vehicle part recognition that uses the co-occurrence relationship to recognize two vehicle parts simultaneously. We also contribute a large dataset of vehicle parts with fully annotated labels for training and testing. Extensive experimental results demonstrate that the proposed method performs favorably against state-of-the-art vehicle recognition algorithms.
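One simple way to let a single classifier see two parts at once is to encode the pair of labels as one joint class, sketched below. This encoding is an illustrative assumption; the paper's network may instead share features between two heads or model co-occurrence differently.

```python
def pair_label(i, j, num_classes):
    """Encode the labels of two co-occurring parts as one joint class id,
    so a single softmax over num_classes**2 outputs can exploit which
    part subcategories tend to appear together."""
    return i * num_classes + j

def unpair_label(k, num_classes):
    """Recover the two part labels from a joint class id."""
    return divmod(k, num_classes)

print(pair_label(2, 3, 10))    # → 23
print(unpair_label(23, 10))    # → (2, 3)
```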
Visual tracking has played an important role in computer vision research in recent years. The multi-kernel correlation filter has demonstrated an outstanding advantage by introducing high-level representations through multiple kernels. However, unskillful kernel selection inevitably brings redundancy and noise into the learning and updating procedure, which significantly affects tracking accuracy. A large margin multi-kernel tensor correlation filter for visual tracking (LMKCF) is proposed in this paper. LMKCF mitigates the redundancy and noise of the multi-kernel correlation filter in learning and updating by using low-rank tensor learning to establish a prospective learning and updating strategy. The resulting optimization problem can be solved effectively by the alternating direction method of multipliers (ADMM). Finally, we validate the proposed tracker with multi-kernel representations on the OTB benchmark to demonstrate the superiority of the method.
Aircraft are valuable military equipment and transportation, so using target detection technology to detect ground aircraft in optical remote sensing images has important research and application value. Although progress has been made in the relevant research, fast and effective ground aircraft detection remains challenging because of the complex backgrounds of remote sensing images, large scale changes, small imaging sizes, etc. Aiming at multi-frame imaging application scenarios, such as embedded detection and tracking systems, this paper proposes an aircraft target detection scheme based on hierarchical screening, which improves detection speed and reduces false alarms. Firstly, by analyzing the background characteristics, a target candidate region extraction method based on gray variance is adopted, accelerated with integral images and shared computation. Then, Haar-like features are extracted in the candidate regions and classified by a cascade AdaBoost classifier. Afterwards, a union-find algorithm is used to merge redundant detection results and evaluate confidence. Finally, inter-frame correlation information is used to remove false alarms. Experimental verification proves the effectiveness of the algorithm.
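The redundancy-merging step via union-find can be sketched as below: detections whose overlap exceeds a threshold are unioned into one group, and each group yields one merged result. The IoU threshold of 0.5 and the pairwise loop are assumptions for illustration; the paper's merging and confidence evaluation may differ.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def merge_detections(boxes, iou, thr=0.5):
    """Group detections whose pairwise IoU exceeds thr; each group of
    indices corresponds to one merged detection (iou is any overlap
    function over two boxes)."""
    uf = UnionFind(len(boxes))
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > thr:
                uf.union(i, j)
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(uf.find(i), []).append(i)
    return list(groups.values())
```

For example, with boxes represented as 1-D intervals for brevity, `merge_detections([(0, 10), (1, 11), (20, 30)], iou_1d)` groups the first two overlapping detections together and keeps the third separate.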
With the continuous advance of communication network technology and the influence of factors such as humidity, specific gravity and temperature, the monitoring data acquired by grid equipment is growing exponentially and its complexity is also rising. Taking full advantage of this big data by studying the measurement characteristics of electronic transformers in operation and discovering their relationship with environment, load and other factors will help optimize transformer performance, give users a better experience and improve company benefits. However, traditional data analysis methods cannot meet the accuracy and real-time requirements of processing such massive data, so solving the big data analysis problem effectively and accurately is particularly urgent. To process this data, we adopt a data mining approach: instead of traditional machine learning, we choose a relatively simple deep learning network. A feed-forward neural network is used for classification. On the basis of the classification, a second network performs nonlinear regression prediction on the data, and an error transfer model is established. For the regression prediction problem, because the original data has high dimensionality and high computational complexity, we use PCA to reduce the feature dimension, which also helps the deep neural network establish a nonlinear relationship between the learned features and the predicted values. Compared with the traditional feed-forward neural network, the accuracy of our network is significantly improved.
Nowadays, more and more printed books are accompanied by electronic resources, including videos, audios, games, augmented reality and other mobile apps. However, accessing most of these electronic resources is not very convenient, as the association between printed books and electronic resources is not automatically available [2]. To build a bridge between a book page and its corresponding electronic resources, a large-scale book page retrieval method using a deep hashing network is presented in this paper. There are three main contributions. First, a pipeline is proposed that makes a Convolutional Neural Network (CNN) trained for another, unrelated task usable for book page retrieval. Second, the high-dimensional features extracted from the CNN are mapped to low-dimensional binary hash codes in Hamming space by the deep hashing network, which not only increases retrieval speed but also saves feature storage space. Third, a large-scale dataset consisting of 1.55M book page images is collected. Experimental results on this dataset show that the proposed deep hashing network achieves a Top-1 hit rate of 92.1% with a response time of less than 0.6 seconds on a desktop computer with a GeForce 1080Ti GPU.
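The speed advantage of Hamming-space retrieval comes from comparing binary codes with XOR and popcount, as sketched below. Codes are represented as Python ints for illustration; the bit width and the exhaustive scan stand in for whatever code length and index structure the actual system uses.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes (as ints)."""
    return bin(a ^ b).count("1")

def retrieve(query_code, db_codes, top=1):
    """Return indices of the top matches in the database, ranked by
    Hamming distance to the query page's hash code."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:top]

db = [0b1111, 0b0001, 0b1110]        # hash codes of indexed book pages
print(retrieve(0b1111, db))          # → [0] (exact match, distance 0)
```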
We propose the Constrained Convolutional Neural Network (CCNN), a novel approach to estimating the direction of numerous target objects. By adding a constrained layer at the output of existing object detection networks, CCNN performs better in both accuracy and speed than previous neural networks, as it works with filtered data and obtains more precise results. With constraint structures and forward and backward propagation algorithms redesigned for the quaternions that describe the 3D pose of an object, CCNN can be further applied to 3D pose estimation. Experiments show that CCNN is feasible for object direction detection and 3D pose estimation, and outperforms conventional neural networks without the unitized constrained layer.
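The "unitized" aspect of the constrained layer can be illustrated by projecting a raw 4-vector onto the unit sphere, since only unit quaternions represent valid 3D rotations. This sketch shows the forward projection only; the paper's layer also redesigns backpropagation through this constraint.

```python
def constrain_quaternion(q):
    """Project a raw 4-component network output onto the unit sphere so
    it is a valid rotation quaternion (||q|| = 1)."""
    n = sum(v * v for v in q) ** 0.5 or 1.0
    return [v / n for v in q]

print(constrain_quaternion([0, 0, 0, 2]))  # → [0.0, 0.0, 0.0, 1.0]
```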
At present, there are three main realizations of nonparametric Bayes models in machine learning: the Dirichlet process and CRP model, the Beta process and Beta-Bernoulli process model, and the Gamma process and Gamma-Poisson process model. Focusing on the infinite sampling process of the Gamma process stick-breaking construction proposed by Anirban Roychowdhury, this paper discusses exact inference based on a finite number of observed samples, derives the exact probability distribution function of the Gamma process stick-breaking construction, takes this distribution function as a prior, and applies the corresponding results to inference for the Gamma-Poisson process.
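A truncated stick-breaking draw can be sketched as below. This is a simplified, commonly presented form (Beta(1, alpha) breaks of a Gamma-distributed total mass); Roychowdhury's actual Gamma-process construction differs in its details, so the parameterization here is an assumption for illustration only.

```python
import random

def gamma_stick_breaking(alpha, c, n, seed=0):
    """Draw the first n atom weights of a simplified Gamma-process
    stick-breaking construction: sample a Gamma(alpha, 1/c) total mass,
    then repeatedly break off a Beta(1, alpha) fraction of what remains."""
    rng = random.Random(seed)
    remaining = rng.gammavariate(alpha, 1.0 / c)  # total mass of the measure
    weights = []
    for _ in range(n):
        frac = rng.betavariate(1.0, alpha)
        weights.append(frac * remaining)
        remaining *= 1.0 - frac
    return weights

w = gamma_stick_breaking(2.0, 1.0, 6)
print(len(w), all(x > 0 for x in w))  # 6 positive weights summing to < total mass
```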