This PDF file contains the front matter associated with SPIE Proceedings Volume 11433 including the Title Page, Copyright information, Table of Contents, Introduction, and Conference Committee listing.
The Bag of Visual Words (BoVW) model has achieved impressive performance on human activity recognition. However, it is extremely difficult to capture the high-level semantic meaning behind video features with this method, as the spatiotemporal distribution of visual words is ignored, preventing localization of the interactions within a video. In this paper, we propose a supervised learning framework that automatically recognizes high-level human interactions based on a bag of spatiotemporal visual features. First, a representative baseline keyframe that captures the major body parts of the interacting persons is selected, and the bounding boxes containing the persons are extracted to parse the poses of all persons in the interaction. Based on this keyframe, features are detected by combining edge features and Maximally Stable Extremal Regions (MSER) features for each interacting person and are tracked backward and forward over the entire video sequence. From the feature tracks, 3D XYT spatiotemporal volumes are generated for each interacting target. The K-means algorithm is then used to build a codebook of visual features to represent a given interaction. The interaction is then represented by the summed frequency of occurrence of visual words between persons. Extensive experimental evaluations on the UT-Interaction dataset demonstrate the strength of our method in recognizing ongoing interactions from videos with a simple implementation.
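The codebook-and-histogram step described above can be sketched as follows; this is a minimal illustrative implementation of K-means quantization and a BoVW histogram, not the authors' code, and the feature vectors are assumed to be plain lists of floats:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(features, k, iters=20, seed=0):
    """Minimal Lloyd's K-means: returns k centroids (the visual words)."""
    rng = random.Random(seed)
    centroids = rng.sample(features, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            i = min(range(k), key=lambda c: dist2(f, centroids[c]))
            clusters[i].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid for an empty cluster
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def bovw_histogram(features, centroids):
    """Quantize each feature to its nearest visual word and count occurrences."""
    hist = [0] * len(centroids)
    for f in features:
        hist[min(range(len(centroids)), key=lambda c: dist2(f, centroids[c]))] += 1
    return hist
```

In the paper's pipeline the features would come from the tracked spatiotemporal volumes; the resulting histogram is the interaction representation fed to the classifier.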
The increasing number of mobile and wearable devices is dramatically changing the way we collect data about a person's life. These devices allow recording our daily activities and behavior in several forms, e.g., text, images, bio-signals, or video. However, the collected data often includes low-quality or irrelevant content, feeding lifelogging applications with huge amounts of data and creating computational challenges for pattern identification. In this paper, we propose a fast image analysis approach to automatically select relevant images from lifelog data. Using intrinsic image information, such as scenes and objects, we manually curated two datasets, one with relevant content and another with non-relevant information. We then applied supervised learning algorithms based on low-level image features, namely blur and focus, to find the binary model that best discriminates between the two classes. The binary models were compared based on learning curves and F1-scores, with the best one achieving an F1-score of 95.4%. By reducing the number of images in the lifelog data, we were able to save computational time without losing images with relevant content.
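A common low-level focus feature of the kind mentioned above is the variance of the Laplacian response; the sketch below (an assumption about the feature family, not the paper's exact features) computes it on a 2D grayscale image stored as nested lists and thresholds it as a trivial relevance filter:

```python
def focus_measure(img):
    """Variance of the 4-neighbor Laplacian response over interior pixels;
    low values indicate blurry (likely non-relevant) images."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def is_relevant(img, threshold):
    """Keep images whose focus measure exceeds a learned threshold."""
    return focus_measure(img) >= threshold
```

In practice the threshold would be learned from the curated relevant/non-relevant datasets rather than set by hand.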
In videos, waves, floating objects on the sea, wave peaks, and other objects passing by the ships may occlude the objects of interest, and ships are often disturbed by background of similar color, which easily leads to tracking failure. This paper presents a ship tracking algorithm based on deep learning and multiple features. The algorithm uses an improved YOLO and multi-feature ship detection method to detect ships, and establishes the correspondence of the same ship among different frames with an improved SIFT matching algorithm to realize ship tracking. In the improved detection algorithm, the YOLO method is optimized and combined with HOG and LBP features, which helps solve the problems of missed detections and inaccurate positioning of the YOLO network. The SIFT matching algorithm is improved to address the low accuracy and long running time of traditional SIFT matching: the SIFT features are reduced by multi-dimensional scaling (MDS), and random sample consensus (RANSAC) is used to optimize the SIFT feature matching and effectively eliminate mismatches. The experimental results show that the tracking algorithm has higher accuracy, stronger robustness, and better real-time performance.
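The RANSAC mismatch-elimination step can be illustrated with a deliberately simplified motion model; the sketch below fits a pure 2D translation between matched keypoints (the paper would use a richer model), keeping only matches consistent with the best hypothesis:

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """matches: list of ((x1, y1), (x2, y2)) keypoint pairs between frames.
    Hypothesize a translation from one random match, count matches that
    agree within `tol` pixels, and return the largest consistent set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in matches
                   if (c - a - dx) ** 2 + (d - b - dy) ** 2 <= tol ** 2]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

Mismatched SIFT pairs rarely agree with the dominant motion hypothesis, so they fall outside the inlier set and are discarded.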
Two-dimensional barcodes are widely used in everyday life: retail shopping, e-ticketing, advertisement. One of the most widespread symbologies (types) is the Aztec Code. To read a message from an Aztec Code, it must first be localized in the input image. To simplify this task, the barcode specification introduces a special part called the Core Symbol. In the current work, a topological localization method for this part is presented. It relies on connected component extraction and contour signature analysis, and uses lines estimated by the fast Hough transform. It is invariant to scaling and rotation of barcode images and is able to deal with partially corrupted Core Symbols. A technique for measuring the method's quality is provided. As a baseline we consider an algorithm following the ISO recommendations, with a quality of 94.61%. The result obtained for the proposed algorithm is 99.03%.
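One classical form of contour signature analysis, of the kind named above, is the centroid-distance signature; normalizing by the maximum distance makes it scale-invariant, which is a minimal illustration (under assumed conventions, not the paper's exact signature) of why such signatures suit barcode localization:

```python
import math

def contour_signature(points):
    """Centroid-distance signature of a closed contour given as (x, y)
    points; dividing by the maximum distance removes scale."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    m = max(dists)
    return [d / m for d in dists]
```

Comparing such signatures (e.g., up to a cyclic shift for rotation) lets connected components be tested against the expected Core Symbol shape.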
In this paper we explore the impact of geometric restrictions on RANSAC sampling on the accuracy of ID document type recognition in images, as well as on the accuracy of estimating the projective distortion parameters. The studied method is based on representing images as constellations of keypoints and their descriptors. The distortion parameters are estimated by applying RANSAC to the matched keypoints. Cases are studied where the base algorithm can yield an erroneous or insufficiently accurate solution. A RANSAC scheme with geometric restrictors is presented, and several restrictions are proposed that limit the samples and the computed transform parameters. An experiment was conducted on the open MIDV-500 dataset, and data is presented on the dependence of classification and localization accuracy on the considered restrictors. It is shown that introducing the restrictors achieves an accuracy improvement and a significant speed-up.
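One plausible geometric restrictor of the kind discussed above is rejecting degenerate 4-point samples before estimating a homography; the check below (an illustrative assumption, not necessarily one of the paper's restrictors) discards samples in which any three points are nearly collinear:

```python
from itertools import combinations

def triangle_area2(p, q, r):
    """Twice the signed area of triangle p-q-r (cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def sample_ok(pts, min_area2=1.0):
    """Geometric restrictor for a 4-point RANSAC sample: near-collinear
    triples make the homography estimate degenerate or unstable, so such
    samples are skipped instead of wasted on model fitting."""
    return all(abs(triangle_area2(p, q, r)) >= min_area2
               for p, q, r in combinations(pts, 3))
```

Skipping bad samples before the (more expensive) transform estimation is one way such restrictors yield both accuracy gains and a speed-up.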
Chebyshev multifractal signatures for characterizing the multifractal nature of image textures of natural objects are proposed. These signatures can be obtained by a generalized multifractal formalism (GMF) using Chebyshev polynomial (CP) kernels and can be considered alternatives to traditional multifractal spectra. The paper also presents properties of the introduced multifractal signatures: in particular, it is shown that Chebyshev multifractal signatures, like traditional multifractal spectra, are invariant to image scaling. To illustrate the recognition possibilities of the multifractal signatures, their application to the multifractal interpretation of synthetic-aperture radar (SAR) images of ice-covered sea areas is shown. It is established that, using parameters of multifractal signature approximations calculated over Sentinel-1 SAR image regions, we can separate sea areas with very close ice, close ice, and very open ice. The obtained results suggest that the introduced multifractal signatures can be used at the preliminary stage of object-oriented classification of SAR or other images to assess the textural separability of image objects.
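The Chebyshev polynomial kernels underlying the signatures can be evaluated with the standard three-term recurrence; this is textbook material rather than the paper's GMF machinery:

```python
def chebyshev_T(n, x):
    """First-kind Chebyshev polynomial T_n(x) via the recurrence
    T_0 = 1, T_1 = x, T_{n+1}(x) = 2x*T_n(x) - T_{n-1}(x)."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t
```

The recurrence is numerically stable on [-1, 1], which is where kernel arguments would normally be mapped.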
In recent years, research in the field of keyframe extraction has become more attractive due to its use in advanced applications like video surveillance. In this paper, we introduce a novel keyframe extraction algorithm which uses Binary Robust Invariant Scalable Keypoint (BRISK) features to obtain the dissimilarity level of consecutive frames and establish shot transition boundaries, from which we extract keyframes. The frame at which the dissimilarity level is high is taken as a keyframe. The proposed algorithm is tested on ten different videos of the animation category. The performance of the method is assessed using the evaluation metrics figure of merit, detection percentage, accuracy, and missing factor. The experimental results and analysis show improved performance of the proposed algorithm over other state-of-the-art methods.
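Since BRISK descriptors are binary, consecutive-frame dissimilarity is naturally measured with the Hamming distance; the sketch below (with each frame reduced to a single binary descriptor for illustration, an obvious simplification of the real per-keypoint matching) marks frames where the dissimilarity jumps as keyframes:

```python
def hamming(a, b):
    """Bitwise Hamming distance between two binary descriptors (ints)."""
    return bin(a ^ b).count("1")

def keyframes(frame_descriptors, threshold):
    """Mark frame i as a keyframe when its descriptor differs strongly
    from frame i-1 (a shot-transition candidate). Frame 0 starts a shot."""
    keys = [0]
    for i in range(1, len(frame_descriptors)):
        if hamming(frame_descriptors[i - 1], frame_descriptors[i]) >= threshold:
            keys.append(i)
    return keys
```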
Intraspecific nest parasitism is a phenomenon that attracts the attention of biologists. In bird species such as the Slender-billed Gull, a female lays at most 3 eggs, yet nests can contain four or five eggs. Indeed, a genetic study carried out on a set of nests has shown that one or two of the eggs belong to a second female, termed by biologists a "parasitic" female. As these gulls are protected by law, researchers have found it difficult to identify parasitic eggs without a genetic test. Many studies have attempted to identify the parasitic egg based on morphological parameters and the characteristics of the eggshell, but these studies have not led to good results. Recent advances in Artificial Intelligence (AI), and particularly Deep Learning (DL) techniques, have increased the motivation to use these methods to detect parasitic eggs. In this work, we present a new method to detect a parasitic egg from a dataset of egg images. One of the most widely used techniques is the Convolutional Neural Network (CNN), a supervised learning method for classifying images; we use it to extract features that characterize each egg. To evaluate our approach, we use 31 clutches from the 92-egg dataset to test the performance of our proposed method.
This paper presents a method for detecting receipt fraud by implementing an Optical Character Recognition (OCR) algorithm composed of image processing techniques and Convolutional Neural Networks (CNNs). We implemented two CNN models in a smartphone application that lets customers take pictures of products they intend to buy (and crop their price tags) while in a hypermarket/supermarket, as well as of the paid receipt received from the cashier, and that automatically identifies and compares all prices (multiple digits, including decimals) of the products seen on the shelf with all prices found on the paid receipt. This application helps the customer detect receipt fraud due to a computer or human error in a cheap and convenient way. Experimental results show 99.96% overall test accuracy for the CNN responsible for identifying product prices and 99.35% overall test accuracy for the CNN responsible for identifying receipt prices.
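Once the two CNNs have produced per-digit outputs, the remaining comparison logic is simple; the sketch below (an assumed post-processing step, not taken from the paper) joins digit predictions into exact decimal prices and flags mismatches between shelf and receipt:

```python
from decimal import Decimal

def parse_price(digits):
    """Join per-digit OCR outputs such as ['1','2','.','9','9'] into a
    Decimal, which avoids float rounding issues in price comparison."""
    return Decimal("".join(digits))

def compare_prices(shelf, receipt):
    """Flag products whose receipt price differs from the shelf price.
    `shelf` and `receipt` map product ids to price strings."""
    return {pid: (shelf[pid], receipt[pid])
            for pid in shelf
            if pid in receipt and Decimal(shelf[pid]) != Decimal(receipt[pid])}
```

Using `Decimal` rather than `float` for money is the relevant design choice here: "3.50" and "3.5" compare equal, while no binary rounding artifacts can create spurious mismatches.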
In this paper, we introduce a new technique for nonlinear process monitoring relying on kernel entropy principal component analysis (KEPCA). KEPCA transforms the input data into a high-dimensional feature space using a nonlinear kernel function and determines the number of principal components (PCs) based on the computation of entropy. The retained PCs are the ones that explain the maximum entropy of the data in the feature space. We then introduce a new approach to calculate the upper control limits (UCLs) for the squared prediction error (SPE) and Hotelling's T² statistic in the feature space, based on density estimation via the k-nearest neighbors (kNN) estimator. The above approaches were applied to fault detection on the benchmark Tennessee Eastman (TE) process. The results were robust and delivered better performance than KPCA.
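One illustrative reading of "retain the PCs that explain the maximum entropy" is to keep components until their cumulative contribution to the eigenvalue entropy reaches a chosen fraction; this sketch is an assumption about the selection rule, not the paper's exact criterion:

```python
import math

def entropy_retained_pcs(eigenvalues, frac=0.95):
    """Normalize the kernel-matrix eigenvalues into a distribution p_i,
    compute each component's entropy contribution -p_i*log(p_i), and
    retain the smallest number of PCs whose cumulative contribution
    reaches `frac` of the total entropy."""
    total = sum(eigenvalues)
    p = sorted((ev / total for ev in eigenvalues), reverse=True)
    contrib = [-pi * math.log(pi) for pi in p if pi > 0]
    target = frac * sum(contrib)
    acc, n = 0.0, 0
    for c in contrib:
        acc += c
        n += 1
        if acc >= target:
            break
    return n
```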
Abrupt movements, accidents, aging, and obesity cause different kinds of knee dysfunctions. The automatic detection of the knee can therefore serve a great purpose in planning related surgeries. The biggest challenge in medical imaging is obtaining the large number of annotated images needed for convolutional neural networks (CNNs) to work successfully. Moreover, the contrast of X-ray images is sometimes very poor; the edges of targets are not clear in some radiographs, and it becomes difficult even for humans to locate the desired area in the images. This work introduces an enhanced single shot detector (SSD) to tackle the automatic knee detection and localization problem. Image sharpening is used as pre-processing to handle the poor contrast issue. The dataset used to verify the proposed method was collected from openly available online sources, and the proposed approach achieved 96.76% mAP.
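A standard sharpening pre-processing step of the kind mentioned above is unsharp masking; this minimal sketch (assuming a 3x3 box blur, not necessarily the paper's filter) operates on a 2D grayscale image stored as nested lists:

```python
def unsharp_mask(img, amount=1.0):
    """Unsharp masking: blur with a 3x3 box filter, then add back the
    high-frequency residual (original minus blur) scaled by `amount`.
    Edges are amplified; flat regions are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            blur = sum(img[y + dy][x + dx]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
            out[y][x] = img[y][x] + amount * (img[y][x] - blur)
    return out
```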
In this work we consider the problem of detecting fluorescent security fibers in images of identity documents captured under ultraviolet light. As an example we use images of the second and third pages of the Russian passport and show features that render known methods and approaches based on image binarization inapplicable. We propose a solution based on ridge detection in the grayscale image of the document with a preliminarily normalized background. The algorithm was tested on a private dataset consisting of both authentic and model passports. Abandoning binarization provided reliable and stable functioning of the proposed detector on the target dataset.
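Ridge detection on a grayscale image is commonly done via the eigenvalues of the local Hessian: a thin bright fiber gives a strongly negative second derivative across the line. The sketch below is a textbook single-scale version (the paper's detector and its background normalization are not reproduced):

```python
def ridge_response(img):
    """Per-pixel ridge strength from the 2x2 Hessian of a 2D grayscale
    image (nested lists): for a bright ridge the smaller eigenvalue is
    strongly negative, so -min_eigenvalue (clamped at 0) is returned."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x - 1] - 2 * img[y][x] + img[y][x + 1]
            iyy = img[y - 1][x] - 2 * img[y][x] + img[y + 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            # eigenvalues of [[ixx, ixy], [ixy, iyy]]
            tr, det = ixx + iyy, ixx * iyy - ixy * ixy
            disc = max(tr * tr / 4 - det, 0.0) ** 0.5
            lam_min = tr / 2 - disc
            out[y][x] = max(-lam_min, 0.0)
    return out
```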
We present a new loss function for the validation of image landmarks detected via Convolutional Neural Networks (CNN). The network learns to estimate how accurate its landmark estimation is. This loss function is applicable to all regression-based location estimations and allows the exclusion of unreliable landmarks from further processing. In addition, we formulate a novel batch balancing approach which weights the importance of samples based on their produced loss. This is done by computing a probability distribution mapping on an interval from which samples can be selected using a uniform random selection scheme. We conducted experiments on the 300W, AFLW, and WFLW facial landmark datasets. In the first experiments, the influence of our batch balancing approach is evaluated by comparing it against uniform sampling. In addition, we evaluated the impact of the validation loss on the landmark accuracy based on uniform sampling. The last experiments evaluate the correlation of the validation signal with the landmark accuracy. All experiments were performed for all three datasets.
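The batch balancing idea described above, mapping per-sample losses to a probability distribution on an interval and drawing with a uniform random scheme, can be sketched as inverse-CDF sampling; details such as the exact mapping are assumptions:

```python
import random

def loss_weighted_indices(losses, batch_size, seed=0):
    """Weight each training sample by its produced loss: build the CDF of
    the normalized losses over [0, 1) and invert uniform draws against it,
    so high-loss samples are selected more often."""
    total = sum(losses)
    cdf, acc = [], 0.0
    for l in losses:
        acc += l / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point undershoot
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        u = rng.random()
        batch.append(next(i for i, c in enumerate(cdf) if u <= c))
    return batch
```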
Non-maximum suppression (NMS) is widely used in object detectors to remove imprecise detection boxes. However, NMS can easily discard some correct detection boxes when multiple objects overlap. To deal with this problem, some methods have been presented, but only for simple overlapping scenes. This paper therefore proposes an improved NMS approach to detect objects with a high degree of overlap. The method divides all detection boxes into different clusters to reduce the degree of overlap between boxes. The detection box scores in each cluster are decayed as a function of overlap, and no boxes are discarded. The improved NMS is combined with two commonly used object detection networks, Faster Region-based Convolutional Neural Networks and Region-based Fully Convolutional Networks. The complex public dataset Microsoft Common Objects in Context is employed to evaluate the performance of the improved NMS. Experimental results show that two metrics, average recall and localization performance, are improved by the proposed method for these two well-known detectors.
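The score-decay step can be illustrated with a Gaussian penalty in the spirit of the per-cluster decay described above (one decay round against the top box; the paper's clustering is omitted):

```python
import math

def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def decay_scores(boxes, scores, sigma=0.5):
    """Instead of discarding boxes that overlap the highest-scoring box,
    multiply their scores by exp(-iou^2 / sigma); no detection is lost,
    it is only down-weighted."""
    top = max(range(len(boxes)), key=lambda i: scores[i])
    return [s if i == top else s * math.exp(-iou(boxes[i], boxes[top]) ** 2 / sigma)
            for i, s in enumerate(scores)]
```

A second, genuinely distinct object (zero IoU with the top box) keeps its score untouched, which is exactly what hard NMS fails to guarantee under heavy overlap.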
Based on PointPillars [1] and SECOND [2], we propose a novel LiDAR object detection system focused on speed, while not neglecting detection performance, with the help of included occupancy grid features. We achieve this by replacing the voxel grid defined in Cartesian coordinates, as introduced by Zhou et al. in VoxelNet [3], with a grid in a polar coordinate system. In doing so we can significantly reduce the number of required grid cells, while still keeping a good grid resolution in the areas of highest point density close to the sensor. Because of this strong reduction in resolution we incur a loss in performance, which we try to regain by extending the feature network with occupancy grid and height map features. Furthermore, we integrate the ground truth augmentation introduced in SECOND [2], injecting additional ground truth objects into the limited number of point clouds to increase variance. We achieve performance close to the state of the art, while reaching inference speeds of around 12 ms.
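The polar binning can be sketched in a few lines; grid extents and resolutions below are placeholder assumptions, not the paper's configuration:

```python
import math

def polar_cell(x, y, r_max=50.0, n_r=32, n_theta=64):
    """Map a LiDAR point (x, y) to a (radial, angular) grid cell index.
    Unlike a Cartesian grid, cells near the sensor cover less area,
    matching the higher point density there; points beyond r_max are
    dropped (None)."""
    r = math.hypot(x, y)
    if r >= r_max:
        return None
    theta = math.atan2(y, x) % (2 * math.pi)
    return (int(r / r_max * n_r), int(theta / (2 * math.pi) * n_theta))
```

With a fixed angular resolution, the total cell count grows linearly in `n_r` rather than quadratically in the Cartesian side length, which is the source of the cell-count reduction claimed above.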
Falls are one of the major causes of injury and death among the elderly globally. The increase in the ageing population has also increased the likelihood of falls re-occurring. This has further added to the social and economic burden due to the higher demand for caretakers and costly treatments. Detecting falls accurately can therefore save lives, as well as reduce costs by reducing false alarms. However, recognising falls is challenging, as they involve pose transitions at great speed. Certain activities, such as abruptly sitting down, stumbling, and lying down on a sofa, demonstrate strong similarities in action with a fall event. Hence accuracy in fall detection is highly desirable. This paper presents a Long Short-Term Memory (LSTM) based fall detection approach using location features from the group of available joints in the human body. The confusion matrix shows that our proposed model can detect the fall class with a precision of 1.0, which is highly desirable.
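Feeding joint-location features to an LSTM requires slicing the stream of per-frame joint vectors into fixed-length sequences; this windowing sketch is generic pre-processing, with the window length an assumption rather than the paper's setting:

```python
def joint_windows(frames, window, step=1):
    """Slice a sequence of per-frame joint feature vectors into
    fixed-length overlapping windows, the usual input shape for an
    LSTM-based sequence classifier."""
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, step)]
```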
In this paper we present a single-sample augmentation framework. The key idea of the framework is to synthesize a positive training set from a single natural sample using relevant geometric and pixel intensity transforms. The efficiency of the proposed framework has been demonstrated by solving the round seal stamp detection problem using the Viola-Jones approach on the public "SPODS" dataset. The mentioned image transformations make it possible to simulate different orientations of the stamps, color differences, and distortions caused by the stamping process and document aging. The proposed framework can be applied to training various machine learning algorithms for solving computer vision and computed tomography problems.
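The combination of geometric and intensity transforms can be sketched on a sample represented as (x, y, intensity) triples; the specific transform set below (rotations and intensity gains) is illustrative, not the paper's full list:

```python
import math

def augment(points, angles, gains):
    """Single-sample augmentation sketch: from one sample given as
    (x, y, intensity) triples, synthesize one positive per combination of
    rotation angle and intensity gain (clipped at 255)."""
    out = []
    for a in angles:
        ca, sa = math.cos(a), math.sin(a)
        for g in gains:
            out.append([(x * ca - y * sa, x * sa + y * ca, min(255, v * g))
                        for x, y, v in points])
    return out
```

A cartesian product of a few angles, gains, and distortion parameters quickly turns one natural sample into hundreds of synthetic positives.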
The majority of document image analysis systems use a document skew detection algorithm to simplify all further processing stages. A large number of such algorithms based on Hough transform (HT) analysis have already been proposed. Despite this, we managed to find only one work where use of the Fast Hough Transform (FHT) was suggested to solve the indicated problem; unfortunately, no study of that method was provided. In this work, we propose and study a skew detection algorithm for document images which relies on FHT analysis. To measure the algorithm's quality we use the dataset from the problem-oriented DISEC'13 contest and its evaluation methodology. The obtained values for the AED, TOP80, and CE criteria are 0.086, 0.056, and 68.80, respectively.
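The principle behind Hough-style skew detection can be shown with a brute-force stand-in: for each candidate angle, project the ink pixels onto rows and prefer the angle that concentrates them into the fewest rows (the FHT achieves the same effect without testing angles one by one). This is an illustrative sketch, not the paper's algorithm:

```python
import math

def skew_angle(ink_pixels, angles):
    """Rotate ink pixel coordinates by each candidate angle and keep the
    angle maximizing the concentration of the row histogram (sum of
    squared row counts): at the true skew, text lines align with rows."""
    def row_energy(a):
        rows = {}
        for x, y in ink_pixels:
            r = round(-x * math.sin(a) + y * math.cos(a))
            rows[r] = rows.get(r, 0) + 1
        return sum(c * c for c in rows.values())
    return max(angles, key=row_energy)
```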
In this paper we consider a method for detecting end-to-end curves of limited curvature, i.e., k-link polylines with the bending angle between adjacent segments in a given range. The approximation accuracy is achieved by maximizing a quality function over the image matrix. The method is based on a dynamic programming scheme constructed over the results of Fast Hough Transform calculations on image bands. The asymptotic complexity of the proposed method is O(h·(w + h/k)·log(h/k)), where h and w are the image dimensions and k is the number of links of the approximating polyline; this is analogous to the complexity of the fast Fourier transform or the fast Hough transform. We also show the results of the proposed method on synthetic and real data.
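The dynamic programming scheme can be sketched in a simplified form: per-band segment qualities (which the paper obtains from FHT accumulators) are chained under a constraint limiting how much the segment state may change between adjacent bands, which models the bounded bending angle. The state encoding below is an assumption for illustration:

```python
def best_polyline(band_scores, max_jump=1):
    """DP over image bands: band_scores[b][s] is the quality of segment
    state s in band b; adjacent bands may change state by at most
    `max_jump`. Returns the maximal total quality of a valid polyline."""
    prev = list(band_scores[0])
    for b in range(1, len(band_scores)):
        cur = []
        for s, q in enumerate(band_scores[b]):
            lo, hi = max(0, s - max_jump), min(len(prev), s + max_jump + 1)
            cur.append(q + max(prev[t] for t in range(lo, hi)))
        prev = cur
    return max(prev)
```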
With the development of the social economy, the acceleration of urbanization, and the growth of the aging society, pet ownership is becoming more common, and pet medical services and related products gradually show a trend toward personalization, standardization, and scale. However, there are still many problems in the interface design of veterinary X-ray machine software, such as the operation interface layout, color matching, and information architecture. On this basis, the study takes the VetView console interface as its research object. Through eye tracking and comparison with data obtained from a questionnaire survey, software usability problems are identified; combined with user requirement analysis, software function analysis, and interactive prototype design methods, the basic requirements for the software interface design are put forward. This paper attempts to propose design principles and methods with practical guiding significance for this kind of interface design.
With the development of collaborative robots, robot programming by demonstration (PbD) plays an important role in human-robot interaction; it aims to transfer new skills to robots from observations of tasks demonstrated by humans. In this paper, we propose a new approach to teach a robot to draw pictures based on human fingertip recognition and hand motion tracking. Combining the Robot Operating System (ROS), OpenCV, and MoveIt (motion planning libraries), we capture the finger movement trajectory using a Kinect2 depth camera. The trajectory waypoints are then sent to the UR5 robotic arm through topic communication to complete the trajectory tracking task. The experiment indicates that the proposed approach allows inexperienced users to efficiently teach a robot to track the demonstrated trajectory.
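Before publication on a ROS topic, the tracked fingertip pixels must be mapped into robot workspace coordinates; the sketch below uses a hypothetical fixed scale and origin in place of the real camera-to-robot calibration, so every constant in it is an assumption:

```python
def pixels_to_waypoints(pixel_traj, scale, origin, z=0.05):
    """Map tracked fingertip pixels (u, v) to planar robot waypoints
    (x, y, z) for a drawing task: `scale` (meters per pixel) and `origin`
    (workspace offset) stand in for a real extrinsic calibration; the
    image v-axis points down, hence the sign flip on y."""
    ox, oy = origin
    return [(ox + u * scale, oy - v * scale, z) for u, v in pixel_traj]
```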
Recommender systems are becoming an intrinsic part of our lives. More and more people are using recommender systems to receive product or service recommendations. This became possible with the increasing power of mobile devices, the widespread use of the Internet, and the accumulation of data about user activity. The selection of a suitable machine learning algorithm for a recommender system is a difficult task due to the large number of algorithms described in the literature. The task is even more complicated for specific systems, such as a recommender system for travel by public transport, due to the small number of studies in this area. The objective of this paper is to evaluate machine learning algorithms for determining a user's preferred public transport stops in a personalized recommender system. We examine some of the most well-known approaches, such as the support vector machine, decision tree, random forest, AdaBoost, the k-nearest neighbors algorithm, the multi-layer perceptron classifier, and an approach based on the estimation algorithm proposed by Yu. I. Zhuravlev. In addition to accuracy, the machine learning algorithms were rated for performance. We also present a possible option for visualizing the user's preferred stops on a map. The experiments were conducted on real data from the mobile application "Pribyvalka-63". The mobile application is part of the tosamara.ru service, currently used to inform Samara residents about public transport movement.
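Of the evaluated algorithms, k-nearest neighbors is the easiest to show in full; the sketch below predicts a preferred stop from hypothetical context features (the feature choice of time-of-day and day-of-week is an assumption, not taken from the paper):

```python
def knn_predict(train, query, k=3):
    """Minimal k-nearest-neighbors classifier: `train` holds
    (features, label) pairs, e.g. (hour, weekday) -> preferred stop id;
    the query gets the majority label of its k closest training points."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda fl: dist(fl[0], query))[:k]
    labels = [l for _, l in nearest]
    return max(set(labels), key=labels.count)
```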
With the rapid progress of face recognition, it has more and more applications in everyday life. Although its backbone, very deep neural networks, also shows improvement in both accuracy and efficiency, the computational cost and memory usage are still a limiting factor for deploying these models on hardware with limited computational and power resources, such as mobile or embedded devices. Here arises the task of learning fast and compact deep neural networks that have accuracy comparable to the complex model, as required by real-life applications. Another issue is that a face recognition system may sometimes run models of different complexity depending on the device used for biometric template extraction (i.e., a desktop with a GPU or a mobile phone), so compatibility between the face descriptors is desirable. Our paper considers both of these cases: we propose a new method for learning a fast and compact face recognition model which has performance similar to the much more complex model used for transferring its knowledge, and we also show that both of these models can be used for verification in a single face recognition system. To the best of our knowledge, such an evaluation of the compatibility between two different face recognition models has never been done before our work.
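Descriptor compatibility, as discussed above, means verification works even when the two templates come from different models; a common similarity choice is cosine similarity with a single threshold, sketched here (threshold and metric are assumptions, not the paper's protocol):

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(desc_a, desc_b, threshold=0.5):
    """Cross-model verification sketch: one descriptor may come from the
    compact model and the other from the complex one; if their embedding
    spaces are compatible, one cosine threshold decides same/different."""
    return cosine(desc_a, desc_b) >= threshold
```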
Dynamic facial expression recognition has many useful applications in social networks, multimedia content analysis, security systems, and elsewhere. This challenging task must be performed under recurrent problems such as varying illumination, low resolution, and partial occlusion. This paper aims to produce a new facial expression recognition method based on the changes in the facial muscles. Geometric features are used to locate the facial regions, i.e., mouth, eyes, and nose. The generic Fourier shape descriptor, in conjunction with the elliptic Fourier shape descriptor, is used as an attribute representing different emotions in terms of frequency-spectrum features. Afterwards, a multi-class support vector machine is applied for the classification of seven human expressions. The statistical analysis showed that our approach achieves competitive overall recognition under 5-fold cross-validation, with high accuracy on a well-known facial expression dataset.
This paper introduces a novel approach to human activity recognition (HAR) based on body articulations (joints): the connections between bones of the skeletal system, such as the knee, shoulder, and hand, which allow different degrees and types of movement. To implement our system, we use PoseNet to extract articulation points, which are then classified using a transfer learning approach to recognize the activity. The resulting system is referred to as PTLHAR in the remainder of the paper. The experimental results show that the proposed approach provides a significant improvement over state-of-the-art methods.
Facial expressions play a key role in identifying the internal emotional state of human beings. People tend to recognize human emotions without any delay, but fully automated expression recognition by a computer remains an open problem. Towards solving it, a Local Optimal Oriented Pattern (LOOP) is proposed in this paper. This descriptor is designed to overcome some of the drawbacks of the existing feature descriptors Local Binary Pattern (LBP) and Local Directional Pattern (LDP) by combining the strengths of each. The LOOP descriptor has been applied to the JAFFE, MUG, WSEFEP, and ADFES databases in a person-independent setup, with experiments conducted for six and seven expressions in all four databases. The experimental results show that the proposed LOOP descriptor achieves better recognition accuracy than existing methods while requiring less computation time.
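For background, the LBP ingredient that LOOP builds on can be sketched in a few lines. This is the textbook 3x3 LBP code, not the paper's LOOP formulation, and the neighbour ordering is an illustrative choice:

```python
def lbp_code(patch):
    """Compute the 8-bit LBP code for a 3x3 grayscale patch.

    Each of the 8 neighbours is compared with the centre pixel and
    contributes one bit (1 if neighbour >= centre), read clockwise
    starting from the top-left neighbour.
    """
    center = patch[1][1]
    # Clockwise neighbour order starting at the top-left pixel.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # → 241
```

A histogram of such codes over facial regions gives a texture descriptor; LOOP, per the abstract, additionally incorporates the directional-response ordering idea of LDP into the bit assignment.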
The availability of enormous amounts of visual data, and its ever-increasing generation in the security and surveillance domain, have paved the way for computer vision algorithms. Existing algorithms are not precise enough for predictive analytics, and sensitive use cases such as action recognition and identifying missing people in huge crowds pose the challenging research problem of producing accurate and precise results. Existing 2-D pose plots for action recognition fail on the available unstructured visual data, with accuracy around 50% or lower. Existing 3-D plots often overlap with each other; although their accuracy is reported above 90%, much of this maps to false positives. Existing solutions perform object detection through Boolean logic and then map pose plots. Our research focuses on reverse-engineering these solutions: we apply smart segmentation to isolate the background and then map the pose formula to detect the action. Our proposed solution eliminates the overlap complications, resolves the false positives, and achieves accuracy and precision of mAP > 0.8 for both images and video feeds.
Human action recognition in videos is a challenging task in the field of computer vision. Based on the idea of integrating temporal and spatial features, many works have proposed a variety of methods for extracting spatiotemporal features, such as two-stream networks and 3D convolutional neural networks (3D-CNNs). However, due to the huge computational cost of optical flow for two-stream networks and the huge number of parameters of 3D-CNNs, the computation time required for action recognition is very long, making it difficult to meet real-time recognition requirements. This paper explores an efficient 3D-CNN architecture for action recognition that greatly reduces the computational cost while preserving recognition accuracy. To ensure good performance while reducing the amount of input data, we present the Global Evaluate-and-Rescale (GER) Network, which automatically extracts the key frames of the input data. We evaluated the proposed model on two challenging human action recognition datasets, UCF101 and HMDB51. The experimental results show that the GER Network can reduce recognition computation time by up to 50% while achieving accuracy comparable to state-of-the-art 3D-CNN models.
In this paper, we investigate temporal features extracted by a multi-channel convolutional neural network for depth map-based human action recognition. First, we calculate handcrafted features on the non-zero pixels representing the person's shape in each depth map. On the multivariate time-series of such handcrafted features, we train a multi-class, multi-channel CNN to model temporal features, and we also extract statistical features of the time-series. The concatenated features are stored in a common feature vector. Afterwards, for each class we train a separate one-against-all convolutional neural network to extract class-specific features of the depth maps, and for each class-specific multivariate time-series we calculate statistical features. Finally, each class-specific feature vector is concatenated with the common feature vector, resulting in an action feature vector. For the actions represented by these action feature vectors, we train a multi-class classifier with one-hot encoding of the output labels; the action is recognized by a voting-based ensemble operating on such one-hot encodings. We demonstrate experimentally that the proposed algorithm outperforms state-of-the-art depth-based algorithms on the UTD-MHAD dataset and attains promising results on the MSR-Action3D dataset.
Person re-identification (re-ID) is a valuable tool for multi-camera tracking of persons. Until now, research on person re-ID has mainly focused on the closed-set case, where a given query is assumed to always have a correct match in the gallery set, an assumption that does not hold in practical scenarios. In this study, we explore the open-set person re-ID problem, in which queries are not always included in the gallery set. First, we convert the popular closed-set person re-ID datasets into the open-set scenario. Second, we compare the performance of six state-of-the-art closed-set person re-ID methods under open-set conditions. Third, we investigate the impact of a simple and fast statistics-driven gallery refinement approach on open-set person re-ID performance. Extensive experimental evaluations show that gallery refinement increases the performance of existing methods in the low false-accept rate (FAR) region, while simultaneously reducing the computational demands of retrieval. Results show an average detection and identification rate (DIR) increase of 7.91% and 3.31% on the DukeMTMC-reID and Market1501 datasets, respectively, at an FAR of 1%.
The paper proposes an approach to training a convolutional neural network using information on the level of distortion of the input data. The learning process is modified with an additional layer, which is subsequently deleted, so the architecture of the original network does not change. As an example, we consider a LeNet5-architecture network trained on MNIST symbols, with a distortion model of Gaussian blur with a variable level of distortion. This approach incurs no loss in network quality and yields a significant error-free zone in the responses on test data, which is absent under the traditional approach to training. The responses are statistically dependent on the level of distortion of the input image, with a strong relationship between the two.
Computer vision systems based on convolutional neural networks are being rapidly introduced in the field of precision agriculture to solve the problem of scene recognition. Convolutional networks allow high-precision recognition, but a significant problem is the expensive process of adapting the network to new conditions. This article proposes a method for fast adaptation of a trained network to minor changes in the source domain without annotating new data. The method is known as Adversarial Domain Adaptation; in the current paper, it is applied to the problem of agricultural scene recognition in automated harvesting. The initial training procedure is modified for parallel training of an additional subnet on unannotated data, which makes it possible to compensate for the domain shift through adversarial training. This approach allows us to monotonically increase the quality of all recognized object classes and to enhance the stability of the CNN model.
In this work we apply commonly known methods of non-adaptive interpolation (nearest pixel, bilinear, B-spline, bicubic, Hermite spline) and sampling (point sampling, supersampling, mip-map pre-filtering, rip-map pre-filtering, and FAST) to the problem of projective image transformation. We compare their computational complexity, describe their artifacts, and then experimentally measure their quality and running time on a mobile processor with the ARM architecture. These methods were widely developed in the 90s and early 2000s but have not been an area of active research in recent years, owing to a reduced need for computationally efficient algorithms. However, real-time mobile recognition systems, which are attracting more and more attention, not only require fast projective transform methods but also demand high-quality images without artifacts. Accordingly, in this work we choose methods appropriate for such systems that avoid artifacts while preserving low computational complexity. Based on the experimental results, for our setting these are bilinear interpolation combined with either mip-map pre-filtering or FAST sampling, though the choice could be modified for specific use cases.
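Of the interpolation methods compared, bilinear interpolation is the one the experiments favour. A minimal single-channel sketch (pure Python, illustrative only, with border clamping as an assumed policy):

```python
def bilinear_sample(img, x, y):
    """Sample `img` (list of rows of grayscale values) at the real-valued
    point (x, y) by blending the four surrounding pixels."""
    x0, y0 = int(x), int(y)
    # Clamp the second pixel of each pair at the image border.
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0          # fractional offsets within the cell
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

img = [[0, 10],
       [20, 30]]
print(bilinear_sample(img, 0.5, 0.5))  # → 15.0
```

A projective transform then amounts to mapping each destination pixel through the inverse homography and sampling the source at the resulting real-valued coordinates; the pre-filtering schemes named above address the aliasing this introduces under strong minification.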
In the present work, we introduce a data processing and analysis pipeline that ensures the reproducibility of machine learning models chosen for MR image recognition. The proposed pipeline is applied to two binary classification problems: epilepsy and depression diagnostics based on vectorized features from MR images. The model is then assessed in terms of classification performance, robustness, and reliability of the results, including predictive accuracy on unseen data. The classification performance achieved with our approach compares favorably to results reported in the literature, where usually no thorough model evaluation is performed.
In our paper we combine neural networks with Hidden Markov Models (HMMs) for multiview object recognition. While convolutional neural networks are very efficient at object recognition, there is still room for improvement in many practical cases: for example, if training is unsatisfactory or object localization is not solved by the neural network, information fusion from several images and from inertial sensors can still considerably improve the recognition rate. In our use case, we recognize objects from several directions with the VGG16 network. We assume that no localization of objects in the images is possible due to the lack of bounding box annotations, so we must recognize objects even when they occupy only about 25% of the field of view. To overcome this problem, we propose a Hidden Markov Model approach in which consecutive queries, i.e. shots taken from different viewing directions, are first evaluated with VGG16 inference and then with the Viterbi algorithm. The role of the latter is to estimate the most probable sequence of candidate poses (from the predefined 8 horizontal views in our experiments), from which we select the most probable object. The approach, evaluated with different numbers of queries over a set of 40 objects from the COIL-100 dataset, yields a significant increase in hit rate compared to one-shot recognition or to combining individual shots without the HMM.
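The Viterbi step can be sketched compactly. In the sketch below the per-view CNN scores play the role of emission probabilities over the discrete poses; the transition and emission numbers are illustrative, not from the paper, and only two poses are used for brevity:

```python
def viterbi(emissions, transition, initial):
    """Most probable state sequence for a sequence of emission score vectors.

    emissions:  list of per-step lists, P(observation | state)
    transition: transition[i][j] = P(next state j | current state i)
    initial:    prior over states
    """
    n_states = len(initial)
    prob = [initial[s] * emissions[0][s] for s in range(n_states)]
    back = []
    for e in emissions[1:]:
        new_prob, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j.
            best_i = max(range(n_states), key=lambda i: prob[i] * transition[i][j])
            pointers.append(best_i)
            new_prob.append(prob[best_i] * transition[best_i][j] * e[j])
        back.append(pointers)
        prob = new_prob
    # Trace the best path backwards through the stored pointers.
    state = max(range(n_states), key=lambda s: prob[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return list(reversed(path))

# Two poses, three consecutive shots; the camera tends to keep its pose.
trans = [[0.6, 0.4], [0.4, 0.6]]
emis = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
print(viterbi(emis, trans, [0.5, 0.5]))  # → [0, 0, 1]
```

In the paper's setting this search would run over the 8 horizontal views per candidate object, and the object whose best path scores highest wins.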
Despite significant success in the field of text recognition, complex and unsolved problems still exist. In recent years, the recognition accuracy for English has greatly increased, while the problem of recognizing hieroglyphs has received much less attention. Hieroglyph recognition, i.e. recognition of images with Korean, Japanese, or Chinese characters, differs from the traditional text recognition task. This article discusses the main differences between hieroglyphic languages and the Latin alphabet in the context of image recognition. A light-weight method for recognizing images of hieroglyphs is proposed and tested on a public dataset of Korean hieroglyph images. Unlike existing solutions, the proposed method is suitable for mobile devices, and its recognition accuracy exceeds that of an open-source OCR framework. The presented method of training the embedding network is based on similarities in the recognition data.
Deep neural networks solve various image-related tasks very efficiently, though their cost is high and a lot of data is required for training. While there is great demand for neural network models for optical character detection and recognition in different languages, e.g. for mobile real-time applications, dataset collection and labeling are quite expensive. In this paper, we propose a fully automated approach to generating synthetic images with text, based on deep learning and projective geometry methods. For evaluation, we trained two neural networks on a dataset generated by our algorithm. Our approach decreases the false negative rate on real images from the SVT and SVT-50 datasets in comparison with training on the SynthText dataset, giving a ~1% increase in F1-measure.
This paper presents a method for identifying 34 animal classes corresponding to the most common animals found in the domestic areas of Europe, using four types of Convolutional Neural Networks (CNNs): VGG-19, InceptionV3, ResNet-50, and MobileNetV2. We also built a system capable of classifying all 34 animal classes from images as well as in real time from videos or a webcam. Additionally, our system automatically generates two new datasets: one containing textual information (i.e. animal class name, date, and time interval when the animal was present in the frame), and one containing images of the animal classes identified in videos or in front of a webcam. Our experimental results show a high overall test accuracy for all four proposed architectures (90.56% for VGG-19, 93.41% for InceptionV3, 93.49% for ResNet-50, and 94.54% for MobileNetV2). Such systems thus provide an unobtrusive method for gathering a rich collection of information about the many animal classes being identified, for example which animal classes are present at a given date and time in a certain area and how they look, resulting in valuable datasets, especially for researchers in ecology.
In this work we discuss the task of searching for, localizing, and recognizing the price zone within a photograph of a price tag. The task is addressed for the case when the image is acquired by a small-scale digital camera and the computing device has significant resource constraints. The proposed approach is based on the Niblack binarization algorithm and on analysis and clustering of connected components given a known geometric model of the price tag. The algorithm was tested on a private dataset and showed high quality.
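The Niblack step named above thresholds each pixel against local statistics rather than a single global value, which is what makes it robust to uneven price-tag lighting. A pure-Python sketch (window size and k are illustrative; the paper's parameters are not given):

```python
def niblack_binarize(img, window=1, k=-0.2):
    """Return a 0/1 image; 1 marks dark (ink) pixels.

    A pixel is foreground if it falls below mean + k * stddev computed
    over a (2*window+1)-square neighbourhood, clipped at the border.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - window), min(h, y + window + 1))
                    for i in range(max(0, x - window), min(w, x + window + 1))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            threshold = mean + k * var ** 0.5
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out

img = [[200, 200, 200],
       [200,  40, 200],
       [200, 200, 200]]
print(niblack_binarize(img))  # → [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Connected components are then extracted from the binary mask and clustered against the known tag layout; production code would use integral images to make the local mean/variance O(1) per pixel.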
Automated text recognition is used in autonomous driving systems, search engines, document analysis, and many other applications. There are many techniques for extracting text information from scanned documents, but text recognition in arbitrary images is a much harder task. Recently suggested deep learning approaches have demonstrated high-quality results, but they require a huge amount of data to achieve them, and the process of collecting and labelling training data for a deep learning network is costly. In this paper, we suggest an approach for automatic dataset generation for text recognition in arbitrary languages. We use a generative adversarial network structure adapted to generate readable, clear text that looks natural against the image background. We evaluate our approach using the SegLink and Textboxes++ text localization models, trained on examples generated by SynthText and by variations of our method. The comparison showed the superiority of our method on a subset of the ICDAR 2017 dataset for the English and Arabic languages.
One of the major challenges in mobile networks and digital technologies is maintaining the security of real-time data. To this end, the research community has produced a substantial body of work proposing secure image encryption algorithms. However, some of these encryption schemes are not secure enough and lack robustness. In this paper, we reveal the weaknesses of a recently published encryption algorithm that is supposed to be secure and robust: although a network is unable to decrypt the ciphered image, it is still able to classify it. We built a deep neural network that recognizes encrypted images with an accuracy of 95.8%. The results demonstrate that our proposed approach is efficient for classifying ciphered images and could be valuable for further work on cryptanalysis using deep learning.
In this paper, we propose a new method for detecting monospaced fonts in text line images. Although many authors address the more complex problems of text recognition or font recognition, this problem is still challenging for camera-captured images of identity documents, which usually contain complex backgrounds and various distortions; such a font characteristic can nevertheless be useful in document authentication. Our approach is based on a segmentation neural network and the Fourier Transform for detecting “strong” periodic components in the segmenter output. The experimental results show that the combination of a neural network and the Fourier Transform handles the task of monospaced font detection more effectively than the same Fourier analysis applied to the results of an image-processing segmentation method. The main advantage of the neural network is that its output does not depend directly on background, font, and character characteristics.
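The Fourier side of this pipeline can be sketched as follows: project the segmenter's character mask onto the text-line axis and look for a dominant non-DC frequency, which corresponds to equally spaced (monospaced) glyphs. This is a hedged illustration with a naive O(n²) DFT, not the paper's implementation:

```python
import math

def dominant_period(profile):
    """Period (in samples) of the strongest non-DC frequency component."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [p - mean for p in profile]  # drop the DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centred))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n / best_k

# A synthetic projection profile with an exact period of 4 samples.
profile = [2 + math.cos(2 * math.pi * t / 4) for t in range(32)]
print(dominant_period(profile))  # → 4.0
```

A monospaced decision would then compare the peak's magnitude against the rest of the spectrum; in practice an FFT replaces the naive sums above.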
In this work, we deal with the problem of brain tumor segmentation from magnetic resonance imaging (MRI), which is costly and time-consuming when carried out manually. To tackle this specific and complex problem domain, convolutional networks have proved competent, performing significantly better than standard segmentation approaches. Within our research, we therefore propose an approach to tumor segmentation: we develop multiple architectures, training regimes, and evaluation metrics in order to facilitate reliable and automatic delineation of tumorous tissue. For this purpose, we propose a novel adaptation of the Tversky index loss formula to mitigate label imbalance.
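For context, the standard Tversky index that the paper adapts weights false positives and false negatives asymmetrically, which is why it helps when tumorous voxels are rare. The sketch below is the usual definition, not the paper's adaptation; the alpha/beta values are illustrative:

```python
def tversky_index(pred, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky index between a predicted and a reference binary mask.

    pred/target: flat lists of probabilities / {0, 1} labels.
    alpha weights false positives, beta weights false negatives; with
    alpha = beta = 0.5 this reduces to the Dice coefficient. The loss
    used for training is typically 1 - index.
    """
    tp = sum(p * t for p, t in zip(pred, target))          # true positives
    fp = sum(p * (1 - t) for p, t in zip(pred, target))    # false positives
    fn = sum((1 - p) * t for p, t in zip(pred, target))    # false negatives
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)

pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
print(1 - tversky_index(pred, target))  # the corresponding loss value
```

Setting beta > alpha, as above, penalizes missed tumor voxels more heavily than spurious ones, which is the typical choice under class imbalance.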
Hyperspectral satellite imagery, consisting of multiple visible or infrared bands, is extremely dense and heavy for deep learning operations. For vegetation-related problems such as tree segmentation, it is difficult to train deep architectures due to the lack of large-scale satellite imagery. In this paper, we compare the success of different single-channel indices, constructed from multiple bands, for the purpose of tree segmentation in a deep convolutional neural network (CNN) architecture. The utilized indices are either hand-crafted, such as the excess green index (ExG) and the normalized difference vegetation index (NDVI), or reconstructed from the visible bands using feature space transformation methods such as principal component analysis (PCA). For comparison, these features are fed to an identical CNN architecture, a standard U-Net-based symmetric encoder-decoder design with hierarchical skip connections, and the segmentation success for each single index is recorded. Experimental results show that single bands constructed from the vegetation indices and space transformations can achieve segmentation performance similar to that of the original multi-channel case.
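The two hand-crafted indices named above have standard per-pixel definitions, sketched here for a single pixel (channel normalization to [0, 1] is an assumed convention):

```python
def ndvi(nir, red, eps=1e-7):
    """Normalized Difference Vegetation Index, in [-1, 1].

    Healthy vegetation reflects strongly in near-infrared and absorbs
    red, so values approach 1 over dense canopy.
    """
    return (nir - red) / (nir + red + eps)

def exg(r, g, b):
    """Excess Green index: emphasizes green dominance in visible bands."""
    total = r + g + b or 1e-7            # guard against a fully black pixel
    r, g, b = r / total, g / total, b / total
    return 2 * g - r - b

print(round(ndvi(0.8, 0.2), 3))  # high NIR vs. red: strong vegetation signal
print(exg(0.2, 0.6, 0.2))        # green-dominated pixel scores positive
```

Mapping every pixel through one such formula collapses the multi-band cube to a single channel, which is exactly what lets the identical U-Net be trained on far less data per image.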
Quantitative imaging of retinal arteries and veins offers unique insights into cardiovascular and microvascular diseases but is laborious. We developed and tested a method to automatically identify arterial/venular (A/V) vessels in digital retinal images in conjunction with a semi-automatic segmentation technique. Segmentation of blood vessels and the optic disc (OD) was performed as previously described, using a dataset of 10 colour fundus images. Using the OD as a reference, a graph representation was constructed from the vessel skeletons. Vessel bifurcations and crossings were identified based on direction and local geometry, and A/V classification was carried out by fuzzy logic classification using colour information. Results were compared with expert classification. Preliminary results showed an average true positive rate of TPR_A = 0.83 for arteries and TPR_V = 0.74 for veins, with an overall average of TPR_all = 0.79 for both vessel types jointly. Computer-based systems can assess local and global aspects of the retinal microvascular architecture, geometry, and topology; automated A/V classification will facilitate efficient, cost-effective assessment of clinical images at scale.
Interactive segmentation, which extracts a specific foreground selected by the user, is widely employed in user-interactive applications such as image editing and ground-truth labeling. In general, most interactive segmentation methods iteratively refine the previously obtained result using additional user interactions, because a single user input often produces unsatisfactory results. A recently developed convolutional neural network (CNN)-based method called deep interactive object selection has achieved high segmentation accuracy with fewer user interactions than earlier non-CNN-based approaches. However, its computational efficiency deteriorates due to the repetitive feature extraction stage required for each user interaction, and it requires graph cut as a post-processing step to refine the boundary segments. To solve these problems, this paper presents a deep CNN-based interactive segmentation method employing an effective and simple user interaction-based attention module that does not require repetitive feature extraction. In addition, we adopt a Cartesian-to-polar coordinate transformation to further improve segmentation performance. Experimental results demonstrate that the proposed interactive segmentation method is superior to conventional ones in terms of segmentation accuracy and computational efficiency.
This paper focuses on semantic segmentation networks for 3D point clouds of indoor scenes. We first reduce the PointNet structure to obtain a reduced point network (RPN) that achieves the same performance but requires less training and evaluation time than PointNet. Secondly, we propose two solutions to obtain scale invariance and robust test performance: one modifies RPN and adds stable multi-scaling layers (MPN); the other introduces a novel point-based network that uses angular coordinates instead of Euclidean coordinates for point representation (APN). An ablation study of our networks (RPN, MPN, APN) is conducted. Compared to state-of-the-art semantic segmentation networks based on 3D point clouds, the experimental results show that our MPN and APN networks both achieve higher training and evaluation accuracy, as well as higher mean intersection over union (IoU) and overall accuracy, on two benchmarks. We also obtain better qualitative segmentation results when testing directly on another benchmark of indoor scenes as well as on real corridor scenes from our robot's RGB-D mapping.
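Representing a point in angular (spherical) coordinates, the idea behind the APN variant above, can be sketched as follows; the exact conventions (axis order, normalisation) used in the paper may differ.

```python
import math

def to_angular(x, y, z):
    """Return (radius, azimuth, inclination) for a 3D point (x, y, z)."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    inclination = math.acos(z / r) if r > 0 else 0.0
    return r, azimuth, inclination
```

One appeal of this representation for scale robustness: uniformly scaling the whole cloud changes only the radius channel and leaves both angles untouched.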
In this paper we consider the problem of segmentation of three-dimensional fMRI images within the Bayesian framework, with a Markov Random Field (MRF) as the prior distribution and a von Mises-Fisher distribution as the likelihood. Learning such models is usually a complicated task, and exact inference is impossible in practice. To fit the proposed model, we apply the mean-field approximation at the inference step of the EM algorithm. Numerical examples are presented to illustrate the proposed method.
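The von Mises-Fisher observation model above has a simple closed form on the sphere in R^3, where the normalising constant is kappa / (4*pi*sinh(kappa)). A minimal sketch of the log-density, assuming unit-norm inputs:

```python
import math

def vmf_logpdf(x, mu, kappa):
    """Log-density of a unit vector x under vMF(mu, kappa) in R^3.
    x and mu are 3-tuples with unit norm; kappa > 0 is the concentration."""
    dot = sum(a * b for a, b in zip(x, mu))
    log_norm = math.log(kappa) - math.log(4 * math.pi * math.sinh(kappa))
    return log_norm + kappa * dot
```

In the mean-field EM step, each voxel's responsibility for a class would combine this likelihood with the MRF prior contribution from neighbouring voxels; that part is omitted here.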
Porous materials are widely used in different applications; in particular, they are used to create various filters. Their quality depends on parameters that characterize the internal structure, such as porosity and permeability. Computed tomography (CT) allows one to see the internal structure of a porous object without destroying it. The result of tomography is a grayscale image, which must be segmented to evaluate the desired parameters. Traditional intensity-threshold approaches do not reliably produce correct results due to limitations in CT image quality. Errors in evaluating the characteristics of porous materials from segmented images can lead to incorrect quality estimates and, consequently, to unusable filters, financial losses and even accidents. Correct segmentation is difficult because of the strong variation in voxel intensities of the reconstructed object and the presence of noise. Image filtering is used as a preprocessing procedure to improve segmentation quality; nevertheless, there remains the problem of choosing an optimal filter. In this work, a method for selecting an optimal filter is proposed, based on an attributive indicator of porous objects: they should be free from "levitating stones" inside pores. We use real data from which beam-hardening artifacts have been removed, which allows us to focus on the noise reduction process.
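The "levitating stones" check above can be sketched with a connected-component pass: after segmentation, any solid fragment not connected to the main solid matrix is physically implausible and counts against the filter. A 2D grid and 4-connectivity here are illustrative simplifications of the 3D case.

```python
from collections import deque

def count_floating(grid):
    """Count solid (1) components other than the largest one in a 2D grid."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for i in range(h):
        for j in range(w):
            if grid[i][j] == 1 and not seen[i][j]:
                q, size = deque([(i, j)]), 0
                seen[i][j] = True
                while q:  # breadth-first flood fill of one component
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 1 \
                                and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                sizes.append(size)
    return max(len(sizes) - 1, 0)  # every component beyond the largest
```

A filter whose segmentation yields zero floating components would then be preferred over one that leaves isolated specks inside pores.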
6D pose estimation for robotic gripping is greatly affected by clutter, rendering and occlusion. Unlike mainstream methods based on RGB images, which are troubled by rendering, our approach to 3D orientation estimation is based on a Denoising Point Cloud Auto-encoder (DPCAE), which avoids the rendering problem and eliminates the effects of clutter and occlusion. Independent of real pose-annotated training data, the Auto-encoder uses point cloud data generated by randomly covering each object surface in a simulated environment, and it learns an implicit representation of object orientation while removing outliers to restore the object surfaces. Experiments on the LineMod dataset show that our proposed approach is superior to comparable model-based approaches and competes with state-of-the-art approaches trained on real pose-annotated images.
This paper presents a memorability-based image-to-image translation technique that makes an image more memorable while retaining its high-level contents. Conventionally, the image-to-image translation task aims to learn the mapping between images of two different domains using a set of aligned image pairs. However, no dataset with such one-to-one mappings is available for memorability-based image-to-image translation. Therefore, the proposed task is defined as learning the mapping F: I → I' between two image domains I and I'. Here, I corresponds to the input image domain and I' is the unknown image domain containing the modified versions of the input images, where every image in I' is more memorable than its corresponding image in I. The proposed task is achieved by developing a deep learning based method to learn the mapping F: I → I' using a mean-squared error and a memorability loss between I and F(I). The experimental results show that the proposed approach increases the memorability of a given image better than state-of-the-art image-to-image translation techniques.
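The training objective described above can be sketched as a weighted sum of the two terms: a pixel-wise MSE term keeps F(I) close to I, while a memorability term pushes the predicted memorability of F(I) up. The scorer `mem_score` stands in for a pretrained memorability predictor and, like the weight, is a hypothetical placeholder.

```python
def combined_loss(pred, target, mem_score, weight=0.5):
    """MSE(pred, target) + weight * (1 - memorability score of pred).
    pred/target are flat lists of pixel values; mem_score maps an image
    to a score in [0, 1] (higher = more memorable)."""
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    mem_loss = 1.0 - mem_score(pred)  # penalise low predicted memorability
    return mse + weight * mem_loss

# Toy scorer that rewards higher mean intensity, for demonstration only:
score = lambda img: min(1.0, sum(img) / len(img))
loss = combined_loss([0.5, 0.6], [0.5, 0.6], score)
```

In the real method the gradient of the memorability predictor would flow back into F during training, which is why the predictor must be differentiable.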
Classical signal representation techniques generally describe the components on a basis on which the representation of the signal is unique, such as a wavelet network. Conversely, sparse representations decompose the signal on a dictionary comprising a number of elements much larger than the dimension of the signal. This technique can be widely used for the representation, compression, denoising and separation of all types of signals. Several studies have confirmed that using a predefined dictionary is less efficient than using a dictionary learned from training data. The idea of this paper is therefore to propose a new technique for creating a dictionary using the wavelet decomposition to enhance the sparse representation of images. The technique is based on the combination of sparse coding and fast wavelet transform algorithms for image representation. Our results, obtained using different universal image databases, show greater performance in the representation of images compared to some state-of-the-art methods.
In this paper, we evaluate and compare the performance of three machine learning classifiers for high-resolution satellite image scene classification: Support Vector Machines (SVM), Decision Trees (DT) and K-Nearest Neighbor (K-NN). This study aims at providing insights into the selection of the appropriate classifier and highlighting the importance of appropriate classifier parameter settings. We illustrate these issues by applying scene classification to the UC-Merced high-resolution satellite image dataset. Image features are obtained using the SURF descriptor and the BoVW model.
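The sensitivity to classifier parameters noted above is easy to see with the simplest of the three classifiers. A minimal k-NN sketch on toy 2D features (stand-ins for the BoVW histograms used in the paper):

```python
from collections import Counter

def knn_predict(train, query, k):
    """train: list of ((x, y), label); returns majority label of k nearest."""
    dist = lambda p: (p[0][0] - query[0]) ** 2 + (p[0][1] - query[1]) ** 2
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "forest"), ((0, 1), "forest"), ((1, 0), "forest"),
         ((5, 5), "urban"), ((5, 6), "urban"), ((6, 5), "urban")]
```

With k larger than the local cluster size, neighbours from the other class start to vote, so the same query can flip label as k grows; the analogous tuning questions for SVM (kernel, C) and DT (depth) are what the study examines.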
Many remote sensing applications require high spatial resolution images, but the elevated cost of these images makes some studies unfeasible. Single-image super-resolution algorithms can improve the spatial resolution of a low-resolution image by recovering feature details learned from pairs of low- and high-resolution images. In this work, several configurations of ESRGAN, a state-of-the-art algorithm for image super-resolution, are tested. We compare several scenarios with different upsampling modes and channel combinations. The best results are obtained by training a model with RGB-IR channels and using progressive upsampling.
In this paper, a wavelet-domain algorithm is proposed for image detail enhancement. The proposed algorithm works on the principle of a residual proximity measure between the image and its residual. The detail features of an image are enhanced using a fast pixel- and patch-based search mechanism. The unique properties of the stationary wavelet transform, such as redundancy and approximate scale invariance, are utilized along with the fast patch-based search mechanism. The proposed model works well for natural images. By incorporating wavelet analysis along with the residual structure similarity, we are able to mine the fine detail features required for image enhancement. The proposed algorithm is compared with benchmark algorithms, and an experimental analysis is carried out to show its efficacy. The visual as well as quantitative results show that the proposed algorithm is on par with other detail enhancement algorithms.
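The core idea above, separating a signal into a smooth base and a detail residual and then amplifying the residual, can be sketched in 1D. A moving average stands in for the stationary wavelet approximation band, and the gain is illustrative; the paper's actual pipeline operates on wavelet subbands with a patch-based search.

```python
def enhance_details(signal, gain=2.0):
    """Boost detail = signal - smoothed(signal), using a 3-tap mean smoother."""
    n = len(signal)
    base = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)   # clamp window at the edges
        base.append(sum(signal[lo:hi]) / (hi - lo))
    return [b + gain * (s - b) for s, b in zip(signal, base)]
```

Flat regions pass through unchanged (residual is zero there), while edges and fine texture are steepened, which is the desired behaviour of a detail enhancer.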
In this paper, we propose a dimensionality reduction technique based on the principal component analysis of homogeneous spatial regions of hyperspectral images. In the proposed technique, we rely on the linear mixture model and use a dimensionality estimation procedure to split an image into homogeneous regions. Experiments carried out using well-known hyperspectral image scenes show that the proposed technique yields compact representations of image regions in reduced spectral subspaces and can also be considered a segmentation technique.
Image registration is the problem of aligning two or more images of the same scene or object. The case when images are taken with different sensors - multimodal image registration - has applications in medical imaging and remote sensing. Unfortunately, many existing image registration methods operate under crude assumptions (e.g., that the intensities of the images are linearly correlated), which makes them inapplicable for accurate multimodal registration. One approach to this task is to use deep learning to capture the complex intensity dependencies between images of different modalities. However, while deep learning methods produce good results, most of them are trained end-to-end and do not utilize the accumulated body of knowledge about image registration using "classic" information-theoretic and statistical methods. In this paper we consider a specific case of multimodal image registration: registration of optical and synthetic aperture radar (SAR) images. We use a classic feature-based registration pipeline (first, corresponding feature points are found; then RANSAC is used as the transform estimator). Within this pipeline we compare the effectiveness of various feature point detection and correspondence methods, both neural network-based and traditional. We find that a Siamese network outperforms (but only slightly) the classic cross-entropy-based method for finding correspondences. Finally, we propose a hybrid method and show that it outperforms both the "classic" method and an end-to-end network by a significant margin.
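The RANSAC transform-estimation step in the pipeline above can be sketched with the model reduced to a 2D translation for clarity (the paper estimates a richer transform from optical/SAR correspondences): repeatedly hypothesise a transform from a minimal sample of matches and keep the hypothesis with the most inliers.

```python
import random

def ransac_translation(pairs, iters=200, tol=1.0, seed=0):
    """pairs: list of ((x1, y1), (x2, y2)) putative matches.
    Returns the translation (dx, dy) supported by the most inliers."""
    rng = random.Random(seed)
    best, best_inliers = (0.0, 0.0), -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(pairs)   # minimal sample: one match
        dx, dy = x2 - x1, y2 - y1
        inliers = sum(1 for (a, b), (c, d) in pairs
                      if abs(c - a - dx) < tol and abs(d - b - dy) < tol)
        if inliers > best_inliers:
            best, best_inliers = (dx, dy), inliers
    return best
```

Because each hypothesis is scored by consensus, a single wrong correspondence (common across modalities) cannot hijack the estimate as it would in least squares.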
This paper presents a method for metric rectification of planar objects that preserves angles and length ratios. The inner structure of an object is assumed to follow the laws of the Manhattan world, i.e., the majority of line segments are aligned with two orthogonal directions of the object. We introduce a method that estimates the positions of the two vanishing points corresponding to the main object directions, based on an original optimization function over segments. To calculate the rectification homography from the two vanishing points, we propose a new method based on estimating the camera rotation such that the camera axis is perpendicular to the object plane. The proposed method can be applied to the rectification of various objects such as documents or building facades. Also, since the camera rotation is estimated, the method can be employed to estimate object orientation (for example, during surgery with radiographs of osteosynthesis implants). The method was evaluated on the MIDV-500 dataset, which contains projectively distorted images of documents with complex backgrounds. According to the experimental results, the accuracy of the proposed method is better than or equal to the state of the art when the background occupies no more than half of the image. The runtime of the method is around 3 ms on a Core i7-3610QM CPU.
Despite the remarkable success of deep learning in pattern recognition, deep network models face the problem of training a large number of parameters. In this paper, we propose and evaluate a novel multi-path wavelet neural network architecture for image classification with far fewer trainable parameters. The model architecture consists of a multi-path layout with several levels of wavelet decompositions performed in parallel, followed by fully connected layers. These decomposition operations comprise wavelet neurons with learnable parameters, which are updated during the training phase using the back-propagation algorithm. We evaluate the performance of the introduced network on common image datasets without data augmentation (except for SVHN) and compare the results with influential deep learning models. Our findings support the possibility of significantly reducing the number of parameters in deep neural networks without compromising accuracy.
Image classification is an area where deep learning, and especially stacked Auto-encoders, have proven their strength. The contribution of this paper lies in the creation of a new classifier to remedy certain classification problems. This classification method combines two of the most widely used techniques in the field: Deep Learning (DL) and Sparse Coding (SC). The proposed deep neural network consists of three stacked Auto-encoders and a Softmax output layer for classification. The first Auto-encoder is created from a sparse representation of all images in the dataset: this sparse representation forms the decoder part of the first Auto-encoder, and the transpose of the matrix is applied to obtain the encoder part. Experiments performed on standard datasets such as ImageNet and Coil-100 reveal the efficacy of this approach.
While deep neural networks (DNNs) have been shown to outperform humans on many vision tasks, their opaque decision-making process inhibits widespread uptake, especially in high-risk scenarios. The BagNet architecture was designed to learn visual features that are easier to explain than the feature representations of other convolutional neural networks (CNNs). Previous experiments with BagNet focused on natural images, which provide rich texture and color information. In this paper, we investigate the performance and interpretability of BagNet on a dataset of human sketches, i.e., a dataset with limited color and no texture information. We also introduce a heatmap interpretability score (HI score) to quantify model interpretability and present a user study examining BagNet interpretability from the user's perspective. Our results show that BagNet is by far the most interpretable CNN architecture in our experimental setup based on the HI score.
In this paper, we present a novel 3D scene reconstruction framework from a single front-mounted stereo camera on a moving vehicle. We propose image triangulations to efficiently render a 3D scene only from 2D textures, while introducing tube meshes as an effective way to render out-of-frustum points. Furthermore, we derive a 3D extended Kalman filter to fuse stereo estimates temporally between frames and showcase a render pipeline, which exploits OpenGL shaders to offload computational costs from the CPU to the GPU. Our approach is able to increase the stereo accuracy compared to competing approaches on the KITTI visual odometry dataset. We also introduce a challenging view prediction evaluation scenario on the SYNTHIA dataset, in which our approach comes out on top in terms of SSIM, 1-NCC error and completeness.
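The temporal fusion above can be sketched with the filter collapsed to a single scalar depth per point (the paper uses a full 3D extended Kalman filter): each new stereo measurement is blended with the running estimate in proportion to their variances.

```python
def kalman_fuse(estimate, est_var, measurement, meas_var):
    """One scalar Kalman update; returns (new_estimate, new_variance)."""
    gain = est_var / (est_var + meas_var)          # trust ratio
    new_estimate = estimate + gain * (measurement - estimate)
    new_var = (1.0 - gain) * est_var               # uncertainty shrinks
    return new_estimate, new_var

# Fusing two equally uncertain depths averages them and halves the variance:
depth, var = kalman_fuse(10.0, 1.0, 12.0, 1.0)
```

Repeating this update over frames is what lets the fused depth become more accurate than any single stereo estimate, which is the effect reported on KITTI above.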
The reconstruction of an image distorted by a linear transformation is a problem that is unstable with respect to perturbations of the mathematical model of image formation. This instability is overcome by using a priori information about the class of original images. One way to use such information is to assume that the original image belongs to the class of piecewise constant images. The class of piecewise constant functions can provide a good approximation for signals encountered in practice, since such functions can approximate any square-integrable signal with arbitrary accuracy. In addition, the assumption that the image brightness takes values from a finite set is plausible for some applied studies; such an assumption is made, in particular, in tomography, where the studied samples can consist of a small number of fractions. In this paper, we propose an algorithm for the reconstruction of piecewise constant signals blurred by a linear transformation and investigate the possibility of applying it to estimate the original unblurred signal. For ease of implementation, the case of one-dimensional signals is considered.
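One ingredient of the a priori information above, the finite set of brightness values, can be sketched as a projection step: each estimate is snapped to the nearest allowed level. The deblurring iteration itself is omitted; this shows only the constraint.

```python
def project_levels(signal, levels):
    """Snap each sample to the closest value in the finite set `levels`."""
    return [min(levels, key=lambda v: abs(v - s)) for s in signal]

# A noisy two-level signal is cleaned up by the projection alone:
restored = project_levels([0.1, 0.9, 1.1, -0.2], [0.0, 1.0])
```

In a full algorithm this projection would typically alternate with a data-fidelity update against the blur model, pulling the iterate back to the admissible signal class at every step.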
Examination of peripheral blood smears by light microscopy provides valuable information for disease diagnosis but remains one of the most labor-intensive procedures in the hematology laboratory. As a key part of white blood cell morphology examination, leukocyte detection with a 10× magnification objective lens is treated in this paper as a small-object detection problem. After establishing a leukocyte dataset at 10× magnification, we design a classification-based two-stage approach to this challenging problem. Stage one generates proposals via a pixel-level ANN pipeline based on the color and size of leukocytes. Stage two performs the proposal classification task with a CNN architecture. Extensive experiments are carried out to prove its effectiveness for peripheral blood samples. Experimental results demonstrate that the proposed classification-based approach obtains a desirable precision of 94.69%, recall of 95.73%, accuracy of 90.85% and Fβ of 0.96. The data is available at https://doi.org/10.6084/m9.figshare.9037370.v1.
The aim of this paper is to propose a novel method to explain, interpret and support the decision-making process of a deep Convolutional Neural Network (CNN). This is achieved by analysing neuron activations of a trained 3D-CNN at selected layers via a Gaussian Mixture Model (GMM) and a custom binary encoding of both training and test images based on their activations' affiliation to the computed GMM components. Based on the similarity of the encoded image representations, the system is able to retrieve the most activation-wise similar atlas (training) images for a given test image and thereby support and clarify its decision. Possible uses of this method mainly include Computer-Aided Diagnosis (CAD) systems working with medical imaging data such as magnetic resonance imaging (MRI) or computed tomography (CT) scans. Interpreting the network's decision in the form of similar domain examples (images) is natural to the workflow of the medical personnel operating the system.
Computer vision for biomedical imaging applications is a fast-developing and demanding field of computer science. In particular, computer vision techniques provide excellent results for detection and segmentation problems in tomographic imaging. X-ray phase-contrast tomography (XPCT) is a noninvasive 3D imaging technique with high sensitivity for soft tissues. Despite considerable progress in XPCT data acquisition and processing methods, the degradation of image quality due to artifacts remains a widespread and often critical issue for computer vision applications. One of the main problems originates from sample alteration during a long tomographic scan. We proposed and tested a Simultaneous Iterative Reconstruction algorithm with Total Variation regularization to reduce the number of projections in high-resolution XPCT scans of an ex-vivo mouse spinal cord. We show that the proposed algorithm allows a tenfold reduction in the number of projections, and therefore in the exposure time, while conserving the important morphological information in the 3D image at a quality acceptable for computer graphics and computer vision applications. Our research paves the way for more effective implementation of advanced computer technologies in phase-contrast tomographic research.
In this work, we propose a method for tomographic reconstruction in the case of a limited field of view, when the whole image of the investigated sample does not fit on the detector. The proposed technique is based on an iterative procedure with corrections at each step in both sinogram space and reconstruction space. Results on synthetic and experimental data show that the proposed technique improves tomographic reconstruction quality and extends the field of view.
We address the problem of non-contact geometrical measurement of hard-to-reach objects, which is an important task in various industrial and medical applications. We have developed two small-size prism-based systems for the simultaneous acquisition of stereoscopic images by a single sensor. For a correct mathematical description of these systems, we use a ray-tracing camera model based on a vector form of Snell's law. We demonstrate that using a chessboard calibration target allows simultaneous geometrical calibration and image quality assessment. We show that chromatic aberrations in RGB images caused by the prism may be significantly reduced by applying a separate rectification procedure to each color channel. Experiments confirm that the developed optical systems provide high image quality and the software provides high precision of three-dimensional (3D) geometrical measurements. The described systems may become the basis of small-diameter endoscopic probes for various applications.
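The vector form of Snell's law underlying the ray-tracing camera model above is a standard formula: given a unit incident direction d, a unit surface normal n pointing against d, and the refractive-index ratio eta = n1/n2, it yields the refracted unit direction. The notation here is generic, not tied to the paper's implementation.

```python
import math

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n (n opposes d).
    Returns the refracted direction, or None on total internal reflection."""
    cos_i = -sum(a * b for a, b in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)   # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni
                 for di, ni in zip(d, n))
```

Tracing each RGB channel with its own eta is also how the chromatic-aberration correction mentioned above can be modelled, since the refractive index of the prism varies with wavelength.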
Convolutional Neural Network and Image Application
In this paper, we propose an economical system for remote video player control. Through this system, the user can control a video player with several simple gestures, which can be customized based on the user's requirements and habits. The datasets used to train the gesture recognition model were recorded with a simple web camera in the laboratory. We use a convolutional neural network (CNN) to train on the datasets, and the user interface is designed with PyQt5. The gesture recognition system can also be applied to switching television programs, controlling video games, household appliances, etc.
Pneumonia is an infection of the lungs that can cause mild to severe illness and affects millions of people worldwide. Imaging studies are therefore crucial for the detection and management of patients with pneumonia, and radiography is currently the best method for diagnosis. However, clinical diagnosis from chest X-rays can be a challenging task, as it requires interpretation by highly trained clinicians. This study uses deep learning to perform binary classification of frontal-view chest X-ray images to detect signs of childhood pneumonia. The effectiveness of the classifiers was validated using a dataset collected by [5] containing 5,856 labeled X-ray images from children. The classifiers were able to identify the presence or absence of childhood pneumonia with an accuracy of 96-97%.
In this paper, we present a comparison of the performance of different convolutional neural networks (CNN) for the automatic classification of corrosion and coating damages on bridge constructions from images. Image recordings were taken during inspections. Through manual categorization and data augmentation, a total of 9300 images were collected and divided into five classes. Four different CNNs were trained using transfer learning in MATLAB. We evaluated test performance through the metrics recall, precision, accuracy, and F1 score. Test performance was also evaluated on damage detection accuracy, meaning how well the networks detect images that contain damage. The convolutional neural network trained using VGG-16 had the best overall performance, with average recall, precision, accuracy, and F1 score of 95.45%, 95.61%, 97.74%, and 95.53%, respectively. In the category of overall damage detection, AlexNet performed best with 99.14% accuracy. The obtained results are promising and suggest that CNNs have great potential for the automatic analysis of corrosion and coating damages in bridge inspections.
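The evaluation metrics named in this abstract follow directly from confusion-matrix counts; a minimal sketch (the counts below are illustrative, not the paper's data):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, accuracy and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, accuracy, f1

# Hypothetical counts for one damage class
r, p, a, f = classification_metrics(tp=90, fp=5, fn=10, tn=95)
print(round(r, 3), round(p, 3), round(a, 3), round(f, 3))  # 0.9 0.947 0.925 0.923
```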
Breast cancer is one of the most widespread causes of women's death worldwide. Successful treatment can be achieved only by early and accurate tumor diagnosis. The main method of diagnosing tissue taken by biopsy is based on the observation of its significant structures. We propose a novel approach to classifying microscopy tissue images into four main cancer classes (normal, benign, in situ, and invasive). Our method is based on comparing the new tissue sample with examples previously annotated by specialists, which are compiled in a collection with other labeled samples, and determining their similarity. The most probable class is determined statistically by comparing a new sample with several annotated samples. A common problem with medical datasets is the small number of training images. We applied suitable dataset augmentation techniques, exploiting the fact that flipping or mirroring a sample does not change the information about the diagnosis. Our other contribution is that we show the histopathologist why the algorithm has classified tissue into a particular cancer class by ordering the collection of correctly annotated samples by their similarity to the input sample. Histopathologists can then focus on searching for the key structures corresponding to the predicted classes.
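Determining the most probable class by comparing a new sample with several annotated samples can be sketched as a nearest-neighbour majority vote; the distance measure and the feature vectors below are illustrative assumptions, not the paper's actual similarity function:

```python
from collections import Counter

def classify_by_similarity(query_features, annotated, k=3):
    """Assign a class by majority vote among the k most similar annotated
    samples. `annotated` is a list of (feature_vector, label) pairs;
    similarity here is negative squared Euclidean distance (an assumption)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(annotated, key=lambda s: dist(query_features, s[0]))
    top_labels = [label for _, label in ranked[:k]]
    # The ranked list itself can be shown to the histopathologist as evidence.
    return Counter(top_labels).most_common(1)[0][0], ranked[:k]

samples = [([0.10, 0.20], "normal"), ([0.90, 0.80], "invasive"),
           ([0.15, 0.25], "normal"), ([0.85, 0.90], "invasive"),
           ([0.12, 0.22], "normal")]
label, nearest = classify_by_similarity([0.11, 0.21], samples, k=3)
print(label)  # normal
```

Returning the ranked neighbours alongside the label mirrors the paper's idea of showing the specialist *why* a class was predicted.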
In this paper, we study the recently introduced neural network architecture HoughNet for its ability to accumulate transferable high-level features. The main idea of this neural network is to use convolutional layers separated by Fast Hough Transform layers to enable the analysis of complex non-linear statistics along different lines. We show that different convolutional blocks in this neural network serve essentially different purposes. While initial feature extraction is task-specific, the main part of the neural network operates on high-level features and does not require re-training to be applied to data from a different domain. To prove this statement, we use two sets of images with different origins and demonstrate the presence of transfer learning in the neural network, except for the first layers, which are highly task-specific.
Indoor positioning and navigation inside an area with no GPS data availability is a challenging problem. There are applications, such as augmented reality, autonomous driving, and navigation of drones inside tunnels, in which indoor positioning is crucial. In this paper, a tandem architecture of deep network-based systems is developed, for the first time to our knowledge, to address this problem. This structure is trained on scene images obtained by scanning the desired area segments using photogrammetry. A CNN based on EfficientNet is trained as a classifier of the scenes, followed by a MobileNet CNN trained to perform as a regressor. The proposed system achieves very fine precision for both the Cartesian position and the quaternion information of the camera.
Vehicle classification is an important topic that is still under research because of its role in road surveillance, security systems, traffic monitoring, and accident prevention. In this paper, we propose a deep learning model for vehicle classification using a Convolutional Neural Network (CNN) integrated with a statistical moments layer; we refer to the model as ICNN. As an additional layer, the moments layer extracts statistical moment features from the feature maps obtained from the convolutional layers. The moments layer feeds the fully connected classifier of the network, which is fine-tuned to obtain better results. Our Integrated CNN model (ICNN) achieves 97.1% accuracy compared with the most popular algorithms used in this field, such as K-Nearest Neighbour (KNN) and Support Vector Machine (SVM), which are known as good tools for object classification.
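The abstract does not specify which statistical moments are extracted; a minimal sketch of such a layer, assuming the first four moments (mean, variance, skewness, kurtosis) per feature map:

```python
import numpy as np

def moments_layer(feature_maps):
    """Extract the first four statistical moments from each feature map.

    feature_maps: array of shape (channels, H, W).
    Returns a flat feature vector of length 4 * channels, which a
    fully connected classifier could consume.
    """
    flat = feature_maps.reshape(feature_maps.shape[0], -1)
    mean = flat.mean(axis=1)
    var = flat.var(axis=1)
    std = np.sqrt(var) + 1e-8              # avoid division by zero
    centered = flat - mean[:, None]
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4
    return np.concatenate([mean, var, skew, kurt])

maps = np.random.rand(8, 16, 16)           # e.g. 8 feature maps from a conv layer
print(moments_layer(maps).shape)           # (32,)
```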
Computer-aided diagnosis of cancer based on endoscopic image analysis is a promising area in the field of computer vision and machine learning. Convolutional neural networks are one of the most popular approaches to endoscopic image analysis. The paper presents an endoscopic video analysis algorithm based on a convolutional neural network. To assess the quality of the algorithm on video data from the endoscope, the intersection over union (IoU) metric for object detection is used. The experimental results show that the average IoU for the developed algorithm is 0.767, which corresponds to a high degree of overlap between the areas identified by an expert and by the algorithm.
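The IoU metric used for this evaluation is standard; for axis-aligned boxes it can be sketched as:

```python
def iou(box_a, box_b):
    """Intersection over union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping on a 5x5 region: IoU = 25 / 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```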
Regularization methods play an important role in training artificial neural networks, improving generalization performance and preventing overfitting. In this paper, we introduce a new regularization method based on the orthogonalization of convolutional layer filters. The proposed method is easy to implement and has plug-and-play compatibility with modern training approaches, without any changes or adaptations on their part. Experiments on the MNIST and CIFAR10 datasets showed that the effectiveness of the suggested method depends on the number of filters in the layer, and the maximum increase in quality is achieved for architectures with a small number of parameters, which is important for training fast and lightweight neural networks.
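One common way to encourage filter orthogonality is a soft penalty on the Gram matrix of the flattened filters; whether the paper uses exactly this loss is an assumption, but it illustrates the idea:

```python
import numpy as np

def orthogonality_penalty(filters):
    """Soft orthogonality penalty || W W^T - I ||_F^2 on flattened conv filters,
    added to the training loss. filters: shape (num_filters, in_channels, kH, kW)."""
    w = filters.reshape(filters.shape[0], -1)        # one row per filter
    w = w / (np.linalg.norm(w, axis=1, keepdims=True) + 1e-8)
    gram = w @ w.T
    return float(np.sum((gram - np.eye(w.shape[0])) ** 2))

# An orthogonal filter bank incurs a (near-)zero penalty
f = np.zeros((2, 1, 2, 2))
f[0, 0, 0, 0] = 1.0
f[1, 0, 0, 1] = 1.0
print(orthogonality_penalty(f))
```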
This paper considers a method for detecting road surface markings using a camera mounted on top of a vehicle. The detection is done with an orientation-aware detector based on a convolutional neural network. To successfully detect the orientation and position of road surface markings, the input frontal image is converted to a bird's-eye view image using inverse perspective mapping. A synthetic image dataset is constructed with the aid of the MSER (maximally stable extremal regions) algorithm to solve the data imbalance problem. The detector is trained to estimate the orientations of the detected objects in addition to the class labels and positions. A pretrained DenseNet-based YOLOv2 model is modified to detect rotated rectangles with an additional cost function and a new efficient IoU (intersection over union) measure. Instead of directly estimating the orientation angle of the road surface markings, probabilistic estimation is done with quantized angular bins. A benchmark dataset is formulated for evaluation, and the experimental results show that the considered algorithm provides promising results while running in real time.
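The quantized angular bin representation can be sketched as follows; the bin count and the circular-mean decoding rule are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def angle_to_bins(theta_deg, num_bins=12):
    """One-hot quantization of an orientation angle into angular bins over
    [0, 360); the detector predicts a probability distribution over such bins."""
    bin_width = 360.0 / num_bins
    idx = int(theta_deg % 360 // bin_width)
    one_hot = np.zeros(num_bins)
    one_hot[idx] = 1.0
    return one_hot

def bins_to_angle(probs, num_bins=12):
    """Recover an angle estimate as the probability-weighted circular mean
    of the bin centres."""
    bin_width = 360.0 / num_bins
    centres = np.deg2rad(np.arange(num_bins) * bin_width + bin_width / 2)
    x = np.sum(probs * np.cos(centres))
    y = np.sum(probs * np.sin(centres))
    return float(np.rad2deg(np.arctan2(y, x)) % 360.0)

p = angle_to_bins(100.0)     # 100 degrees falls in bin [90, 120), centre 105
print(bins_to_angle(p))
```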
The evolution of machine vision systems has lately brought up some matters of security. Adversarial computer vision is the field that deals with these matters, producing either adversarial attack proposals or defensive strategies and techniques against them. This article reviews the computer vision security threats and defensive techniques that have been proposed by researchers so far and intends to serve as a guide for any researcher interested in working in the field of adversarial computer vision. Initially, a short history of the subject and the main interests of researchers in this field are presented. Then, the most important proposed attacks based on adversarial examples, and their integrity, are analyzed, and an updated taxonomy of adversarial computer vision attacks is proposed. Finally, the defensive strategies and techniques that have been proposed are also discussed.
Computer vision systems are important for capturing environments, for facial recognition, and as a way to scan objects for documentation and manufacturing. One of the current challenges is to scan objects that change dynamically, whether through rigid transformations or shape deformations. This paper presents a new system based on an RGB-D camera array, calibrated by means of a set of equations that relate the distance, angles, and resolution of the cameras. The Iterative Closest Point algorithm is used for fine alignment, together with a process of reconstruction and noise elimination by means of a Poisson distribution function. The system was exhaustively validated using two forms with different properties. When comparing the scanned result against the real models by means of the Hausdorff distance, errors of no more than 0.0045 mm were obtained. In addition, an experiment was performed by scanning the palm of the hand under deformations and movements. These results show that the system can scan both static and dynamic forms, thereby demonstrating its usefulness for the reconstruction, analysis, and manufacture of objects of different classes.
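The Hausdorff distance used to compare the scan with the real model measures the largest of all nearest-point distances between two point sets; a minimal sketch:

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a (N, d) and b (M, d):
    the larger of the two directed distances max-min over pairwise norms."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 0.0], [1.0, 1.0]])
print(hausdorff(a, b))  # 1.0
```

For large scans, a KD-tree nearest-neighbour query would replace the dense pairwise matrix, but the definition is the same.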
Visual odometry (VO) is one of the most challenging techniques in computer vision for autonomous vehicles/vessels. In VO, the camera pose, which also represents the robot pose in ego-motion, is estimated by analyzing the features and pixels extracted from the camera images. Different VO techniques mainly provide different trade-offs among the resources considered for odometry, such as camera resolution, computation/communication capacity, power/energy consumption, and accuracy. In this paper, a hybrid technique is proposed for camera pose estimation, combining triangulation-based odometry using a long-term period of direct-based odometry with a short-term period of inverse depth mapping. Experimental results based on the EuRoC dataset show that the proposed technique significantly outperforms the traditional direct-based pose estimation method for Micro Aerial Vehicles (MAV), while keeping its potential negative effect on performance negligible.
Specular reflections are undesirable phenomena that can impair overall perception and subsequent image analysis. In this paper, we propose a modern solution to this problem, based on the latest achievements in the field. The proposed method includes three main steps: image enhancement, detection of specular reflections, and reconstruction of damaged areas. To enhance and equalize the brightness characteristics of the image, we use the alpha-rooting method with an adaptive choice of the optimal parameter alpha. To detect specular reflections, we apply morphological filtering in the HSV color space. At the final stage, the damaged areas are reconstructed using adversarial neural networks. This combination makes it possible to quickly and effectively detect and remove specular reflections, which is confirmed by the series of experiments presented in the experimental section of this work.
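Alpha-rooting enhances an image by compressing the dynamic range of its Fourier magnitude while preserving phase; a minimal sketch with a fixed alpha (the paper's adaptive choice of the optimal alpha is not reproduced here):

```python
import numpy as np

def alpha_rooting(image, alpha=0.9):
    """Enhance an image by raising the magnitude of its 2-D DFT to the power
    alpha (0 < alpha < 1) while keeping the phase, then inverse-transforming."""
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    phase = np.exp(1j * np.angle(spectrum))
    enhanced = np.fft.ifft2(magnitude ** alpha * phase)
    return np.real(enhanced)

img = np.random.rand(8, 8)
out = alpha_rooting(img, alpha=0.95)
print(out.shape)  # (8, 8)
```

With alpha = 1 the transform is the identity; smaller alpha boosts the relative weight of high-frequency content, sharpening edges.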
We present a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colour, intensity, and distance from the scene. We mainly address realistic scenarios for the evaluation of computational colour constancy algorithms, but have also aimed to make the data as general as possible for computational colour science and computer vision. Along with the images, we also provide spectral characteristics of the camera, light sources, and objects, and include pixel-by-pixel ground-truth annotation of uniformly coloured object surfaces. The dataset is freely available at https://github.com/visillect/mls-dataset.
There are numerous cues that influence human visual attention. Some of these cues cannot be explored by conventional eye-tracking studies, which use pictorial data presented to observers on common displays. Depth perception occurs naturally in the real three-dimensional environment, and depth cues are therefore among them. However, eye-tracking studies in the real environment and their evaluation are complicated to carry out with a relevant number of participants while maintaining laboratory conditions. We propose an experimental study methodology for exploring depth perception tendencies during a free-viewing task on a widescreen display in a laboratory. This method is beyond the current hardware capabilities of static eye-trackers mounted on displays; therefore, eye-tracking glasses were used in the study to measure the attention data. We carried out the proposed study on a sample of 25 participants and created a novel dataset suitable for further visual attention research. The depth perception tendencies on a widescreen display were evaluated from the acquired data, and the results were discussed in the context of previous similar studies. Our results revealed some differences in depth perception tendencies in comparison with previous studies using two-dimensional pictorial data, and resembled some depth perception tendencies observed in the real environment.
Visual attention models are usually tested using collections of natural images that contain intentionally salient objects and obvious context information. On the other hand, few algorithms in the literature have considered datasets with no context information for modeling attention. Moreover, visual attention models have not been thoroughly evaluated in both contextless and context-aware environments. In this paper, we compare the performance of some well-known bottom-up visual attention models on contextless and context-aware datasets, using the Pearson correlation coefficient to assess the efficiency of each visual attention model in terms of accuracy and eye fixation prediction. The best algorithm outperforms the others, reaching 59.1% and 43.8% correlation with ground-truth information on the contextless and context-aware datasets, respectively.
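The Pearson correlation coefficient used here compares a predicted saliency map with a ground-truth fixation map pixel by pixel; a minimal sketch:

```python
import numpy as np

def pearson_cc(saliency_map, ground_truth):
    """Pearson correlation coefficient between a predicted saliency map and a
    ground-truth fixation density map (both treated as flat vectors)."""
    s = saliency_map.ravel().astype(float)
    g = ground_truth.ravel().astype(float)
    s = (s - s.mean()) / s.std()
    g = (g - g.mean()) / g.std()
    return float(np.mean(s * g))

gt = np.random.rand(32, 32)
print(pearson_cc(gt, gt))         # identical maps -> correlation 1.0
print(pearson_cc(gt, 1.0 - gt))   # inverted map  -> correlation -1.0
```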
This paper deals with design solutions for a modular mechatronic demonstrator with video feedback used in research and education. The demonstrator allows the testing of control algorithms on various mechanical structures with multiple degrees of freedom (DOF). Although simple in mechanical design, our setup offers a large number of possibilities in both research and education. Our aim with this demonstrator is to test model-based as well as model-free control strategies comparable to human behavior. For this paper, we construct a 2-DOF arm as the mechanical structure to be controlled. Our hypothesis is that a human does not require knowledge of arm kinematics in order to perform tasks; thus, we present an approach to determining the arm configuration needed to reach a target point without knowing the lengths of the arm's elements.
Monocular Simultaneous Localization and Mapping (SLAM) is a crucial problem for the computer vision community. This paper deals with the solution of the SLAM problem using mobile devices that have both a monocular camera and sensors: an accelerometer, a gyroscope, and a digital compass. The aim of the research is to assess the potential suitability and efficiency of using extra information from the inertial sensors and compass to improve the solution quality and to reduce the time needed to obtain the solution.
The paper provides examples of computer vision tasks in which topological data analysis has given new, effective solutions. The ideas underlying topological data analysis and its basic methods are briefly described and illustrated with examples of computer vision problems. No prior knowledge of topological data analysis or computational geometry is assumed; a brief introduction to the subject is given throughout the text.
One promising satellite application is Earth observation (EO) video capture satellites. On the one hand, the governing equations of satellite motion in orbit indicate that a satellite moves in a very predictable manner and stays on schedule. On the other hand, the strength of intra- and inter-frame coding drives video compressors to reduce spatial redundancy and data rate; this advantage stems from the flexible coding structure and high density of angular prediction modes present in all video compression standards. Our aim in this article is to combine these facts to achieve better performance in industrial satellite applications. In this study, a novel architectural approach is proposed for EO video capture satellites that takes into account the new demands of the coming commercial market. After examining video coding parameters and the technical essentials of EO satellites, and supporting the proposed hypothesis with simulation and evidence, a preliminary configuration is described. The attitude and orbit control system (AOCS) is responsible for keeping the satellite in the required position, contributing to onboard data handling (OBDH). The exact attitude parameters are key factors that can be used alongside video prediction vector data to reduce the computational load of video compression algorithms, as explained in detail.
In the present article, we present an algorithm for content-based video retrieval using frame fusion and the Histogram of Oriented Gradients (HOG). Representative frames of the database videos are pre-processed using frame fusion to obtain high-resolution representative frames, and the HOG descriptor of each high-resolution representative frame represents the corresponding database video. Similarly, the query frame undergoes frame fusion, and the HOG descriptor of the high-resolution query frame is used to represent it. To retrieve videos similar to the query frame, matching is done using the Euclidean distance between the HOG features of the query frame and those of the database representative frames. The proposed method is tested on news-category videos. The method picks random frames from the database videos as query frames, instead of selecting keyframes. Performance is assessed with precision, recall, accuracy, and the Jaccard index. The experimental results show that the proposed method performs better than other state-of-the-art methods.
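The retrieval step can be sketched as nearest-neighbour search over HOG descriptors; the descriptor below is a deliberately simplified global orientation histogram (real HOG uses local cells and block normalization, and the paper's exact parameters are not given):

```python
import numpy as np

def hog_descriptor(image, bins=9):
    """Simplified HOG-style descriptor: a global histogram of gradient
    orientations weighted by gradient magnitude, L2-normalized."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientations
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def retrieve(query, database):
    """Index of the database frame closest to the query in Euclidean
    distance between HOG descriptors."""
    q = hog_descriptor(query)
    dists = [np.linalg.norm(q - hog_descriptor(f)) for f in database]
    return int(np.argmin(dists))

frames = [np.random.rand(32, 32) for _ in range(5)]
print(retrieve(frames[2], frames))  # the query's own frame is the best match
```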
In this paper, an algorithm is presented to extract valid depth data and correct the values of flying pixels using depth information and a confidence image. An adaptive segmentation of the measured depth image is performed based on kernel density estimation and one-pass connected component labeling. Then, a modified structure tensor is used to detect the invalid pixels and the flying pixels contained in the depth image. Finally, these pixels are corrected with bi-cubic interpolation or selectively removed by a voting operation. In addition, erroneous pixels are excluded using an augmented confidence measure. Experimental results have demonstrated the effectiveness of our algorithm.
Due to the noticeable expansion of document recognition applications, there is high demand for recognition on mobile devices. A mobile camera, unlike a scanner, cannot always ensure the absence of various image distortions; therefore, the task of improving recognition precision is relevant. The advantage of mobile devices over scanners is the ability to use video stream input, which provides multiple images of the recognized document. Despite this, not enough attention is currently paid to combining recognition results obtained from different frames when using video stream input. In this paper, we propose a weighted method for combining text string recognition results, together with weighting criteria, and provide experimental data verifying their validity and effectiveness. Based on the obtained results, we conclude that such a weighted combination is appropriate for improving the quality of the video stream recognition result.
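The idea of a weighted combination of per-frame string results can be sketched as weighted per-character voting; the equal-length alignment below is a simplification (the paper's method and weighting criteria are more general):

```python
from collections import defaultdict

def combine_strings(results):
    """Combine per-frame string recognition results by weighted per-character
    voting. `results` is a list of (string, weight) pairs; all strings are
    assumed to be pre-aligned and of equal length."""
    length = len(results[0][0])
    combined = []
    for i in range(length):
        votes = defaultdict(float)
        for text, weight in results:
            votes[text[i]] += weight   # each frame votes with its weight
        combined.append(max(votes, key=votes.get))
    return "".join(combined)

frames = [("HELLO", 0.9), ("HELL0", 0.4), ("HELLO", 0.7)]
print(combine_strings(frames))  # HELLO
```

A single misrecognized frame ("HELL0") is outvoted because its weight is lower than the combined weight of the agreeing frames.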
In the field of document analysis and recognition using mobile devices for capture, and in the field of object recognition in a video stream, an important problem is determining when the capturing process should be stopped. Efficient stopping influences not only the total time spent on recognition and data entry, but also the expected accuracy of the result. This paper extends the stopping method based on modelling the next integrated recognition result, so that it can be used within a string recognition result model with per-character alternatives. The stopping method and notes on its extension are described, and an experimental evaluation is performed on the open dataset MIDV-500. The method was compared with previously published methods based on clustering of input observations. The obtained results indicate that the stopping method based on modelling the next integrated result achieves higher accuracy, even when compared with the best achievable configuration of the competing methods.
Recognition of identity documents using mobile devices has become the topic of a wide range of computer vision research. The portfolio of methods and algorithms for solving such tasks as face detection, document detection and rectification, text field recognition, and others, is growing, and the scarcity of datasets has become an important issue. One of the openly accessible datasets for evaluating such methods is MIDV-500, containing video clips of 50 identity document types in various conditions. However, the variability of capturing conditions in MIDV-500 did not address some key issues, mainly significant projective distortions and different lighting conditions. In this paper, we present the MIDV-2019 dataset, containing video clips shot with modern high-resolution mobile cameras, with strong projective distortions and in low lighting conditions. A description of the added data is presented, along with experimental baselines for text field recognition in different conditions.
Fake digital information is distributed heavily nowadays through social networks, news media, and other information sources. The use of digital forgeries may have unexpected consequences, and it is quite difficult to detect tampering by expert inspection alone. Many algorithms for digital image forgery detection exist, but video forgery detection is still at an early stage of development. We propose a new approach to digital video forgery detection based on statistical features calculated on difference shift frames. We selected three types of features for the study: CC-PEV, SPAM and MP-486. We also estimated the quality of several classification techniques for detecting altered frames: an RBF-based SVM, a linear ensemble classifier and a decision tree. The experimental results identified the best combination of features and classification algorithms for solving the video forgery detection problem.
This paper describes a method for video sequence processing that is robust to shifts of the optical source when individual frames of the video sequence are noisy. Here, noisiness means blurring of individual frames due to sharp shifts of the optical source, data transfer artifacts, or zoom operations during autofocus, as a result of which a series of defocused images can be obtained. The novelty of the method lies in combining the analysis of local feature descriptors of the image with graph algorithms, and in extrapolating the camera offset values by the least squares method at the moments when individual frames of the video sequence are noisy. We present several experimental results for the proposed method and a numerical comparison of its qualitative characteristics with existing ones.
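The least-squares extrapolation step can be sketched as follows; the linear motion model and the function name are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def extrapolate_offset(times, offsets, t_next, degree=1):
    """Least-squares fit of the camera offset as a function of time,
    used to predict the offset at t_next when the corresponding frame
    is too blurred or noisy for direct measurement."""
    coeffs = np.polyfit(times, offsets, degree)   # least-squares polynomial fit
    return float(np.polyval(coeffs, t_next))      # evaluate at the noisy frame
```

A higher `degree` would model accelerating camera motion at the cost of more sensitivity to measurement noise.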
The fast progress of mobile shooting technologies stimulates the development of methods for reliable assessment of image quality. In each case, successful comparison or evaluation requires a proper choice of method. This paper provides a brief tentative reference source for such investigations. We consider the most commonly used subjective methods for assessing and comparing static and video images: ACR – absolute category rating, ACR-HR – absolute category rating with hidden reference, SSCQE – single stimulus continuous quality estimation, DCR – degradation category rating, DSCQR – double stimulus continuous quality rating, PC – pair comparison, PSJ – pairwise similarity judgment, and SDSCE – simultaneous double stimulus for continuous evaluation.
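For instance, ACR reduces to averaging per-subject ratings on a 5-point scale into a Mean Opinion Score (MOS); a minimal sketch:

```python
def mean_opinion_score(ratings):
    """ACR: each subject rates the stimulus on a 5-point scale
    (1 = bad ... 5 = excellent); the MOS is the average rating."""
    return sum(ratings) / len(ratings)
```

The other methods differ mainly in presentation protocol (single vs. double stimulus, continuous vs. categorical scales), but most likewise aggregate subject votes into a mean score.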
Automatic image colorization has increasingly become a heavily researched topic over the last decade. It is of interest for many application areas, including colorization of black-and-white movies, historical images, surveillance feeds, and old image restoration in general. Recent state-of-the-art methods utilize Deep Convolutional Generative Adversarial Networks (DCGANs) for the colorization process. However, with the introduction of capsule networks, many flaws of convolutional neural networks began to surface. In this paper, the convolutional layers inside the discriminator of a DCGAN are replaced with capsule network layers, and further experiments show how capsule networks perform as a DCGAN discriminator in the image colorization task.
In a robotised warehouse, as in any place where robots move autonomously, a major issue is the localization or detection of human operators during their intervention in the work area of the robots. This paper introduces a wearable human localization system for large warehouses, which utilizes the preinstalled infrastructure used for localization of automated guided vehicles (AGVs). A monocular down-looking camera detects ground nodes, identifies them and computes the absolute position of the human, allowing safe cooperation and coexistence of humans and AGVs in the same workspace. A virtual safety area is set up around the human operator, and any AGV entering this area is immediately stopped. To avoid triggering an emergency stop due to a short distance between robots and human operators, the trajectories of the robots have to be modified so that they do not interfere with the human. The purpose of this paper is to demonstrate an absolute visual localization method working in the challenging environment of an automated warehouse, with low light intensity and a massively changing environment, using solely a monocular camera placed on the human body.
This paper describes a toolkit implementation for computing a disparity map and point cloud from images taken with a stereo camera. Though there are ready-made solutions for this task, the point cloud they provide has accuracy issues and other weaknesses that make such solutions problematic to use in a real-life environment for a robot navigating inside a building. The outcome of this work is a ROS package for creating a disparity map and point cloud of considerably higher quality, applicable for mapping in real-life environments.
While deep neural networks excel at a variety of visual tasks, obtaining large quantities of labeled data remains exorbitantly expensive or time-consuming, especially when it comes to pairs of photos and their artistic representations. To overcome the burden of annotation, various solutions exploiting unlabeled data have been proposed recently. In this paper, we present a novel approach to the unsupervised domain adaptation problem, allowing us to successfully generate avatars from photos. Assembling a system of several neural networks, including a Generative Adversarial Network (GAN), fully trained on unlabeled data, we researched the influence of various factors on the GAN training process and eventually built a system superior to current analogues. In contrast to existing unsupervised domain adaptation approaches, the proposed solution is highly flexible, allowing individual elements of the system to be tuned to achieve different visual results. In a user study, we evaluated the performance of the proposed method and found results close to human-level quality.
The development of autonomous driving goes hand in hand with ever new developments in image processing and machine learning methods. To fully exploit the advantages of deep learning, sufficient labeled training data must be available; such data is especially scarce for omnidirectional fisheye cameras. As a solution, we propose in this paper to use synthetic training data generated with Unity3D. A five-pass algorithm is used to create a virtual fisheye camera. The synthetic training data is evaluated on the task of free space detection for different deep learning network architectures. The results indicate that synthetic fisheye images can be used in a deep learning context.
We propose a calibration method for automotive augmented reality head-up displays (AR-HUD) using a chessboard pattern and warping maps. The HUD is modeled as a pinhole camera whose intrinsic parameters are determined by employing a stereo method. We select several viewpoints within the driver’s eye box and place a smartphone at each of them in sequence, whose position is sensed by a head tracker. By automatically shifting 2D points on the HUD virtual image to 3D chessboard corners within the view of the smartphone camera, we obtain a group of 2D–3D correspondences and then compute view-dependent extrinsic parameters. Using these parameters, we reproject the chessboard corners back to the virtual image. Comparing the results with measured virtual points, we acquire 2D distributions of biases, from which we reconstruct a series of warping maps as a tool for compensating optical distortions. For any other uninvolved viewpoint in the eye box, we obtain its corresponding extrinsic parameters and warping maps through interpolation. Our method outperforms the existing ones in terms of modeling complexity as well as experimental workload. The reprojection errors at 7.5 m distance fall within a few millimeters, which indicates a high augmentation accuracy. Besides, we calibrate the head tracker by utilizing the acquired extrinsic parameters and viewpoint tracking results.
In this study, we developed a two-stage technology for improving the sharpness of images. In the first stage, correction was performed using a linear square exponential (SE) filter with a centrally symmetric frequency response in the form of quadratic and exponential functions. This stage included setting the parameters of the SE filter and the actual processing. In the second stage, non-linear correction was carried out. The idea of the filter is to increase the impact of the central value if it lies at an edge between different intensity levels. We assumed that an increase in the absolute value of the weighted average of the differences in a point's neighbourhood could be an indicator of such edges. The central point of the reference area belongs to an edge if its value is considerably greater or lesser than a significant number of values in this area. The first experiment confirmed that the quantitative criteria of image restoration can be improved by non-linear correction. The second experiment illustrated the increase in sharpness of images obtained using a diffractive Fresnel lens. The proposed technology opens up prospects for the use of cameras based on diffractive optical elements in mobile devices.
Nowadays, microtomography experiments require a lot of time for data collection and processing. In order to observe real-time processes (e.g. fluid flow through porous media), measurements and calculations should be carried out fast enough, so an optimization task has to be solved. Two approaches were developed to solve it. The first is associated with the search for optimal experimental parameters: the number of projections and the quality of the detector. The second concerns the determination of a representative elementary volume; this determination technique is described in general terms and can be applied beyond porous media studies. Both algorithms are based on methods for comparing pore-size distribution histograms. For this purpose, apart from the common Earth Mover's Distance (Wasserstein distance) metric, a new Mean Vector Distance (MVD) metric was designed and is described in this paper.
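For two pore-size histograms over the same bins, the 1D Earth Mover's Distance reduces to the L1 distance between their normalized cumulative distributions; a minimal sketch (the MVD metric itself is defined in the paper and not reproduced here):

```python
import numpy as np

def emd_1d(hist_a, hist_b, bin_width=1.0):
    """1D Earth Mover's (Wasserstein) Distance between two histograms
    over identical bins: the area between their normalized CDFs."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).sum() * bin_width)
```

Identical histograms give distance 0; moving all mass one bin over gives exactly one bin width of "work".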
The GIS industry relies heavily on manual effort to build and maintain digital maps. This approach is time-consuming and requires a sizable workforce, not only for map-making but also for the quality checks required to resolve potential errors resulting from manual digitization. With recent advancements in computer vision, several organizations are using machine-learning algorithms to generate map data from images. Three limitations prevail in the current machine-learning-based geometry creation process. Firstly, the output of the algorithms is never served on demand to a map editing tool. Secondly, after being further fine-tuned manually by annotators/validators, the results are never fed back to the algorithms to identify the errors incurred and improve accuracy. Finally, a lot of manual effort is required to create training data for new terrains and regions. We propose an end-to-end machine learning system integrated with current map-making tools to address these limitations and reduce the manual effort in creating and updating geometry.
The paper presents a novel method for suppressing the orthotropic stripe artifacts typical of sensitive optical detector arrays. The algorithm is based on the guided filtering technique, where the guidance image is constructed from the input frame in a way that removes artifacts from local contrast structures while disregarding low-frequency distortions. The artifact suppression procedure was applied to images of human faces taken with an IR–THz camera for the diagnosis of psycho-emotional states, where the presence of orthotropic artifacts prevents digital image stabilization. We also demonstrated that adaptation of the algorithm…
Applying common reconstruction algorithms such as Filtered Back Projection and the Algebraic Reconstruction Technique to projection data acquired with poly-chromatic probing radiation leads to a cup-like distortion of the value profile in the reconstructed images. While many methods for suppressing poly-chromatic probing artifacts have been suggested, numerical estimation of this "cupping effect" is typically not considered important. Existing methods rely on manual selection of regions whose intensities are compared, or simply on expert judgment of the effect's presence. In this paper, we propose an automatic cupping-effect estimation method based on a distance transform built from the object mask. As a result, we obtain a numeric estimate of the intensity change from the border to the center of the object. As the final image index, a weighted sum of the ratings of all objects is used. A positive value shows the magnitude of the cupping effect, while a negative value, on the contrary, shows the magnitude of the reverse cupping effect. In the paper, we demonstrate the method on simulated data and compare it with several other techniques for evaluating distortion due to poly-chromatic probing. Finally, we show the method's effectiveness on real data acquired with a laboratory tomograph.
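The core of such an estimate can be sketched as follows: compute each in-object pixel's distance to the border from the mask, then take the (negated) slope of intensity versus depth. The brute-force distance transform and the single-object slope fit are simplifications for this sketch; the paper's weighted multi-object index is not reproduced:

```python
import numpy as np

def border_distance(mask):
    """Distance from each in-mask pixel to the nearest background pixel
    (brute force; fine for small demo images)."""
    bg = np.argwhere(~mask)
    dist = np.zeros(mask.shape)
    for y, x in np.argwhere(mask):
        dist[y, x] = np.sqrt(((bg - (y, x)) ** 2).sum(axis=1)).min()
    return dist

def cupping_index(image, mask):
    """Slope of intensity vs. depth inside the object, negated so that a
    positive value means intensity drops towards the centre (cupping)
    and a negative value indicates the reverse effect."""
    depth = border_distance(mask)[mask]
    slope = np.polyfit(depth, image[mask], 1)[0]
    return float(-slope)
```

A reconstruction whose centre is artificially darker than its rim yields a positive index; inverting the image flips the sign.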
This paper proposes a RANSAC-based algorithm for determining the axial rotation angle of an object from a pair of its tomographic projections. An equation is derived for calculating the rotation angle from a single correct keypoint correspondence between the two projections. The proposed algorithm consists of the following steps: keypoint detection and matching, rotation angle estimation for each point correspondence, outlier filtering with the RANSAC algorithm, and finally calculation of the desired angle by minimizing the re-projection error over the remaining correspondences. To validate the proposed method, an experimental comparison is conducted against methods based on analysing the distribution of angles computed from all correspondences.
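The outlier-filtering step can be illustrated on the per-correspondence angle estimates alone; the fixed tolerance and the final inlier-mean refinement below are simplifications standing in for the paper's re-projection-error minimization:

```python
import random

def ransac_angle(angle_estimates, tol=1.0, iters=100, seed=0):
    """Hypothesize a sampled angle, count estimates within tol of it as
    inliers, keep the largest consensus set, and refine by its mean."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        hypothesis = rng.choice(angle_estimates)
        inliers = [a for a in angle_estimates if abs(a - hypothesis) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return sum(best_inliers) / len(best_inliers)
```

Gross outliers from wrong keypoint matches form singleton consensus sets and are discarded, so the estimate is driven by the coherent cluster of correct correspondences.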
Scanning electron microscopes (SEM) are widely used to analyse the morphology of all kinds of specimens, providing high-resolution image data. To overcome the two-dimensional limitation, much effort has been put into recovering the hidden third dimension from the acquired SEM images throughout the last decades. Methods based on photogrammetry in particular were identified as yielding qualitatively good reconstruction results. Nevertheless, precise quantitative 3D measurements still remain a challenge. One of the key problems is the robust estimation of the motion in the acquired image sequences. A possible solution is given by the factorization method for orthographic image streams. To evaluate the applicability of this algorithm to SEM image sequences, the motion for several sequences is estimated and compared to the stage settings. Furthermore, the method is extended to obtain a dense reconstruction from a stereo pair based on the estimated rotation between the two views. The final reconstruction results are compared to reference measurements from a confocal laser scanning microscope for a quantitative evaluation.
Fast mesh compression is becoming a requisite in several applications such as medical imaging and video games. Graphics Processing Units (GPUs) have recently become massively parallel devices for Single Instruction, Multiple Data (SIMD) computing, which, however, poses greater implementation challenges. Transformation and Quantization (TQ) is the second-highest workload in wavelet-based mesh coding, so its acceleration further improves the overall processing speed of the coder. In this paper, an OpenCL (Open Computing Language) acceleration of TQ is proposed. The Butterfly Wavelet Transform (BWT) based on the unlifted scheme is adopted as the transformation method, while embedded deadzone quantization is employed for wavelet quantization. A chunk rearrangement process is applied to compute the neighborhood information needed for the Butterfly subdivision stencils. Accordingly, each chunk independently performs the prediction of the wavelet coefficients and their quantization. The key insights behind the proposed TQ method on GPU are smart memory management and efficient memory data mapping. Extensive experimental assessments demonstrate the effectiveness of our GPU implementation in terms of memory and runtime costs while preserving the rate-distortion performance of the state-of-the-art bitplane coder.
We examined the use of modern Generative Adversarial Networks to generate novel images of oil paintings using the Painter By Numbers dataset. We implemented Spectral Normalization GAN (SN-GAN), and compared its outputs to a Deep Convolutional GAN. Visually, and quantitatively according to the Sliced Wasserstein Distance metric, we determined that the SN-GAN produced paintings that were most comparable to our training dataset. We then performed a series of experiments to add supervised conditioning to SN-GAN, the culmination of which is what we believe to be a novel architecture that can generate face paintings with user-specified characteristics.
Large-scale stencil images used for surface mount technology (SMT) typically contain more than ten thousand closed shapes (stencil holes). It is difficult to find corresponding information among those shapes during stencil image registration. Here, we propose a novel method based on a two-node tree, which differs from traditional approaches. The two-node tree is special in that each layer has only two nodes; it serves to select feature points. From the selected feature points, which may contain erroneous matches, the most reasonable projective transformation model is found by a simplified RANSAC algorithm. We used different types of defective stencil images to verify the proposed method. Experimental results fully demonstrate its robustness and high error tolerance.
In this paper, we present an approach for the deformable registration of 3D data from an RGB-D camera to reduce depth distortions in featureless regions. We employ the established PWC-Net-based optical flow algorithm to identify pixel correspondences between nearby frames and then densely and uniformly select transformation nodes. Color correspondence of the transformation nodes is used in both global and local deformations. Several experimental results show that the proposed method achieves low distortion during the non-rigid registration of multiple RGB-D images.
This paper proposes a patch-based inpainting algorithm for depth map reconstruction using a stereo image pair. The proposed approach is based on a geometric model for patch synthesis. Lost pixels are recovered by copying pixel values from the source based on a similarity criterion, and a trained neural network is used to choose the most similar patch. Experimental results show that the proposed method outperforms state-of-the-art methods for depth map reconstruction in both subjective and objective measurements.
According to the principle of scale transformation, expansion of a signal in the time domain corresponds to compression in the frequency domain, so that the signal's energy is concentrated in the low-frequency part. In this paper, an image enlargement algorithm based on the Discrete Cosine Transform (DCT) is proposed, which preserves the low frequencies of the image and applies the corresponding enhancement coefficients to realize the resizing operation in the DCT domain. The paper also provides a proof for the determination of the enhancement value. Experiments comparing the scaled images with those of other interpolation algorithms show that our algorithm performs better. The method can also be carried out during the DCT transformation itself and is easier to implement than other methods.
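A minimal 1D sketch of the principle: take the DCT, zero-pad the spectrum to the target length, and invert. The explicit orthonormal DCT-II matrix and the simple amplitude-compensating gain below are stand-ins for the paper's enhancement coefficients:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    M[0] /= np.sqrt(2.0)
    return M

def enlarge_dct(signal, new_len):
    """Enlarge a 1D signal by zero-padding its DCT spectrum: low
    frequencies are preserved, the gain keeps the amplitude."""
    n = len(signal)
    coeffs = dct_matrix(n) @ np.asarray(signal, dtype=float)
    padded = np.zeros(new_len)
    padded[:n] = coeffs * np.sqrt(new_len / n)  # amplitude compensation
    return dct_matrix(new_len).T @ padded       # inverse of an orthonormal DCT
```

Because the transform is orthonormal, enlarging to the original length is the identity, and a constant signal stays constant at any target length.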
This work considers methods for comparing digitized copies of administrative documents. The problem arises, for example, when comparing two copies of a document signed by two parties in order to find possible modifications made by one party, as in the banking sector when contracts are concluded in paper form. The proposed method of document image comparison is based on a combination of several ways of comparing word images using descriptors of text feature points. Testing was conducted on the public Payslip Dataset (French). The results showed high quality and reliability in finding differences between two images that are versions of the same document.
This paper introduces a specific type of aerospace image interpretation (AII), called multifractal interpretation (MI), which provides identification and description of natural objects in aerospace images (AIs) through their multifractal analysis (MA). The paper also presents a generalization of the standard (moment-based) multifractal formalism (SMF), which can be considered a theoretical basis of MI. This generalized multifractal formalism (GMF) is based on kernels constructed from discrete orthonormal polynomials (OPs). It is shown that the proposed GMF, in contrast to the SMF, can be used to obtain one-dimensional (1D) spectra of global scaling exponents, spectra of local scaling exponents, and newly introduced two-dimensional (2D) spectra of global scaling exponents. The last part of the paper is devoted to the proposed MI methodology, which includes MA based on GMF as its main block.
Given the urgent priority of protecting forests and limiting the impacts of climate change, constant monitoring of forests for the accurate and timely detection of infestations and of the catastrophic action of invasive insects, pests and fungi is an important and challenging task. More precisely, newly introduced insect species, or existing species whose populations multiply uncontrollably within a forest area, affect tree growth and survival as well as the quality of forest biomass, and constitute a serious threat to the mechanisms of forest ecosystems. Thus, new concepts are needed that overcome the difficulties faced by existing remote sensing techniques and allow the timely and accurate determination of the health of forest regions, assisting scientists and authorities in taking action to protect the forests. In this paper, we propose a monitoring approach that uses high-resolution RGB aerial images and combines different Region-based Convolutional Neural Network (R-CNN) architectures, namely Faster R-CNN and Mask R-CNN, fusing their bounding-box outcomes in order to localize candidate infected-tree regions more accurately while increasing the number of candidate trees detected as infected. Subsequently, the candidate detected trees are modelled through higher-order linear dynamical systems (h-LDS) and descriptors are extracted for each candidate region. Finally, the h-LDS descriptors are classified with an SVM classifier to estimate the infected trees. The study area includes parts of the suburban pine forest of Thessaloniki, Greece, named Seich Sou, which has suffered in recent months an infestation of high significance and intensity by a bark- and wood-destroying insect (Tomicus piniperda).
Although this insect was recorded in this ecosystem many years ago, its population increased uncontrollably after the degradation of the ecosystem due to human intervention and the lack of a protection and management strategy. Experimental results, which outperform existing state-of-the-art algorithms, demonstrate the high potential of the proposed low-cost and time-efficient methodology to contribute to the sustainable management, protection and recovery of a forest ecosystem.
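The abstract does not detail how the two detectors' bounding boxes are fused; a minimal sketch of one common strategy, assuming an IoU-based merge that averages matched boxes and keeps unmatched detections from either network (all names and the threshold are illustrative, not from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse_boxes(faster_boxes, mask_boxes, thr=0.5):
    """Average overlapping detections from the two detectors; keep unmatched
    boxes from both, which increases the number of candidate infected trees."""
    fused, matched = [], set()
    for a in faster_boxes:
        best, best_iou = None, thr
        for j, b in enumerate(mask_boxes):
            if j not in matched and iou(a, b) >= best_iou:
                best, best_iou = j, iou(a, b)
        if best is not None:
            matched.add(best)
            b = mask_boxes[best]
            fused.append(tuple((x + y) / 2 for x, y in zip(a, b)))
        else:
            fused.append(a)
    fused += [b for j, b in enumerate(mask_boxes) if j not in matched]
    return fused
```

Keeping unmatched boxes from both detectors is what raises recall at the cost of extra candidates, which the subsequent h-LDS + SVM stage then filters.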
Sign Language Recognition (SLR) has become an appealing topic in modern societies because such technology can ideally be used to bridge the gap between deaf and hearing people. Although important steps have been made towards the development of real-world SLR systems, signer-independent SLR is still one of the bottleneck problems of this research field. In this regard, we propose a deep neural network with an adversarial training objective, specifically designed to address the signer-independent problem. Concretely, the proposed model consists of an encoder, mapping from input images to latent representations, and two classifiers operating on these representations: (i) the sign-classifier, predicting the class/sign labels, and (ii) the signer-classifier, predicting the signer identities. During the learning stage, the encoder is trained to help the sign-classifier as much as possible while trying to fool the signer-classifier. This adversarial training procedure allows learning signer-invariant latent representations that are nevertheless highly discriminative for sign recognition. Experimental results demonstrate the effectiveness of the proposed model and its capability of dealing with large inter-signer variations.
In this paper, an unsupervised registration approach based on possibility theory, called "Unsupervised Possibilistic Registration", is proposed to address the image registration problem. It consists of adding an unsupervised projection step that matches possibility maps obtained from the two images instead of the grey-level images themselves (the thematic classes and their number have no effect on the registration). Experiments and a comparative study using MRI images have shown promising results. It is shown that the proposed unsupervised registration approach overcomes major problems of existing methods and allows optimization of the temporal complexity.
Character segmentation is one of the crucial problems of modern text line recognition methods. In this paper, we propose a per-character segmentation method based on a lightweight convolutional neural network (CNN), suitable for on-premise applications on various mobile devices. The distinctive feature of our method is that it provides the coordinates of the start and end points of each character, rather than the coordinates of the "cut" between two characters. This allows us to efficiently utilize known geometrical properties of glyphs. Consequently, the target character images are not flawed by character intersections or wide spaces. We present results measured for text lines with various letter spacings, which illustrate that the proposed method decreases the segmentation error rate for the majority of test datasets.
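One consequence of predicting start/end points rather than cuts is that the spans of adjacent characters may overlap or leave gaps. A simplified greedy sketch of turning predicted start and end x-coordinates into per-character spans (the pairing rule is a hypothetical illustration, not the paper's method):

```python
def pair_characters(starts, ends):
    """Greedily pair each predicted start x-coordinate with the nearest
    predicted end to its right, yielding (start, end) character spans.
    Because each character gets its own span, overlapping glyphs or wide
    spaces do not corrupt the cropped character image."""
    spans, remaining = [], sorted(ends)
    for s in sorted(starts):
        for e in remaining:
            if e > s:
                spans.append((s, e))
                remaining.remove(e)
                break
    return spans
```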
In this paper, a method for QR Code localization in images obtained under uncontrolled conditions is presented. The proposed method is a modified Viola-Jones object detection method in which features are calculated over the directional edge image, and a tree classifier is used instead of a cascade classifier. The experiments show that the described QR Code localization method can significantly improve the quality of existing decoding algorithms. The high performance of the developed method makes it possible to use it in various real-time recognition systems.
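Viola-Jones-style features are fast because rectangular sums are computed from an integral image in four lookups; the same machinery applies whether the underlying image is grey-level or, as here, a directional edge image. A minimal sketch (the edge image itself is assumed precomputed):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x1, y1, x2, y2):
    """Sum over the rectangle [x1, x2) x [y1, y2) via four lookups."""
    return ii[y2][x2] - ii[y1][x2] - ii[y2][x1] + ii[y1][x1]
```

Haar-like features are then differences of such rectangle sums, so each feature costs a constant number of lookups regardless of window size.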
Image registration requires a step of detection and matching of primitives, and this phase is important for reliable registration. In this paper, we focus on geometric registration methods based on the extraction and matching of distinctive feature points in images. Several such methods, including SIFT, SURF, BRIEF, BRISK, ORB, FREAK and FRIF, have already been proposed. We present a comparative study of feature detector and descriptor methods for registration, which can be classified, according to the type of descriptor, as either classical local or binary. Through this study, we highlight the differences between the methods at the descriptor level as well as in the interest-point detectors used, both of which influence the registration result. Each method has weak points as well as strong points; the major differences are the level of invariance to each type of transformation and the temporal complexity.
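The classical/binary split mainly changes the distance used when matching descriptors: floating-point descriptors (SIFT, SURF) are compared with Euclidean distance, while binary descriptors (BRIEF, BRISK, ORB, FREAK) use the much cheaper Hamming distance. A sketch of both, which is where much of the temporal-complexity difference comes from:

```python
import math

def euclidean(d1, d2):
    """Distance for floating-point descriptors such as SIFT/SURF."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

def hamming(d1, d2):
    """Distance for binary descriptors such as ORB/BRIEF: a popcount of
    the XOR of the packed bit words; no floating-point work at all."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))
```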
Most modern convolutional neural networks (CNNs) are compute-intensive, making them infeasible to use on mobile or embedded devices. One approach to this problem is to modify a usual deep CNN with shallow early-exit branches appended to some convolutional layers [1]. This modification, named BranchyNet, allows processing simple input samples without performing the full volume of calculations, providing a speed-up on average. In this work, we consider the problem of training a BranchyNet. We exploit a cascade loss function [2], which explicitly regularizes the CNN's average computation time, and modify it to use the entropy of the branches' predictions as the confidence measure. We show that on the CIFAR10 dataset the proposed loss function increases the actual speed-up from 43% to 47% without quality degradation, compared with the original loss function.
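The entropy confidence measure at inference time can be sketched as follows: a sample exits at the first branch whose softmax entropy falls below a threshold, and otherwise falls through to the final classifier (the threshold value and names are illustrative, not the paper's):

```python
import math

def entropy(probs):
    """Shannon entropy of a branch's softmax output; low entropy = confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit(branch_outputs, threshold=0.5):
    """Return (branch index, predicted class) of the first confident branch;
    fall back to the final branch when none is confident."""
    for i, probs in enumerate(branch_outputs):
        if entropy(probs) < threshold:
            return i, max(range(len(probs)), key=probs.__getitem__)
    probs = branch_outputs[-1]
    return len(branch_outputs) - 1, max(range(len(probs)), key=probs.__getitem__)
```

Easy samples thus pay only for the layers up to their exit branch, which is what produces the average-case speed-up the abstract reports.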
In this paper, we introduce novel bipolar morphological neuron and bipolar morphological layer models. The models use only addition, subtraction and maximum inside the neuron, with exponent and logarithm as activation functions for the layer. Unlike previously introduced morphological neural networks, the proposed models approximate the classical computations and show better recognition results. We also propose a layer-by-layer approach to train the bipolar morphological networks, which can be further developed into an incremental approach for separate neurons to reach higher accuracy. Neither approach requires special training algorithms, and both can use a variety of gradient descent methods. To demonstrate the efficiency of the proposed model, we take classical convolutional neural networks and convert their pre-trained convolutional layers to bipolar morphological layers. Since experiments on recognition of MNIST and MRZ symbols show only a moderate decrease in accuracy after conversion and training, the bipolar neuron model can provide faster inference and be very useful in mobile and embedded systems.
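The core idea, replacing multiplication by addition in the log domain and approximating the sum by a maximum, can be sketched for strictly positive inputs and weights (a deliberate simplification: the bipolar model handles signs by splitting values into positive and negative parts, which is omitted here):

```python
import math

def classical_neuron(x, w):
    """Ordinary dot product."""
    return sum(xi * wi for xi, wi in zip(x, w))

def morphological_neuron(x, w):
    """Approximate the dot product using only max, addition, exp and log:
    each product xi*wi becomes exp(log xi + log wi), and the sum is
    approximated by its largest term. Assumes strictly positive x and w."""
    return math.exp(max(math.log(xi) + math.log(wi) for xi, wi in zip(x, w)))
```

The approximation keeps only the dominant product term, so it is exact when one term dominates and a lower bound otherwise; the attraction is that multiplications disappear from the inner loop.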
Movie recommendation systems have become ubiquitous in many sides of our lives, yet they are currently far from optimal. This paper presents a MovieLens recommendation system based on machine learning, utilizing a deep convolutional network and relying on generative modeling of mixtures of users' previous public aspects. The objective of this paper is to introduce a recommendation system that helps users select movies according to certain pre-specified measurements and data. The applied methodology pivots on implementing the system with different sentiment analysis algorithms. These algorithms provide a solution for full-stack developers through a model trained on their datasets, giving suggestions based on a user's previous activity or on other users' interests demonstrated on the website, thus helping users visualize their interests or form a better scope of visualization. The presented system has shown better results concerning accuracy and efficiency in comparison with similar works: in experiments conducted on both real and synthetic datasets, the system showed improvements of about 91.07% on the training dataset and 93.49% on the testing dataset, respectively. The system is convenient for several application fields, such as time-series network visualization, business process modeling, various data mining applications and e-commerce websites, besides most online platforms that people use, including social media.
Many applications in computer vision require calibrated cameras, but identifying camera calibration parameters is a tedious task. Common methods require custom-built calibration patterns, of which many images from different perspectives have to be taken. This research introduces a novel auto-calibration method to reduce this work to a minimum. The method utilizes a neural network framework and learns the parameters through backpropagation and gradient descent. Three views of the same arbitrarily textured flat surface are used as input. Two of the views are transformed to match the third, reference view by plane homographies. Feature maps are extracted and used to compare the views. Intrinsic and extrinsic parameters, as well as distortion parameters, can then be learned by maximizing the similarity between the transformed views and the reference view. The results show that the method is able to find the calibration parameters of artificially distorted images. Results with real camera images are comparable to those of common methods that require planar calibration patterns, which makes the proposed method a quick alternative.
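Warping a view onto the reference view by a plane homography means pushing every pixel through a projective 3x3 mapping; the pixel-level operation can be sketched as:

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (row-major nested
    lists) by multiplying in homogeneous coordinates and dividing by the
    projective coordinate w."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w
```

Because this mapping is differentiable in the entries of H, the calibration parameters that determine H can be updated by backpropagation, as the method requires.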
Steganography is a collection of methods to hide secret information (the "payload") within non-secret information (the "container"). Its counterpart, steganalysis, is the practice of determining whether a message contains a hidden payload, and recovering it if possible. The presence of hidden payloads is typically detected by a binary classifier. In the present study, we propose a new model for generating image-like containers based on Deep Convolutional Generative Adversarial Networks (DCGAN). This approach makes it possible to generate more steganalysis-secure message embeddings using standard steganography algorithms. Experimental results demonstrate that the new model successfully deceives the steganalyzer and can therefore be used in steganographic applications.
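The generated containers are then fed to standard embedding algorithms; a minimal sketch of the simplest such algorithm, least-significant-bit (LSB) replacement, over a flat list of 8-bit pixel values (the DCGAN generator itself is omitted):

```python
def embed_lsb(pixels, bits):
    """Write payload bits into the least significant bit of each pixel;
    each pixel value changes by at most 1."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def extract_lsb(pixels, n):
    """Recover the first n payload bits."""
    return [p & 1 for p in pixels[:n]]
```

It is exactly these tiny LSB perturbations that steganalysis classifiers learn to detect, which is why containers whose statistics mask them are valuable.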
Data science has become creative with generative adversarial networks (GANs), which have had great success since they were introduced in 2014 by Ian J. Goodfellow and co-authors. In technical terms, GANs are based on the unsupervised learning of two artificial neural networks, called the Generator and the Discriminator, both trained under the adversarial learning idea. The major goal of a GAN is to generate new samples that follow the underlying distribution of the real data. Due to this huge success, many modified versions have been proposed in the last two years. In this review paper, we summarize the GAN's background, architecture and application fields. Then, we discuss the different extensions of the original GAN model and provide a comparative analysis of these techniques.
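The adversarial game is over the value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], which the Discriminator maximises and the Generator minimises; evaluating it over finite batches of discriminator outputs can be sketched as:

```python
import math

def gan_value(d_real, d_fake):
    """Empirical GAN value function: d_real holds discriminator outputs on
    real samples, d_fake on generated samples (all values in (0, 1))."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term
```

A discriminator that confidently separates real from fake pushes the value toward 0; at the game's equilibrium, where D outputs 0.5 everywhere, the value is 2·log(0.5).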
Nowadays, various fonts are applied in many fields, and the generation of multiple fonts by computer plays an important role in the inheritance, development and innovation of Chinese culture. To address problems of existing font generation methods, such as stroke deletion, artifacts and blur, this paper proposes Chinese font translation with an improved Wasserstein generative adversarial network. The Wasserstein distance is used to measure the difference between the two distributions, and a gradient penalty mechanism is used instead of weight clipping. Residual dense blocks, chosen for their flexibility, serve as the core component to fully extract features and enhance information transmission between network layers. The method realizes style migration between different Chinese fonts. Experiments show that the proposed method performs better on font generation details, simplifies the font translation process, and improves the fidelity of the generated fonts.
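The gradient penalty replaces weight clipping by pushing the critic's gradient norm toward 1 at sampled points, adding a term λ·(‖∇f‖ − 1)² to the critic loss. For a linear critic f(x) = w·x the gradient with respect to x is simply w, which lets the penalty be sketched without an autograd framework (a deliberate simplification of the full WGAN-GP scheme, which evaluates the gradient at interpolations between real and generated samples):

```python
import math

def gradient_penalty_linear(w, lam=10.0):
    """Penalty lam * (||grad f|| - 1)^2 for a linear critic f(x) = w . x,
    whose gradient w.r.t. its input is the constant vector w."""
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return lam * (grad_norm - 1.0) ** 2
```

Unlike clipping, which constrains each weight independently, the penalty constrains the norm of the whole gradient, so the critic keeps its full capacity while staying approximately 1-Lipschitz.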