Anomaly detection aims to find patterns that differ from those seen previously. It is usually regarded as a one-class classification problem in which abnormal classes are scarce or poorly defined, while samples of the target class (the training objects) are plentiful. Recently, several methods have achieved excellent performance through an auxiliary multi-class task (such as rotation prediction) used in self-supervised learning. However, these classification-based approaches, which adopt the cross-entropy loss, have an inherent defect for anomaly detection: cross-entropy is a relative measure, so a normal sample that happens to receive a low score may be misclassified as an abnormality. To solve this problem, we propose Absolute Measurement Anomaly Detection (AMAD), which constrains the distribution of activations for each input in the classification network. Specifically, this technique encourages the output of the ground-truth class to be high and the outputs of unrelated classes to be low. Furthermore, unlike previous evaluation methods that use the log-softmax activation of the model as the normality score, we discard the log-softmax, since that score is heavily affected when more misclassifications occur. We present experiments on both image datasets (CIFAR-10, Fashion-MNIST) and tabular datasets (KDDCUP, among others), which show that our technique achieves better performance in terms of AUROC and F1 score than previous similar methods.
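The defect of a relative score can be made concrete with a toy sketch (a minimal illustration, not the paper's implementation; the function names and the rotation-prediction setting are assumptions): log-softmax normalizes over all classes, so a sample whose ground-truth logit is high but whose other logits are also high gets a low relative score, while an absolute measure reads only the ground-truth activation.

```python
import numpy as np

def log_softmax_score(logits, true_idx):
    """Relative normality score: the log-softmax of the ground-truth class.
    It depends on ALL class activations, not just the ground-truth one."""
    z = logits - logits.max()                      # numerically stable shift
    return float((z - np.log(np.exp(z).sum()))[true_idx])

def absolute_score(logits, true_idx):
    """Absolute normality score: only the ground-truth activation matters."""
    return float(logits[true_idx])
```

With logits `[5.0, 1.0, 1.0]` and `[5.0, 4.9, 4.9]`, the absolute score is identical (5.0), but the relative score of the second sample collapses because the competing logits are high, which is the failure mode the abstract describes.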
In this paper, a hybrid network based on a convolutional neural network (CNN) and a long short-term memory network (LSTM) is proposed to improve hand gesture recognition accuracy. Unlike the large number of traditional surface electromyography (sEMG) features proposed by previous researchers, this hybrid network automatically extracts both spatial and temporal features from the input sEMG signals, without extensive manual design or professional domain knowledge. The hybrid CNN-LSTM network has two parallel feature extraction stages: spatial feature extraction using the CNN and temporal feature extraction using the LSTM. It combines the spatial and temporal features into hybrid features (HybridFeat) and feeds HybridFeat into traditional classifiers, including linear discriminant analysis (LDA), support vector machine (SVM) and K-nearest neighbor (KNN). The experiments showed that in both inter-session and inter-subject scenarios, HybridFeat outperforms all the tested traditional features and CNNFeat. Moreover, combining HybridFeat with traditional features can further improve the accuracy.
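The pipeline's final step can be sketched as follows (a minimal stand-in; the real system trains deep CNN and LSTM branches, and all names here are illustrative): the two feature vectors are concatenated and handed to a conventional classifier such as KNN.

```python
import numpy as np

def hybrid_features(cnn_feat, lstm_feat):
    """Concatenate spatial (CNN) and temporal (LSTM) feature vectors."""
    return np.concatenate([cnn_feat, lstm_feat])

def knn_predict(train_X, train_y, x, k=1):
    """Minimal k-NN classifier standing in for the LDA/SVM/KNN back-ends."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return int(values[np.argmax(counts)])

# Toy demo: two training gestures, classify a new sample by its hybrid feature.
train_X = np.stack([hybrid_features(np.array([0.0, 0.0]), np.array([0.0])),
                    hybrid_features(np.array([1.0, 1.0]), np.array([1.0]))])
train_y = np.array([0, 1])
pred = knn_predict(train_X, train_y,
                   hybrid_features(np.array([0.9, 1.1]), np.array([0.8])))
```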
Recently, channel attention mechanisms have been widely used to improve the performance of convolutional neural networks. However, most channel attention mechanisms applied to backbone convolutional neural networks in computer vision derive the channel attention weights from the globally pooled features of each block's output, ignoring both the spatial information of the original features and the potential relationship between adjacent layers. To address this insufficient use of the spatial information of the original features, and the inability to adaptively learn the potential association among all features in a block before the channel attention weights are produced, we propose a new Cross-layer Channel Attention Mechanism (CCAM), in which a matrix carrying spatial information replaces the global pooling operation; CCAM takes the input and output features of each block as its inputs and outputs the channel attention weights of the corresponding features simultaneously. Compared with other attention mechanisms, CCAM has three advantages: first, it makes full use of the spatial information of each layer of features; second, it encourages feature reuse and fusion; third, it is better at discovering the potential relationship between the features of different layers in a block. Our simulation results demonstrate that CCAM can effectively extract the attention weights of different layers and achieves better performance on CIFAR-10, CIFAR-100, ImageNet-1K, MS COCO detection, and VOC detection, with a small additional computational cost compared with the corresponding convolutional neural network.
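For reference, the global-pooling channel attention that CCAM replaces can be sketched as a squeeze-and-excitation-style block (a generic baseline, not CCAM itself; the weight shapes are illustrative). The `mean` over the spatial axes is exactly where spatial information is discarded.

```python
import numpy as np

def se_channel_attention(feat, w1, w2):
    """Global-pooling channel attention (SE-style baseline).
    feat: (C, H, W); w1, w2: fully connected weight matrices."""
    squeezed = feat.mean(axis=(1, 2))                 # spatial layout lost here
    hidden = np.maximum(0.0, w1 @ squeezed)           # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate per channel
    return feat * weights[:, None, None]

# Demo: with identity weights, every channel of an all-ones map is gated by sigmoid(1).
out = se_channel_attention(np.ones((2, 4, 4)), np.eye(2), np.eye(2))
```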
Orthogonal matching pursuit (OMP) has achieved remarkable results in sparse subspace clustering (SSC) for image clustering. However, current OMP-based methods improve clustering accuracy by adding extra operations, which increases computational complexity. In this paper, a novel SSC algorithm with one-way selective orthogonal matching pursuit (SSC-OWSOMP) is proposed to improve clustering accuracy without increasing the computational complexity of SSC-OMP-based methods. In SSC-OWSOMP, a one-way selective module is designed to avoid mutual selection among data points, which enriches the information used for clustering without adding extra operations. Experimental results demonstrate that SSC-OWSOMP improves clustering accuracy while keeping the time complexity unchanged, and that it is well suited to datasets with high sample density.
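A minimal sketch of the self-representation step underlying SSC-OMP follows (the `blocked` bookkeeping is our reading of "avoiding mutual selection", and all names are illustrative): each point is greedily expressed by other points via OMP, and indices that already selected the current point are masked out.

```python
import numpy as np

def omp_self_repr(X, i, k, blocked=()):
    """Represent column X[:, i] with at most k other columns via greedy OMP.
    `blocked` holds indices the one-way rule forbids (points that already
    selected point i), so mutual selection is avoided."""
    target = X[:, i]
    residual = target.copy()
    support, coeffs = [], np.zeros(0)
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        corr[list(support) + [i] + list(blocked)] = 0.0  # mask forbidden atoms
        support.append(int(np.argmax(corr)))
        A = X[:, support]
        coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ coeffs
    return support, coeffs

# Demo: three unit columns, where column 1 duplicates column 0.
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
support, coeffs = omp_self_repr(X, i=1, k=1)
```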
Recent deep learning approaches have shown significant improvements in the challenging task of image inpainting. However, these methods may generate blurry output and distorted textures. In this paper, we propose an efficient end-to-end two-stage network for image inpainting. In the coarse stage, we employ a residual dense block (RDB) together with short and long skip connections to fully exploit features from all convolutional layers and produce a globally rough reconstruction. In the refinement stage, we propose a local and global residual network based on a channel and spatial attention block (CSAB) that adaptively weighs both channel-wise and spatial-wise features, focusing on more meaningful information, and generates a locally fine-detailed image. Experiments on Paris StreetView and DTD textures demonstrate the effectiveness and efficiency of our method. Results show that our method outperforms the baseline techniques both quantitatively and qualitatively.
Model compression reduces the computational cost of an over-parameterized network without degrading performance, and channel pruning is among the predominant approaches to compressing deep neural networks. In this paper, we propose a novel approach called MetaAMC, which combines meta learning and AutoML for automatic channel pruning of very deep neural networks. It leverages meta learning and reinforcement learning to provide the model compression policy. Compared with the original AMC and MetaPruning as well as state-of-the-art pruning methods, MetaAMC demonstrates superior performance on MobileNet V1/V2 and ResNet-50.
Hashing is an important branch of image retrieval techniques owing to its satisfactory retrieval performance, high retrieval speed and low storage cost. Deep supervised hashing methods, which take advantage of convolutional neural networks and supervised information, have shown better performance than other kinds of hashing methods. However, previous deep hashing methods do not consider noisy data, which are common in large-scale labeled datasets and mislead the learning algorithm. In this paper, we propose a novel robust deep supervised hashing (RDSH) method, in which a robust pairwise loss and a quantization loss supervise the learning procedure. The quantization loss guides the CNN to output binary codes. The robust pairwise loss for similarity-preserving learning is designed from generalizations of the exponential and logarithmic functions. By adjusting its parameters, the robust pairwise loss can exhibit special properties, including heavy-tailedness and boundedness, which make the learning procedure robust to noisy training data. To verify the robustness of RDSH, we conduct experiments on CIFAR-10 with different noise levels, in which RDSH shows better robustness than other deep supervised hashing methods. Experiments on the standard CIFAR-10 and NUS-WIDE datasets show that RDSH outperforms the other baselines.
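The abstract does not give the exact loss, but its key property (boundedness via generalized logarithms) can be sketched with the Tsallis t-logarithm, which for t > 1 is bounded above by 1/(t - 1); the loss form, names, and parameter values below are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def t_log(x, t=1.5):
    """Generalized (Tsallis) logarithm; t -> 1 recovers np.log.
    For t > 1 it is bounded above by 1 / (t - 1), so very large inputs
    (e.g. from mislabeled pairs) contribute only a bounded penalty."""
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def robust_pairwise_loss(dist, similar, t=1.5, margin=2.0):
    """Illustrative bounded pairwise loss for similarity-preserving hashing:
    similar pairs are pulled together, dissimilar pairs pushed past a margin."""
    if similar:
        return t_log(1.0 + dist, t)
    return t_log(1.0 + max(0.0, margin - dist), t)
```

Unlike the plain logarithm, this penalty saturates: a grossly mislabeled "similar" pair at any distance can never contribute more than 1/(t - 1) = 2 to the loss, which is the mechanism behind the claimed robustness.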
Mice interaction recognition has been widely employed in animal observation experiments, since it provides useful biological signals for researchers to identify the social behaviors of animal models. Rather than relying on manual observation, automatic systems based on computer vision can detect and track mouse behaviors dynamically, and have become popular in mice interaction recognition. To enhance recognition accuracy, a novel mice interaction recognition method using machine learning is proposed in this paper. A new elliptical model is developed to fit each mouse and to extract its motion and shape features. To select the optimal features, we investigate the influence of different features on the result. Our method improves the average recognition result on the RatSI dataset.
Facial expression reenacted forgery (FERF) is a far more complicated and meticulous type of video tampering than others, such as the simple copy-paste of frames or objects. At its best, FERF can make the target actor's facial expressions follow the source actor's in real time. Existing video tampering detection methods aim at simple tampering types, such as intra-frame or inter-frame forgery, and are of little use for detecting FERF. In this paper, a novel video forgery detection method is proposed to detect FERF. Through careful analysis of the general process of FERF, some abnormal subtle changes in the facial region are exposed and used to verify the authenticity of videos. Moment features of the detail wavelet transform coefficients and optical-flow features of the videos are combined into feature vectors fed into a support vector machine (SVM) to classify original videos and forged ones. The experimental results show that the proposed method is effective at detecting FERF. We also compare our results with those of popular previous copy-paste forgery detection algorithms.
Watermarking can be used to protect the copyright of relational databases by hiding ownership information in them. Difference expansion (DE) is one of the common reversible watermarking techniques for numerical relational databases. However, most previous DE-based schemes suffer from low embedding capacity when the difference values between attributes are relatively large. In this paper, we propose a novel reversible watermarking scheme to solve this problem. In the scheme, a mapping difference expansion (MDE) method converts the differences between attributes into small mapping differences. Based on MDE, an attribute and tuple selection algorithm is designed to select suitable data for watermarking, which increases embedding capacity and reduces distortion. In addition, the majority voting technique is used to enhance the robustness of the watermark at high embedding capacity. The experimental results show that the proposed scheme provides higher embedding capacity, lower distortion and stronger robustness than other schemes.
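The classical difference expansion the scheme builds on (Tian's integer-pair form, shown here for intuition; the paper's MDE maps attribute differences to smaller values before this step) hides one bit by doubling the difference of a value pair while preserving its integer average, so the embedding is exactly reversible.

```python
def de_embed(x, y, bit):
    """Classic difference expansion: hide one bit in an integer pair."""
    l = (x + y) // 2        # integer average, preserved by the transform
    h = x - y               # difference
    h2 = 2 * h + bit        # expanded difference carries the bit in its LSB
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    """Recover the original pair and the hidden bit."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 & 1
    h = h2 >> 1             # floor shift restores the original difference
    return l + (h + 1) // 2, l - h // 2, bit
```

For example, embedding bit 1 into (10, 7) gives (12, 5), and extraction returns (10, 7) with the bit, distortion-free; the large-difference problem the abstract mentions is visible here too, since the difference doubles.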
This study introduces a novel depth estimation method that automatically generates a plausible depth map from a single image of an unstructured environment. Our goal is to infer a depth map with a more correct, rich and distinct depth order that is both quantitatively accurate and visually pleasing. Building on the pre-existing DepthTransfer algorithm, our approach transfers depth information at the level of superpixels from the most photometrically similar retrieval images within a non-parametric learning framework. We then propose to concurrently warp the corresponding superpixels at multiple scales, employing an improved SLIC technique to segment the RGB-D images from coarse to fine. Finally, a modified cross bilateral filter is used to refine the final depth field. For training and evaluation, we experiment on the popular Make3D dataset and demonstrate that our method outperforms the state of the art in both efficacy and computational efficiency. In particular, qualitative evaluation shows that our results are visually superior in realism and more immersive.
Digital watermarking has been recognized as a useful technology for copyright protection and authentication of digital information. However, few previous methods focus on the key content of the digital carrier. Protecting the key content is more targeted and applies to different kinds of digital information, including text, image and video. In this paper, we take text as the research object and propose a text zero-watermarking method that uses keyword dense intervals (KDIs) as the key content. First, we construct a zero-watermarking model by introducing the concept of the KDI and giving a method for KDI extraction. Second, we design a detection model that includes the secondary generation of the zero-watermark and a similarity computing method for the keyword distribution. Experiments show that the proposed method performs better than other available methods, especially under sentence transformation and synonym substitution attacks.
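The abstract does not spell out how a keyword dense interval is extracted; as a purely hypothetical illustration (the window size, hit threshold, and merging rule are all our assumptions, not the paper's algorithm), one can mark token spans where keyword positions cluster within a small window.

```python
def keyword_dense_intervals(tokens, keywords, win=5, min_hits=2):
    """Hypothetical KDI extraction: token spans in which at least `min_hits`
    keywords fall within a window of `win` positions, merged when they overlap."""
    hits = [i for i, tok in enumerate(tokens) if tok in keywords]
    intervals = []
    for a in range(len(hits)):
        b = a
        while b + 1 < len(hits) and hits[b + 1] - hits[a] < win:
            b += 1                                   # extend run within the window
        if b - a + 1 >= min_hits:
            start, end = hits[a], hits[b]
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals
```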
In this paper, a new robust multiple description image coding method is proposed, with modified interleaving sampling and a modified interpolation method using block compressed sensing. In the encoding process, the original image is decomposed into several sub-images by the modified interleaving sampling, and redundant bits are added to enhance reconstruction accuracy. For each sub-image, a description is obtained via block compressed sensing (BCS). In the decoding process, the signal is reconstructed from the sparse measurements using an optimization algorithm. Our analysis and simulation results show that the proposed method is a balanced multiple description coding scheme with higher reconstruction accuracy and higher coding efficiency.
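The interleaving sampling the method modifies can be sketched in its classical polyphase form (a sketch of the standard decomposition only, not the paper's modified version): each of the four sub-images takes every other pixel in each dimension, so losing one description still leaves the image covered densely.

```python
import numpy as np

def interleave_decompose(img):
    """Split an image into 4 sub-images by 2x2 interleaved (polyphase) sampling."""
    return [img[i::2, j::2] for i in range(2) for j in range(2)]

def interleave_recompose(subs, shape):
    """Inverse operation: scatter the sub-images back onto the pixel lattice."""
    out = np.empty(shape, dtype=subs[0].dtype)
    k = 0
    for i in range(2):
        for j in range(2):
            out[i::2, j::2] = subs[k]
            k += 1
    return out

# Demo: round trip on a 4x4 test image.
img = np.arange(16).reshape(4, 4)
subs = interleave_decompose(img)
```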
KEYWORDS: Digital watermarking, Distortion, Feature extraction, Image compression, Signal processing, Image quality, Discrete wavelet transforms, Digital filtering, Linear filtering, Information security
Digital watermarking is an efficient technique for copyright protection in the current digital and network era. In this paper, a novel robust watermarking scheme is proposed based on singular value decomposition (SVD), Arnold scrambling (AS), the scale-invariant feature transform (SIFT) and a majority voting mechanism (MVM). The watermark is embedded three times into each image block in a novel way to enhance the robustness of the proposed scheme, while Arnold scrambling improves its security. During extraction, SIFT feature points are used to detect and correct possible geometric attacks, and the majority voting mechanism is applied to enhance the accuracy of the extracted watermark. Our analyses and experimental results demonstrate that the proposed watermarking scheme is not only robust to a wide range of common signal processing attacks (such as noise, compression and filtering attacks), but also has favorable resistance to geometric attacks.
Since web born-digital images have low resolution and dense text atoms, text region over-merging and missed detection remain two open issues. In this paper, a novel iterative algorithm is proposed to locate and segment text regions. In each iteration, candidate text regions are generated by detecting Maximally Stable Extremal Regions (MSERs) with diminishing thresholds and categorized into groups based on a new similarity graph, and the text region groups are identified by applying several features and rules. With our proposed overlap checking method, the final well-segmented text regions are selected from these groups across all iterations. Experiments on the web born-digital image datasets used for the robust reading competitions at ICDAR 2011 and 2013 demonstrate that our scheme significantly reduces both the number of over-merged regions and the loss rate of target atoms, and that its overall performance surpasses the best methods in the two competitions in terms of recall rate and F-score, at the cost of slightly higher computational complexity.
In this paper, a novel background subtraction approach is proposed to prevent stationary foreground objects from being merged into the background during target detection and tracking. An improved background model is designed using virtual frames, with which the blur can be attenuated when an object moves again after staying still for a long time. Moreover, the proposed model is fused with eigenbackgrounds to improve environmental adaptability. Our experimental results indicate that the proposed approach enhances target detection and tracking performance in intelligent surveillance and is superior to some state-of-the-art methods according to the precision-recall measurement.
Load modeling is recognized as a difficult issue in the field of power system digital simulation. The reliability of the simulation results depends on the accuracy of the load model, which in turn affects power system planning and decision making. To increase the accuracy of the load model, the composite loads of power-consuming industries were classified by their industry attributes, and their components were analyzed in this paper. Then, a mathematical model of the load composition is established on the basis of the typical daily load profile, and an identification algorithm developed in the C language is used to identify the parameters of the composite loads, using data collected during the corresponding characteristic time periods of the typical day. Based on the model vector machine theory and the identified parameters, the parameters of the composite load model of power-consuming industries can be calculated by least-squares approximation, and a BP neural network is used to forecast the parameters of the composite loads of power-consuming industries. Finally, an example shows the validity of the proposed scheme.
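The least-squares step can be illustrated with a standard static composite-load form (the ZIP polynomial model; the paper's actual model structure is not specified in the abstract, so this is only an assumed stand-in), whose coefficients are linear and therefore recoverable by ordinary least squares.

```python
import numpy as np

def fit_zip_load(v, p):
    """Least-squares fit of the static ZIP load model
        P(V) = a*V**2 + b*V + c
    whose terms correspond to constant-impedance, constant-current
    and constant-power load components."""
    A = np.stack([v ** 2, v, np.ones_like(v)], axis=1)   # design matrix
    coeffs, *_ = np.linalg.lstsq(A, p, rcond=None)
    return coeffs

# Demo: recover known coefficients from noiseless voltage/power measurements.
v = np.array([0.9, 0.95, 1.0, 1.05, 1.1])
p = 2.0 * v ** 2 + 3.0 * v + 1.0
coeffs = fit_zip_load(v, p)
```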