Voice is the main means by which people communicate and share information with one another, and it brings great convenience to daily life. Existing speech recognition classifiers, however, suffer considerable performance degradation in the presence of environmental noise and accents. Most of these problems can be mitigated by training on large amounts of data, but collecting large, high-quality datasets in practice is time-consuming and expensive. To address this problem, this paper proposes a data augmentation method suited to expanding small-sample sets of speech images. An S-GAN is used to generate samples that follow the real data distribution, and the GAN-train and GAN-test metrics are used to evaluate the quality and diversity of the generated images. In addition, a spatial transformer network (STN) is combined with a CNN so that classification focuses on the informative parts of the data. The results show that this method significantly improves the classification accuracy of speech recognition and lays a foundation for data augmentation with small samples.
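As these metrics are usually defined, GAN-train trains a classifier on generated samples and measures its accuracy on held-out real data (reflecting both quality and diversity), while GAN-test trains a classifier on real data and measures its accuracy on generated samples (reflecting quality). The abstract does not say which classifier is used, so the minimal sketch below assumes a simple nearest-centroid classifier as a stand-in; the function names are hypothetical and the code is not the paper's implementation.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def fit_nearest_centroid(xs: Sequence[Vector], ys: Sequence[int]) -> Dict[int, Vector]:
    """Stand-in classifier (assumption): one mean vector per class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in vec) for y, vec in sums.items()}


def accuracy(centroids: Dict[int, Vector], xs: Sequence[Vector], ys: Sequence[int]) -> float:
    """Fraction of samples whose nearest centroid carries the correct label."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = min(centroids, key=lambda label: dist(x, centroids[label]))
        correct += pred == y
    return correct / len(ys)


def gan_train(gen_x, gen_y, real_test_x, real_test_y) -> float:
    """GAN-train: fit on generated samples, evaluate on real test samples."""
    return accuracy(fit_nearest_centroid(gen_x, gen_y), real_test_x, real_test_y)


def gan_test(real_train_x, real_train_y, gen_x, gen_y) -> float:
    """GAN-test: fit on real training samples, evaluate on generated samples."""
    return accuracy(fit_nearest_centroid(real_train_x, real_train_y), gen_x, gen_y)
```

For example, with real classes clustered around (0, 0) and (5, 5), generated samples that all fall near (0, 0) but are labeled 0 and 1 give a GAN-test accuracy of 0.5, exposing the poor quality of the class-1 samples.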
In this work, we propose an efficient method for accurately estimating the scene layout in both outdoor and indoor scenarios. For outdoor scenes, the horizon line in a road image is estimated, while for indoor scenes, the wall-wall, wall-ceiling, and wall-floor edges are estimated. A number of image patches are first cropped from the image and then fed into a convolutional neural network originally trained for object detection. The resulting deep features from three different layers are compared with the features of the training patches using spatial-aware hashing. The horizon line is then estimated through a voting stage in which each voter is weighted according to its importance. In particular, for the more complex labels in indoor scenes, we introduce a structural forest to further enhance the deep features before learning the hashing function. In practice, the proposed algorithm outperforms state-of-the-art methods in accuracy for outdoor scenes while achieving performance comparable to the best indoor scene layout estimators. Furthermore, the proposed method runs in real time (up to 25 fps).
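The abstract does not specify how the weighted votes are aggregated. A minimal sketch, assuming each voter (for instance, a retrieved training patch) proposes a horizon row together with an importance weight, accumulates those weights in a histogram over image rows and returns the centre of the heaviest bin. The function name `vote_horizon`, the `bin_size` parameter, and the histogram scheme itself are hypothetical illustrations, not the paper's method.

```python
from typing import Iterable, Tuple


def vote_horizon(
    votes: Iterable[Tuple[float, float]], image_height: int, bin_size: int = 4
) -> float:
    """Weighted-histogram vote for the horizon row (hypothetical scheme).

    votes: (proposed_row, weight) pairs, one per voter; the weight stands in
    for the per-voter importance mentioned in the abstract.
    Returns the centre of the highest-scoring bin, in pixels from the top.
    """
    n_bins = (image_height + bin_size - 1) // bin_size
    hist = [0.0] * n_bins
    for row, weight in votes:
        if 0 <= row < image_height:  # ignore votes that fall outside the image
            hist[int(row) // bin_size] += weight
    # Ties resolve to the topmost bin because max() keeps the first maximum.
    best = max(range(n_bins), key=hist.__getitem__)
    return best * bin_size + bin_size / 2
```

For instance, `vote_horizon([(100, 0.9), (102, 0.8), (40, 1.0), (101, 0.5)], image_height=240)` returns `102.0`: the three votes near row 100 together outweigh the single, individually heavier vote at row 40.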