Robust automatic speech recognition : a bridge to practical applications / Jinyu Li [and three others].

O'Reilly Online Learning: Academic/Public Library Edition Available online

Format:: Book
Author/Creator:: Li, Jinyu, author.
Language:: English
Subjects (All):: Automatic speech recognition.; Speech processing systems.
Physical Description:: 1 online resource (308 p.)
Edition:: 1st edition
Place of Publication:: Amsterdam, Netherlands : Academic Press, 2016.
System Details:: text file
Summary:: Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques developed over the past thirty years, with an emphasis on practical methods that have proven successful and are likely to be developed further for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will: gain a unified, deep, and systematic understanding of state-of-the-art technologies for robust speech recognition; learn the links and relationships between alternative technologies for robust speech recognition; be able to use the technology analysis and categorization detailed in the book to guide future technology development; and be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition. The book is the first to provide a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks; connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment; provides elegant, structured ways to categorize and analyze noise-robust speech recognition techniques; and is written by leading researchers who have worked actively on the subject in both industrial and academic organizations for many years.
Contents:: Front Cover; Robust Automatic Speech Recognition: A Bridge to Practical Applications; Copyright; Contents; About the Authors; List of Figures; List of Tables; Acronyms; Notations;
Chapter 1: Introduction; 1.1 Automatic Speech Recognition; 1.2 Robustness to Noisy Environments; 1.3 Existing Surveys in the Area; 1.4 Book Structure Overview; References;
Chapter 2: Fundamentals of speech recognition; 2.1 Introduction: Components of Speech Recognition; 2.2 Gaussian Mixture Models; 2.3 Hidden Markov Models and the Variants; 2.3.1 How to Parameterize an HMM; 2.3.2 Efficient Likelihood Evaluation for the HMM; 2.3.3 EM Algorithm to Learn the HMM Parameters; 2.3.4 How the HMM Represents Temporal Dynamics of Speech; 2.3.5 GMM-HMMs for Speech Modeling and Recognition; 2.3.6 Hidden Dynamic Models for Speech Modeling and Recognition; 2.4 Deep Learning and Deep Neural Networks; 2.4.1 Introduction; 2.4.2 A Brief Historical Perspective; 2.4.3 The Basics of Deep Neural Networks; 2.4.4 Alternative Deep Learning Architectures; Deep convolutional neural networks; Deep recurrent neural networks; 2.5 Summary;
Chapter 3: Background of robust speech recognition; 3.1 Standard Evaluation Databases; 3.2 Modeling Distortions of Speech in Acoustic Environments; 3.3 Impact of Acoustic Distortion on Gaussian Modeling; 3.4 Impact of Acoustic Distortion on DNN Modeling; 3.5 A General Framework for Robust Speech Recognition; 3.6 Categorizing Robust ASR Techniques: An Overview; 3.6.1 Compensation in Feature Domain vs. Model Domain; 3.6.2 Compensation Using Prior Knowledge about Acoustic Distortion; 3.6.3 Compensation with Explicit vs. Implicit Distortion Modeling; 3.6.4 Compensation with Deterministic vs. Uncertainty Processing; 3.6.5 Compensation with Disjoint vs. Joint Model Training; 3.7 Summary;
Chapter 4: Processing in the feature and model domains; 4.1 Feature-Space Approaches; 4.1.1 Noise-Resistant Features; Auditory-based features; Temporal processing; Neural network approaches; 4.1.2 Feature Moment Normalization; Cepstral mean normalization; Cepstral mean and variance normalization; Histogram equalization; 4.1.3 Feature Compensation; Spectral subtraction; Wiener filtering; Advanced front-end; 4.2 Model-Space Approaches; 4.2.1 General Model Adaptation for GMM; 4.2.2 General Model Adaptation for DNN; Low-footprint DNN adaptation; Adaptation criteria; 4.2.3 Robustness via Better Modeling; 4.3 Summary;
Chapter 5: Compensation with prior knowledge; 5.1 Learning from Stereo Data; 5.1.1 Empirical Cepstral Compensation; 5.1.2 SPLICE; 5.1.3 DNN for Noise Removal Using Stereo Data; 5.2 Learning from Multi-Environment Data; 5.2.1 Online Model Combination; Online model combination for GMM; Online model combination for DNN; 5.2.2 Non-Negative Matrix Factorization; 5.2.3 Variable-Parameter Modeling; Variable-parameter modeling for GMM; Variable-component DNN; 5.3 Summary;
Chapter 6: Explicit distortion modeling; 6.1 Parallel Model Combination; 6.2 Vector Taylor Series; 6.2.1 VTS Model Adaptation; 6.2.2 Distortion Estimation in VTS; 6.2.3 VTS Feature Enhancement; 6.2.4 Improvements over VTS; 6.2.5 VTS for the DNN-Based Acoustic Model; 6.3 Sampling-Based Methods; 6.3.1 Data-Driven PMC; 6.3.2 Unscented Transform; 6.3.3 Methods Beyond the Gaussian Assumption; 6.4 Acoustic Factorization; 6.4.1 Acoustic Factorization Framework; 6.4.2 Acoustic Factorization for GMM; 6.4.3 Acoustic Factorization for DNN; 6.5 Summary; References;
Chapter 7: Uncertainty processing; 7.1 Model-Domain Uncertainty; 7.2 Feature-Domain Uncertainty; 7.2.1 Observation Uncertainty; Uncertainty propagation through multilayer perceptrons; 7.3 Joint Uncertainty Decoding; 7.3.1 Front-End JUD; 7.3.2 Model JUD; 7.4 Missing-Feature Approaches; 7.5 Summary;
Chapter 8: Joint model training; 8.1 Speaker Adaptive and Source Normalization Training; 8.2 Model Space Noise Adaptive Training; 8.3 Joint Training for DNN; 8.3.1 Joint Front-End and DNN Model Training; 8.3.2 Joint Adaptive Training; 8.4 Summary;
Chapter 9: Reverberant speech recognition; 9.1 Introduction; 9.2 Acoustic Impulse Response; 9.3 A Model of Reverberated Speech in Different Domains; 9.4 The Effect of Reverberation on ASR Performance; 9.5 Linear Filtering Approaches; 9.6 Magnitude or Power Spectrum Enhancement; 9.7 Feature Domain Approaches; 9.7.1 Reverberation Robust Features; 9.7.2 Feature Normalization; 9.7.3 Model-Based Feature Enhancement; 9.7.4 Data-Driven Enhancement; 9.8 Acoustic Model Domain Approaches; 9.9 The REVERB Challenge; 9.10 To Probe Further; 9.11 Summary;
Chapter 10: Multi-channel processing; 10.1 Introduction; 10.2 The Acoustic Beamforming Problem; 10.3 Fundamentals of Data-Dependent Beamforming; 10.3.1 Signal Model and Objective Functions; 10.3.2 Generalized Sidelobe Canceller; 10.3.3 Relative Transfer Functions; 10.4 Multi-Channel Speech Recognition; 10.4.1 ASR on Beamformed Signals; 10.4.2 Multi-Stream ASR; 10.5 To Probe Further; 10.6 Summary;
Chapter 11: Summary and future directions; 11.1 Robust Methods in the Era of GMM; 11.2 Robust Methods in the Era of DNN; 11.3 Multi-Channel Input and Robustness to Reverberation; 11.4 Epilogue; Index; Back Cover.
Notes:: Description based on print version record.; Includes bibliographical references at the end of each chapter and index.
ISBN:: 9780128023983; 0128023988; 9780128026168; 0128026162
OCLC:: 929952677

