Robust automatic speech recognition : a bridge to practical applications / Jinyu Li [and three others].

Format:
Book
Author/Creator:
Li, Jinyu, author.
Language:
English
Subjects (All):
Automatic speech recognition.
Speech processing systems.
Physical Description:
1 online resource (308 p.)
Edition:
1st edition
Place of Publication:
Amsterdam, Netherlands : Academic Press, 2016.
System Details:
text file
Summary:
Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have proven successful and are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided.

The reader will:
Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition
Learn the links and relationships between alternative technologies for robust speech recognition
Be able to use the technology analysis and categorization detailed in the book to guide future technology development
Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition

The first book that provides a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks
Connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment
Provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques
Written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years
Contents:
Front Cover
Robust Automatic Speech Recognition: A Bridge to Practical Applications
Copyright
Contents
About the Authors
List of Figures
List of Tables
Acronyms
Notations
Chapter 1: Introduction
1.1 Automatic Speech Recognition
1.2 Robustness to Noisy Environments
1.3 Existing Surveys in the Area
1.4 Book Structure Overview
References
Chapter 2: Fundamentals of speech recognition
2.1 Introduction: Components of Speech Recognition
2.2 Gaussian Mixture Models
2.3 Hidden Markov Models and the Variants
2.3.1 How to Parameterize an HMM
2.3.2 Efficient Likelihood Evaluation for the HMM
2.3.3 EM Algorithm to Learn the HMM Parameters
2.3.4 How the HMM Represents Temporal Dynamics of Speech
2.3.5 GMM-HMMs for Speech Modeling and Recognition
2.3.6 Hidden Dynamic Models for Speech Modeling and Recognition
2.4 Deep Learning and Deep Neural Networks
2.4.1 Introduction
2.4.2 A Brief Historical Perspective
2.4.3 The Basics of Deep Neural Networks
2.4.4 Alternative Deep Learning Architectures
Deep convolutional neural networks
Deep recurrent neural networks
2.5 Summary
Chapter 3: Background of robust speech recognition
3.1 Standard Evaluation Databases
3.2 Modeling Distortions of Speech in Acoustic Environments
3.3 Impact of Acoustic Distortion on Gaussian Modeling
3.4 Impact of Acoustic Distortion on DNN Modeling
3.5 A General Framework for Robust Speech Recognition
3.6 Categorizing Robust ASR Techniques: An Overview
3.6.1 Compensation in Feature Domain vs. Model Domain
3.6.2 Compensation Using Prior Knowledge about Acoustic Distortion
3.6.3 Compensation with Explicit vs. Implicit Distortion Modeling
3.6.4 Compensation with Deterministic vs. Uncertainty Processing
3.6.5 Compensation with Disjoint vs. Joint Model Training
3.7 Summary
Chapter 4: Processing in the feature and model domains
4.1 Feature-Space Approaches
4.1.1 Noise-Resistant Features
Auditory-based features
Temporal processing
Neural network approaches
4.1.2 Feature Moment Normalization
Cepstral mean normalization
Cepstral mean and variance normalization
Histogram equalization
4.1.3 Feature Compensation
Spectral subtraction
Wiener filtering
Advanced front-end
4.2 Model-Space Approaches
4.2.1 General Model Adaptation for GMM
4.2.2 General Model Adaptation for DNN
Low-footprint DNN adaptation
Adaptation criteria
4.2.3 Robustness via Better Modeling
4.3 Summary
Chapter 5: Compensation with prior knowledge
5.1 Learning from Stereo Data
5.1.1 Empirical Cepstral Compensation
5.1.2 SPLICE
5.1.3 DNN for Noise Removal Using Stereo Data
5.2 Learning from Multi-Environment Data
5.2.1 Online Model Combination
Online model combination for GMM
Online model combination for DNN
5.2.2 Non-Negative Matrix Factorization
5.2.3 Variable-Parameter Modeling
Variable-parameter modeling for GMM
Variable-component DNN
5.3 Summary
Chapter 6: Explicit distortion modeling
6.1 Parallel Model Combination
6.2 Vector Taylor Series
6.2.1 VTS Model Adaptation
6.2.2 Distortion Estimation in VTS
6.2.3 VTS Feature Enhancement
6.2.4 Improvements over VTS
6.2.5 VTS for the DNN-Based Acoustic Model
6.3 Sampling-Based Methods
6.3.1 Data-Driven PMC
6.3.2 Unscented Transform
6.3.3 Methods Beyond the Gaussian Assumption
6.4 Acoustic Factorization
6.4.1 Acoustic Factorization Framework
6.4.2 Acoustic Factorization for GMM
6.4.3 Acoustic Factorization for DNN
6.5 Summary
References
Chapter 7: Uncertainty processing
7.1 Model-Domain Uncertainty
7.2 Feature-Domain Uncertainty
7.2.1 Observation Uncertainty
Uncertainty propagation through multilayer perceptrons
7.3 Joint Uncertainty Decoding
7.3.1 Front-End JUD
7.3.2 Model JUD
7.4 Missing-Feature Approaches
7.5 Summary
Chapter 8: Joint model training
8.1 Speaker Adaptive and Source Normalization Training
8.2 Model Space Noise Adaptive Training
8.3 Joint Training for DNN
8.3.1 Joint Front-End and DNN Model Training
8.3.2 Joint Adaptive Training
8.4 Summary
Chapter 9: Reverberant speech recognition
9.1 Introduction
9.2 Acoustic Impulse Response
9.3 A Model of Reverberated Speech in Different Domains
9.4 The Effect of Reverberation on ASR Performance
9.5 Linear Filtering Approaches
9.6 Magnitude or Power Spectrum Enhancement
9.7 Feature Domain Approaches
9.7.1 Reverberation Robust Features
9.7.2 Feature Normalization
9.7.3 Model-Based Feature Enhancement
9.7.4 Data-Driven Enhancement
9.8 Acoustic Model Domain Approaches
9.9 The REVERB Challenge
9.10 To Probe Further
9.11 Summary
Chapter 10: Multi-channel processing
10.1 Introduction
10.2 The Acoustic Beamforming Problem
10.3 Fundamentals of Data-Dependent Beamforming
10.3.1 Signal Model and Objective Functions
10.3.2 Generalized Sidelobe Canceller
10.3.3 Relative Transfer Functions
10.4 Multi-Channel Speech Recognition
10.4.1 ASR on Beamformed Signals
10.4.2 Multi-Stream ASR
10.5 To Probe Further
10.6 Summary
Chapter 11: Summary and future directions
11.1 Robust Methods in the Era of GMM
11.2 Robust Methods in the Era of DNN
11.3 Multi-Channel Input and Robustness to Reverberation
11.4 Epilogue
Index
Back Cover
Notes:
Description based upon print version of record.
Includes bibliographical references at the end of each chapter and index.
Description based on print version record.
ISBN:
9780128023983
0128023988
9780128026168
0128026162
OCLC:
929952677
