Digital Speech Transmission and Enhancement / Peter Vary and Rainer Martin.
- Format:
- Book
- Author/Creator:
- Vary, Peter, author.
- Martin, Rainer, author.
- Series:
- IEEE Press Series
- Language:
- English
- Subjects (All):
- Digital communications.
- Physical Description:
- 1 online resource (595 pages)
- Edition:
- Second edition.
- Place of Publication:
- Hoboken, NJ : John Wiley & Sons Ltd., [2024]
- Summary:
- Digital Speech Transmission and Enhancement enables readers to understand the latest developments in speech enhancement and transmission due to advances in computational power and device miniaturization. The Second Edition has been updated throughout to provide all the necessary details on the latest advances in the theory and practice of speech signal processing and its applications, including many new research results, standards, algorithms, and developments which have recently appeared and are on their way into state-of-the-art applications. Besides mobile communications, which constituted the main application domain of the first edition, speech enhancement for hearing instruments and man-machine interfaces has gained significantly more prominence in the past decade, and as such receives greater focus in this updated and expanded second edition. Readers can expect to find information and novel methods on: low-latency spectral analysis-synthesis, single-channel and dual-channel algorithms for noise reduction and dereverberation; multi-microphone processing methods, which are now widely used in applications such as mobile phones, hearing aids, and man-computer interfaces; algorithms for near-end listening enhancement, which provide significantly increased speech intelligibility for users at the noisy receiving side of their mobile phone; and fundamentals of speech signal processing, estimation and machine learning, speech coding, error concealment by soft decoding, and artificial bandwidth extension of speech signals. Digital Speech Transmission and Enhancement is a single-source, comprehensive guide to the fundamental issues, algorithms, standards, and trends in speech signal processing and speech communication technology, and as such is an invaluable resource for engineers, researchers, academics, and graduate students in the areas of communications, electrical engineering, and information technology.
- Contents:
- Cover
- Title Page
- Copyright
- Contents
- Preface
- Chapter 1 Introduction
- Chapter 2 Models of Speech Production and Hearing
- 2.1 Sound Waves
- 2.2 Organs of Speech Production
- 2.3 Characteristics of Speech Signals
- 2.4 Model of Speech Production
- 2.4.1 Acoustic Tube Model of the Vocal Tract
- 2.4.2 Discrete Time All‐Pole Model of the Vocal Tract
- 2.5 Anatomy of Hearing
- 2.6 Psychoacoustic Properties of the Auditory System
- 2.6.1 Hearing and Loudness
- 2.6.2 Spectral Resolution
- 2.6.3 Masking
- 2.6.4 Spatial Hearing
- 2.6.4.1 Head‐Related Impulse Responses and Transfer Functions
- 2.6.4.2 Law of The First Wavefront
- References
- Chapter 3 Spectral Transformations
- 3.1 Fourier Transform of Continuous Signals
- 3.2 Fourier Transform of Discrete Signals
- 3.3 Linear Shift Invariant Systems
- 3.3.1 Frequency Response of LSI Systems
- 3.4 The z‐transform
- 3.4.1 Relation to Fourier Transform
- 3.4.2 Properties of the ROC
- 3.4.3 Inverse z‐Transform
- 3.4.4 z‐Transform Analysis of LSI Systems
- 3.5 The Discrete Fourier Transform
- 3.5.1 Linear and Cyclic Convolution
- 3.5.2 The DFT of Windowed Sequences
- 3.5.3 Spectral Resolution and Zero Padding
- 3.5.4 The Spectrogram
- 3.5.5 Fast Computation of the DFT: The FFT
- 3.5.6 Radix‐2 Decimation‐in‐Time FFT
- 3.6 Fast Convolution
- 3.6.1 Fast Convolution of Long Sequences
- 3.6.2 Fast Convolution by Overlap‐Add
- 3.6.3 Fast Convolution by Overlap‐Save
- 3.7 Analysis-Modification-Synthesis Systems
- 3.8 Cepstral Analysis
- 3.8.1 Complex Cepstrum
- 3.8.2 Real Cepstrum
- 3.8.3 Applications of the Cepstrum
- 3.8.3.1 Construction of Minimum‐Phase Sequences
- 3.8.3.2 Deconvolution by Cepstral Mean Subtraction
- 3.8.3.3 Computation of the Spectral Distortion Measure
- 3.8.3.4 Fundamental Frequency Estimation
- References
- Chapter 4 Filter Banks for Spectral Analysis and Synthesis
- 4.1 Spectral Analysis Using Narrowband Filters
- 4.1.1 Short‐Term Spectral Analyzer
- 4.1.2 Prototype Filter Design for the Analysis Filter Bank
- 4.1.3 Short‐Term Spectral Synthesizer
- 4.1.4 Short‐Term Spectral Analysis and Synthesis
- 4.1.5 Prototype Filter Design for the Analysis-Synthesis Filter Bank
- 4.1.6 Filter Bank Interpretation of the DFT
- 4.2 Polyphase Network Filter Banks
- 4.2.1 PPN Analysis Filter Bank
- 4.2.2 PPN Synthesis Filter Bank
- 4.3 Quadrature Mirror Filter Banks
- 4.3.1 Analysis-Synthesis Filter Bank
- 4.3.2 Compensation of Aliasing and Signal Reconstruction
- 4.3.3 Efficient Implementation
- 4.4 Filter Bank Equalizer
- 4.4.1 The Reference Filter Bank
- 4.4.2 Uniform Frequency Resolution
- 4.4.3 Adaptive Filter Bank Equalizer: Gain Computation
- 4.4.3.1 Conventional Spectral Subtraction
- 4.4.3.2 Filter Bank Equalizer
- 4.4.4 Non‐uniform Frequency Resolution
- 4.4.5 Design Aspects & Implementation
- Chapter 5 Stochastic Signals and Estimation
- 5.1 Basic Concepts
- 5.1.1 Random Events and Probability
- 5.1.2 Conditional Probabilities
- 5.1.3 Random Variables
- 5.1.4 Probability Distributions and Probability Density Functions
- 5.1.5 Conditional PDFs
- 5.2 Expectations and Moments
- 5.2.1 Conditional Expectations and Moments
- 5.2.2 Examples
- 5.2.2.1 The Uniform Distribution
- 5.2.2.2 The Gaussian Density
- 5.2.2.3 The Exponential Density
- 5.2.2.4 The Laplace Density
- 5.2.2.5 The Gamma Density
- 5.2.2.6 χ²‐Distribution
- 5.2.3 Transformation of a Random Variable
- 5.2.4 Relative Frequencies and Histograms
- 5.3 Bivariate Statistics
- 5.3.1 Marginal Densities
- 5.3.2 Expectations and Moments
- 5.3.3 Uncorrelatedness and Statistical Independence
- 5.3.4 Examples of Bivariate PDFs
- 5.3.4.1 The Bivariate Uniform Density
- 5.3.4.2 The Bivariate Gaussian Density
- 5.3.5 Functions of Two Random Variables
- 5.4 Probability and Information
- 5.4.1 Entropy
- 5.4.2 Kullback-Leibler Divergence
- 5.4.3 Cross‐Entropy
- 5.4.4 Mutual Information
- 5.5 Multivariate Statistics
- 5.5.1 Multivariate Gaussian Distribution
- 5.5.2 Gaussian Mixture Models
- 5.6 Stochastic Processes
- 5.6.1 Stationary Processes
- 5.6.2 Auto‐Correlation and Auto‐Covariance Functions
- 5.6.3 Cross‐Correlation and Cross‐Covariance Functions
- 5.6.4 Markov Processes
- 5.6.5 Multivariate Stochastic Processes
- 5.7 Estimation of Statistical Quantities by Time Averages
- 5.7.1 Ergodic Processes
- 5.7.2 Short‐Time Stationary Processes
- 5.8 Power Spectrum and its Estimation
- 5.8.1 White Noise
- 5.8.2 The Periodogram
- 5.8.3 Smoothed Periodograms
- 5.8.3.1 Non‐Recursive Smoothing in Time
- 5.8.3.2 Recursive Smoothing in Time
- 5.8.3.3 Log‐Mel Filter Bank Features
- 5.8.4 Power Spectra and Linear Shift‐Invariant Systems
- 5.9 Statistical Properties of Speech Signals
- 5.10 Statistical Properties of DFT Coefficients
- 5.10.1 Asymptotic Statistical Properties
- 5.10.2 Signal‐Plus‐Noise Model
- 5.10.3 Statistics of DFT Coefficients for Finite Frame Lengths
- 5.11 Optimal Estimation
- 5.11.1 MMSE Estimation
- 5.11.2 Estimation of Discrete Random Variables
- 5.11.3 Optimal Linear Estimator
- 5.11.4 The Gaussian Case
- 5.11.5 Joint Detection and Estimation
- 5.12 Non‐Linear Estimation with Deep Neural Networks
- 5.12.1 Basic Network Components
- 5.12.1.1 The Perceptron
- 5.12.1.2 Convolutional Neural Network
- 5.12.2 Basic DNN Structures
- 5.12.2.1 Fully‐Connected Feed‐Forward Network
- 5.12.2.2 Autoencoder Networks
- 5.12.2.3 Recurrent Neural Networks
- 5.12.2.4 Time Delay, Wavenet, and Transformer Networks
- 5.12.2.5 Training of Neural Networks
- 5.12.2.6 Stochastic Gradient Descent (SGD)
- 5.12.2.7 Adaptive Moment Estimation Method (ADAM)
- Chapter 6 Linear Prediction
- 6.1 Vocal Tract Models and Short‐Term Prediction
- 6.1.1 All‐Zero Model
- 6.1.2 All‐Pole Model
- 6.1.3 Pole‐Zero Model
- 6.2 Optimal Prediction Coefficients for Stationary Signals
- 6.2.1 Optimum Prediction
- 6.2.2 Spectral Flatness Measure
- 6.3 Predictor Adaptation
- 6.3.1 Block‐Oriented Adaptation
- 6.3.1.1 Auto‐Correlation Method
- 6.3.1.2 Covariance Method
- 6.3.1.3 Levinson-Durbin Algorithm
- 6.3.2 Sequential Adaptation
- 6.4 Long‐Term Prediction
- Chapter 7 Quantization
- 7.1 Analog Samples and Digital Representation
- 7.2 Uniform Quantization
- 7.3 Non‐uniform Quantization
- 7.4 Optimal Quantization
- 7.5 Adaptive Quantization
- 7.6 Vector Quantization
- 7.6.1 Principle
- 7.6.2 The Complexity Problem
- 7.6.3 Lattice Quantization
- 7.6.4 Design of Optimal Vector Code Books
- 7.6.5 Gain-Shape Vector Quantization
- 7.7 Quantization of the Predictor Coefficients
- 7.7.1 Scalar Quantization of the LPC Coefficients
- 7.7.2 Scalar Quantization of the Reflection Coefficients
- 7.7.3 Scalar Quantization of the LSF Coefficients
- Chapter 8 Speech Coding
- 8.1 Speech‐Coding Categories
- 8.2 Model‐Based Predictive Coding
- 8.3 Linear Predictive Waveform Coding
- 8.3.1 First‐Order DPCM
- 8.3.2 Open‐Loop and Closed‐Loop Prediction
- 8.3.3 Quantization of the Residual Signal
- 8.3.3.1 Quantization with Open‐Loop Prediction
- 8.3.3.2 Quantization with Closed‐Loop Prediction
- 8.3.3.3 Spectral Shaping of the Quantization Error
- 8.3.4 ADPCM with Sequential Adaptation
- 8.4 Parametric Coding
- 8.4.1 Vocoder Structures
- 8.4.2 LPC Vocoder
- 8.5 Hybrid Coding
- 8.5.1 Basic Codec Concepts
- 8.5.1.1 Scalar Quantization of the Residual Signal
- 8.5.1.2 Vector Quantization of the Residual Signal
- 8.5.2 Residual Signal Coding: RELP
- 8.5.3 Analysis by Synthesis: CELP
- 8.5.3.1 Principle
- 8.5.3.2 Fixed Code Book
- 8.5.3.3 Long‐Term Prediction, Adaptive Code Book
- 8.6 Adaptive Postfiltering
- 8.7 Speech Codec Standards: Selected Examples
- 8.7.1 GSM Full‐Rate Codec
- 8.7.2 EFR Codec
- 8.7.3 Adaptive Multi‐Rate Narrowband Codec (AMR‐NB)
- 8.7.4 ITU‐T/G.722: 7 kHz Audio Coding within 64 kbit/s
- 8.7.5 Adaptive Multi‐Rate Wideband Codec (AMR‐WB)
- 8.7.6 Codec for Enhanced Voice Services (EVS)
- 8.7.7 Opus Codec IETF RFC 6716
- Chapter 9 Concealment of Erroneous or Lost Frames
- 9.1 Concepts for Error Concealment
- 9.1.1 Error Concealment by Hard Decision Decoding
- 9.1.2 Error Concealment by Soft Decision Decoding
- 9.1.3 Parameter Estimation
- 9.1.3.1 MAP Estimation
- 9.1.3.2 MS Estimation
- 9.1.4 The A Posteriori Probabilities
- 9.1.4.1 The A Priori Knowledge
- 9.1.4.2 The Parameter Distortion Probabilities
- 9.1.5 Example: Hard Decision vs. Soft Decision
- 9.2 Examples of Error Concealment Standards
- 9.2.1 Substitution and Muting of Lost Frames
- 9.2.2 AMR Codec: Substitution and Muting of Lost Frames
- 9.2.3 EVS Codec: Concealment of Lost Packets
- 9.3 Further Improvements
- Chapter 10 Bandwidth Extension of Speech Signals
- 10.1 BWE Concepts
- 10.2 BWE using the Model of Speech Production
- 10.2.1 Extension of the Excitation Signal
- 10.2.2 Spectral Envelope Estimation
- 10.2.2.1 Minimum Mean Square Error Estimation
- 10.2.2.2 Conditional Maximum A Posteriori Estimation
- 10.2.2.3 Extensions
- 10.2.2.4 Simplifications
- 10.2.3 Energy Envelope Estimation
- 10.3 Speech Codecs with Integrated BWE
- 10.3.1 BWE in the GSM Full‐Rate Codec
- 10.3.2 BWE in the AMR Wideband Codec
- Notes:
- Includes bibliographical references and index.
- Description based on publisher supplied metadata and other sources.
- Description based on print version record.
- ISBN:
- 1-119-06099-0
- 1-119-06097-4