Introduction, Physiological and Mathematical Models, Categorization of Speech Sounds; Discrete-time speech signals, Fourier transform and Z-transform, convolution, filter banks. Spectral estimation, pole-zero modeling and linear prediction (LP) analysis. Homomorphic deconvolution, cepstral analysis; Feature extraction, static and dynamic features, robustness, feature selection. Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), Perceptual LPCC; Distance measures: log spectral distance, cepstral distances, weighted cepstral distances, distances for linear and warped scales, Dynamic Time Warping for isolated word recognition; Statistical models for speech recognition: vector quantization model, Gaussian mixture model, discrete and continuous hidden Markov modeling. |
Texts/Reference Books:
- Thomas F. Quatieri, “Discrete-Time Speech Signal Processing: Principles and Practice,” Prentice-Hall, 2001.
- L. Rabiner and B. Juang, “Fundamentals of Speech Recognition,” Prentice-Hall, 1993.
- B. Gold and N. Morgan, “Speech and Audio Signal Processing: Processing and Perception of Speech and Music,” Wiley, 2000.
|