Speech Processing and Recognition (89608)


Lecturer: Dr. Yossi Keshet

Teaching assistant: Shua Dissen


Exam Moed A will be on 25.7.17

Exam Moed B will be on 12.9.17

Course books:

Rabiner and Schafer: Theory and Applications of Digital Speech Processing, Prentice Hall, 2010.

Huang, Acero, and Hon: Spoken Language Processing, Prentice Hall, 2001.

Rabiner and Juang: Fundamentals of Speech Recognition, Prentice Hall, 1993.

Deller, Hansen, and Proakis: Discrete-time Processing of Speech Signals, 2000.

Quatieri: Discrete-time Speech Signal Processing, Prentice Hall, 2001.

Class notes:

Some of the lecture notes are based on the lecture notes of the speech recognition course given at Columbia University (e6870).

  1. Lecture 1 - Introduction and signal processing. The accompanying MATLAB code is what_is_fft.m (a bonus will be given to anyone who translates the code into Python).

  2. Lecture 2 - Signal processing and features.

  3. Lecture 3 - Dynamic Time Warping (DTW).

  4. Lecture 4 - Gaussian Mixture Models (GMM). Further reading on the EM algorithm and GMMs can be found in Jeff A. Bilmes's tutorial.

  5. Lecture 5 - Hidden Markov Models (HMM). They are explained beautifully in the seminal tutorial by Lawrence R. Rabiner.

  6. Lecture 6 - Language modelling. See also the book chapter on N-grams by Jurafsky and Martin.

  7. Lecture 7 (based on a presentation by Lim Zhi Hao, 2015) - Weighted Finite State Transducers (WFSTs). See also Mohri, Pereira, and Riley's survey.

Assignments:


  1. Assignment 1 (corrected version) and its WAV and transcription (TextGrid) files. Many additional spoken digit examples can be found here. -- Due: May 3, 2017. Grades

  2. Assignment 2 and its files. More files can be downloaded from here (new link 22.5.2017). -- Due: June 4, 2017 (previously May 21, 2017 and May 28, 2017)

  3. Assignment 3 -- Due: June 30, 2017