Speech Processing and Recognition (89608)


Lecturer: Dr. Yossi Keshet

Teaching assistant: Shua Dissen

Course books:

Rabiner and Schafer: Theory and Applications of Digital Speech Processing, Prentice Hall, 2010.

Huang, Acero, and Hon: Spoken Language Processing, Prentice Hall, 2001.

Rabiner and Juang: Fundamentals of Speech Recognition, Prentice Hall, 1993.

Deller, Hansen, and Proakis: Discrete-Time Processing of Speech Signals, Wiley-IEEE Press, 2000.

Quatieri: Discrete-time Speech Signal Processing, Prentice Hall, 2001.

Class notes:

  1. Lecture 1 - Introduction and signal processing. MATLAB code explaining the FFT: what_is_fft.m (a bonus will be given to anyone who translates the code into Python).

  2. Lecture 2 - Signal processing and features.

  3. Lecture 3 - Dynamic Time Warping (DTW).

Some of the lecture notes are based on those of the speech recognition course given at Columbia University (e6870).


  1. Assignment 1 (corrected version) and its WAV and transcription (TextGrid) files. Many additional spoken-digit examples can be found here. -- Due: May 3, 2017