Lecturer: Dr. Yossi Keshet
Teaching assistant: Shua Dissen
Exam Moed A (first sitting) will be on 25.7.17
Exam Moed B (second sitting) will be on 12.9.17
Rabiner and Schafer: Theory and Applications of Digital Speech Processing, Prentice Hall, 2010.
Huang, Acero, and Hon: Spoken Language Processing, Prentice Hall, 2001.
Rabiner and Juang: Fundamentals of Speech Recognition, Prentice Hall, 1993.
Deller, Hansen, and Proakis: Discrete-time Processing of Speech Signals, 2000.
Quatieri: Discrete-time Speech Signal Processing, Prentice Hall, 2001.
Some of the lecture notes are based on those of the speech recognition course given at Columbia University (e6870).
Lecture 1 - Introduction and signal processing. The MATLAB code what_is_fft.m explains the FFT (a bonus will be given to anyone who translates the code into Python).
Lecture 2 - Signal processing and features.
Lecture 3 - Dynamic Time Warping (DTW).
Lecture 4 - Gaussian Mixture Models (GMM). Further reading on the EM algorithm and GMMs can be found in Jeff A. Bilmes's tutorial.
Lecture 5 - Hidden Markov Models (HMM). They are explained beautifully in the seminal tutorial by Lawrence R. Rabiner.
Lecture 6 - Language modelling. See also Jurafsky and Martin's book chapter on N-grams.
Lecture 7 (based on a presentation by Lim Zhi Hao, 2015) - Weighted Finite-State Transducers (WFSTs). See also Mohri, Pereira, and Riley's survey.
A discussion group for this course is available on Piazza. Before using it for the first time, register using this link with the code "89608".
Assignment 1 (corrected version) and its WAV and transcription (TextGrid) files; many additional spoken-digit examples can be found here. -- Due: May 3, 2017. Grades
Assignment 2 and its files; more files can be downloaded from here (new link, 22.5.2017). -- Due: June 4, 2017 (extended from May 21, 2017, and then from May 28, 2017)
Assignment 3 -- Due: June 30, 2017