Speech Processing and Recognition (89608)


Staff:

Lecturer: Yossi Keshet <jkeshet@cs.biu.ac.il>

Teaching assistant: Talia Ben-Simon <advmlcourses+speech@gmail.com>

Zoom link for the live meetings: https://us02web.zoom.us/j/84877155991?pwd=Z3I3NTd4UmRPZERvei9CdGVqb0FEQT09

The in-class demos use the Praat software tool.

Lecture notes

  1. Introduction
    Recording of the lecture held on 8/3/2021.

  2. Signal processing: analog signals, sampling, and digital signals
    The lecture is based on the following resources:
    - Jonathan Stein, Digital Signal Processing: A Computer Science Perspective, 1st Edition, 2001
    - Richard Lyons, Understanding Digital Signal Processing, 3rd Edition, 2010
    Recording of the lecture held on 15/3/2021.

  3. Signal processing: Fourier transform, DFT, FFT, and features
    The lecture is based on the following resources:
    - Jonathan Stein, Digital Signal Processing: A Computer Science Perspective, 1st Edition, 2001
    Recording of the lecture held on 6/4/2021.

  4. Dynamic Time Warping (DTW)
    Try the Python code and the Python notebook explaining the FFT yourself.
    Recording of the lecture held on 12/4/2021.

  5. Introduction to Automatic Speech Recognition (ASR)
    A great tutorial on Markov models (chains).
    Recording of the lecture held on 26/4/2021.

  6. Introduction to Hidden Markov Models (HMMs)
    The presentation is based on Jurafsky and Martin, "Speech and Language Processing," 3rd edition, Dec 2020 - Appendix A.
    A visual explanation of Markov models (chains) by Victor Powell.
    Recording of the lecture held on 3/5/2021.

  7. Connectionist Temporal Classification (CTC)
    A tutorial on Sequence Modeling with CTC by Awni Hannun.
    Recording of the lecture held on 24/5/2021.

  8. Connectionist Temporal Classification (CTC), continued; modern speech recognition: DeepSpeech-2, wav2letter
    The DeepSpeech-2 paper.
    The wav2letter paper.
    Recording of the lecture held on 31/5/2021.

  9. Sequence-to-sequence models, with attention; Listen, Attend, and Spell (LAS)
    Jay Alammar's blog: Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention).
    The Listen, Attend, and Spell paper.
    Recording of the lecture held on 14/6/2021.


Assignments:

  1. Assignment 1: Dynamic Time Warping (DTW) - due May 6, 2021 at 22:00 via the SUBMIT system. Please do not use the dtw() function of the librosa library.
  2. Assignment 2: Connectionist Temporal Classification (CTC) - due June 13, 2021 at 22:00 via the SUBMIT system.