Machine Translation

Spring 2013

Instructor: Dr. Yoav Goldberg
Email: yogo / cs.biu.ac.il
Office: 216 building 216
Office Hours: TBA

This course aims to explain how modern statistical machine translation systems work, where they fall short, and where improvements are likely.

Grading

There will be 3 assignments, one of which will have two parts. The assignments are 40% of the grade, and the final exam is 60%. Students who wish to do so may take on a course project worth 30% of the grade, which reduces the exam's weight to 30%.

Reading Material

The course is self-contained, but references to extra reading materials will be provided.

Exam:

Here are some templates for questions which you can expect to find in the exam.

Good luck!

Assignments

Lecture Materials (to be posted after each class)

  1. Introduction, noisy channel, parallel corpora. Slides
    o Reading Classic Intro to Modern MT

  2. Language models, Evaluation. Slides
    o Reading Smoothing details || Large LMs / Stupid Backoff || BLEU
    o Examples of generating sentences from Twitter-based unigram, bigram, and trigram models.

  3. Word-word translations (Alignments, IBM model 1, EM). Slides
    o Reading Model 1 and 2 introduced by Mike Collins || IBM models introduced by Kevin Knight

  4. More Alignments (Models 2,3, HMM-alignment, Alignment Eval, Available Software). Slides
    o Reading Simple but Effective Improvements to Model 1 || Description and Comparison of Various Models, Evaluation, Symmetrization || Improved HMM Alignment
    o Software Giza++ (Models 1-5, HMM) || Berkeley Aligner (HMM+) || Nile (Supervised)

  5. Phrase-based translation 1 (using alignments, phrase table extraction). Slides

  6. Phrase-based translation 2 (decoding). Slides
    o Reading A formal description of phrase-based stack decoding || Phrase-based translation paper
    o Software Moses (phrase-based decoder)

  7. Feature-based models and the PRO algorithm Slides

  8. Feature-based models: Reranking Slides

  9. Reordering Slides

  10. Syntax Based Translation 1 (Hiero) Slides

  11. Syntax Based Translation 2 (GHKM Rules) Slides

  12. Syntax Based Translation 3 (GHKM Decoding -- Tree to String and String to Tree) Slides