Instructor: Dr. Yoav Goldberg
Email: yogo / cs.biu.ac.il
Office: 216 building 216
Office Hours: TBA
This course aims to explain modern statistical machine translation systems: how they work, what doesn't work, and where improvements are likely.
There will be 3 assignments, one of which will have two parts. The assignments are worth 40% of the grade, and the final exam is worth 60%. Students who wish to do so may take on a course project worth 30% of the grade, in which case the exam counts for only 30%.
The course is self-contained, but references to extra reading materials will be provided.
Here are some templates for questions which you can expect to find in the exam.
Good luck!
Assignment 1 - evaluation
Deadline: March 20 (but see inside)
Assignment 2 - alignment
Deadline: April 11 (because the website was down over parts of the weekend, the new deadline is Sunday, April 14).
Assignment 3 - decoding
Deadline: Last week of the semester.
Introduction, noisy channel, parallel corpora. Slides
o Reading Classic Intro to Modern MT
Language models, Evaluation. Slides
o Reading Smoothing details || Large LMs / Stupid Backoff || BLEU
o Examples of generating sentences from Twitter-based unigram, bigram and trigram models.
Word-word translations (Alignments, IBM model 1, EM). Slides
o Reading Model 1 and 2 introduced by Mike Collins || IBM models introduced by Kevin Knight
More Alignments (Models 2, 3, HMM-alignment, Alignment Eval, Available Software). Slides
o Reading Simple but Effective Improvements to Model 1 || Description and Comparison of Various Models, Evaluation, Symmetrization || Improved HMM Alignment
o Software Giza++ (Models 1-5, HMM) || Berkeley Aligner (HMM+) || Nile (Supervised)
Phrase-based translation 1 (using alignments, phrase table extraction). Slides
Phrase-based translation 2 (decoding). Slides
o Reading A formal description of phrase-based stack decoding || Phrase-based translation paper
o Software Moses (phrase-based decoder)
Feature-based models and the PRO algorithm Slides
Feature-based models: Reranking Slides
Reordering Slides
Syntax Based Translation 1 (Hiero) Slides
Syntax Based Translation 2 (GHKM Rules) Slides
Syntax Based Translation 3 (GHKM Decoding -- Tree to String and String to Tree) Slides