Instructor: Dr. Yoav Goldberg
Email: yogo / cs.biu.ac.il
Office: Room 4, Building 216
Office Hours: By appointment
This course aims to explain modern statistical machine translation systems: how they work, what doesn't work, and where we are likely to improve.
Course communication (both announcements from me and questions from you) happens through the Piazza website. Please enroll, and use the Q&A link at the top.
There will be 4 assignments. The assignments are 40% of the grade, and the final exam is 60%. Students who wish to do so may take on a course project as 30% of the grade, making the exam weigh only 30%.
The course is self-contained, but references to extra reading materials will be provided.
Assignment 1 -- Evaluation. Deadline: March 20, 2014
Assignment 2 -- Alignment. Deadline: May 4, 2014
Assignment 3 -- Decoding. Deadline: June 8, 2014
Here is a nice exercise, developed by Kevin Knight and put into this form by Adam Lopez. Please try to work on it before next class (9/3/2014). It's fun! (no grade attached)
Introduction, noisy channel, parallel corpora. Slides
o Reading Classic Intro to Modern MT
Language models, Evaluation. Slides
o Reading Smoothing details || Large LMs / Stupid Backoff || BLEU
o Examples of generating sentences from Twitter-based unigram, bigram and trigram models.
Word-word translations (Alignments, IBM model 1, EM). Slides More slides
o Reading Model 1 and 2 introduced by Mike Collins || IBM models introduced by Kevin Knight
More Alignments (Models 2, 3, HMM-alignment, Alignment Eval, Available Software). Slides
o Reading Simple but Effective Improvements to Model 1 || Description and Comparison of Various Models, Evaluation, Symmetrization || Improved HMM Alignment
o Software Giza++ (Models 1-5, HMM) || Berkeley Aligner (HMM+) || Nile (Supervised)
Phrase-based translation 1 (using alignments, phrase table extraction). Slides More slides
Phrase-based translation 2 (decoding). Slides More slides
o Reading A formal description of phrase-based stack decoding || Phrase-based translation paper
o Software Moses (phrase-based decoder)
Feature-based models and the PRO algorithm Slides
o Reading The PRO training paper
Feature-based models: Reranking Slides
Reordering Slides
Syntax Based Translation 1 (Hiero) Slides More Slides
Syntax Based Translation 2 (GHKM Rules) Slides More Slides
Syntax Based Translation 3 (GHKM Decoding -- Tree to String and String to Tree) Slides More Slides