The data directory contains files derived from the Canadian Hansards, originally aligned by Ulrich Germann:

- input: French sentences to translate.

- tm: a phrase-based translation model. Each line has the form:

    French phrase ||| English phrase ||| log_10(translation_prob)

- lm: a trigram language model file in ARPA format. Each n-gram line has the form:

    log_10(ngram_prob)   ngram   log_10(backoff_prob)

  The backoff_prob of an n-gram is used when that n-gram is the history you back off from. The highest-order entries (trigrams here) carry no backoff_prob. All values are log_10, so backoff weights are added, not multiplied. For example, to score "c" after "a b", first look up "a b c". If it is found, use its ngram_prob. If not, add the backoff_prob of "a b" (0 if "a b" is absent) and look up "b c". If "b c" is also missing, add the backoff_prob of "b" and look up "c". If "c" is not found either, it is out of vocabulary: use the prob of the "<unk>" token, plus the backoff weights accumulated so far.

  The lm assumes sentences start with "<s>" and end with "</s>".

The language model and translation model are computed from the data in the align directory, using alignments from the Berkeley aligner.
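As a sketch of reading the tm file, the Python below groups the `French phrase ||| English phrase ||| log_10(translation_prob)` lines by French phrase. The function name `parse_tm` and the toy entries are hypothetical illustrations, not part of the provided data. The sketch assumes each line has exactly three `|||`-separated fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def parse_tm(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group tm entries by French phrase: french -> [(english, log10 prob), ...]."""
    table: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Fields are separated by "|||"; surrounding spaces are stripped.
        french, english, logprob = [field.strip() for field in line.split("|||")]
        table[french].append((english, float(logprob)))
    return dict(table)

# Hypothetical toy entries in the tm format (not taken from the real file).
sample = [
    "maison ||| house ||| -0.2",
    "maison ||| home ||| -0.7",
    "la maison ||| the house ||| -0.1",
]
tm = parse_tm(sample)
print(tm["maison"])  # [('house', -0.2), ('home', -0.7)]
```

Any iterable of lines works, so an open file handle can be passed directly. For example, `parse_tm(open("data/tm", encoding="utf-8"))` should work, assuming that path and a UTF-8 encoding.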
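The backoff rule is easiest to follow in code. Below is a minimal Python sketch of standard ARPA backoff for a trigram model. When an n-gram is missing, the log_10 backoff weight of its history is added and the oldest word is dropped. A word missing even as a unigram falls back to the "<unk>" unigram. The sketch assumes the usual ARPA layout (a `\data\` header, `\N-grams:` sections, `\end\`) and that the model contains "<unk>". The names `parse_arpa`, `score_word`, `score_sentence`, and the toy model are hypothetical, not part of the provided files.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# n-gram (tuple of words) -> (log10 prob, log10 backoff weight)
LM = Dict[Tuple[str, ...], Tuple[float, float]]

def parse_arpa(lines: Iterable[str]) -> LM:
    lm: LM = {}
    order = 0  # order of the current \N-grams: section; 0 outside sections
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section markers: \data\, \1-grams:, \2-grams:, ..., \end\
            order = int(line[1:line.index("-")]) if line.endswith("-grams:") else 0
            continue
        if order == 0:
            continue  # header lines such as "ngram 1=5"
        fields = line.split()
        words = tuple(fields[1:1 + order])
        # Entries without a backoff field (always the highest order) get 0.0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        lm[words] = (float(fields[0]), backoff)
    return lm

def score_word(lm: LM, history: Sequence[str], word: str) -> float:
    """log10 P(word | last two words of history), with ARPA backoff."""
    ngram = tuple(history[-2:]) + (word,)  # trigram model: at most 2 history words
    total = 0.0
    while ngram:
        if ngram in lm:
            return total + lm[ngram][0]
        # Back off: add the history's backoff weight (0 if absent), drop oldest word.
        context = ngram[:-1]
        if context in lm:
            total += lm[context][1]
        ngram = ngram[1:]
    # Word is out of vocabulary: fall back to the <unk> unigram.
    return total + lm[("<unk>",)][0]

def score_sentence(lm: LM, words: Sequence[str]) -> float:
    """Sum of log10 probs, with <s> as initial history and </s> appended."""
    history: List[str] = ["<s>"]
    total = 0.0
    for word in list(words) + ["</s>"]:
        total += score_word(lm, history, word)
        history.append(word)
    return total

# Hypothetical toy model, not taken from the real lm file.
TOY_ARPA = r"""
\data\
ngram 1=5
ngram 2=3
ngram 3=1

\1-grams:
-1.0 <unk>
-99 <s> -0.3
-0.5 </s>
-0.6 house -0.2
-0.7 the -0.4

\2-grams:
-0.2 <s> the -0.1
-0.3 the house -0.05
-0.4 house </s>

\3-grams:
-0.1 <s> the house

\end\
"""

lm = parse_arpa(TOY_ARPA.splitlines())
print(round(score_sentence(lm, ["the", "house"]), 6))  # -0.75
```

On the toy model, "the house" scores -0.2 for the bigram "<s> the", plus -0.1 for the trigram "<s> the house". It then adds -0.05 + -0.4 for "</s>": "the house </s>" is missing, so the backoff weight of "the house" is added before using the bigram "house </s>". Some toolkits instead map out-of-vocabulary words to "<unk>" before lookup. The sketch follows the simpler "fall back to the <unk> unigram" rule.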