The data directory contains files derived from the Canadian Hansards,
originally aligned by Ulrich Germann:

-input: French sentences to translate.

-tm: a phrase-based translation model, one phrase pair per line:
  French phrase ||| English phrase ||| log_10(translation_prob)
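  A minimal Python sketch of reading this file. It assumes one entry per line
  with whitespace around each "|||" separator. The function names parse_tm_line
  and load_tm are hypothetical and not part of the provided data or code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_tm_line(line: str) -> Tuple[str, str, float]:
    """Split one 'French ||| English ||| log10 prob' line into its three fields."""
    french, english, logprob = (field.strip() for field in line.split("|||"))
    return french, english, float(logprob)


def load_tm(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group English translations (with log10 probabilities) by French phrase."""
    table: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        french, english, logprob = parse_tm_line(line)
        table[french].append((english, logprob))
    return dict(table)
```

  Grouping by French phrase makes it cheap to list every candidate
  translation for a source span during decoding.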

-lm: a trigram language model file in ARPA format:
  log_10(ngram_prob)   ngram   log_10(backoff_prob)
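  A sketch of loading the entries, assuming the usual ARPA layout: a \data\
  header with "ngram N=count" lines, one "\N-grams:" section per order, and
  an \end\ marker. The backoff field is absent on highest-order entries, and
  this sketch stores 0.0 (log_10 of 1) in that case. The name load_arpa and
  the dict layout are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical layout: word tuple -> (log10 ngram prob, log10 backoff weight)
Table = Dict[Tuple[str, ...], Tuple[float, float]]


def load_arpa(lines: Iterable[str]) -> Table:
    """Read ARPA n-gram entries into a dict keyed by word tuples."""
    table: Table = {}
    order = 0  # n of the "\N-grams:" section currently being read
    for raw in lines:
        line = raw.strip()
        section = re.match(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            continue
        if not line or line.startswith("\\") or order == 0:
            continue  # blank lines, \data\, "ngram N=count" lines, \end\
        fields = line.split()
        logprob = float(fields[0])
        ngram = tuple(fields[1:1 + order])
        # Entries without a backoff field get the neutral weight log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[ngram] = (logprob, backoff)
    return table
```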

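  The backoff prob is used when this ngram is the context (history) of a
  longer ngram that is not in the model. Highest-order (trigram) entries
  have no backoff prob. All values are log_10, so probabilities are
  combined by adding them.
  For example, to score "c" after "a b": if "a b c" is found, return its ngram_prob.
  If it is not found, add the backoff_prob of "a b" (0 if "a b" is not listed)
  to the score of "c" after "b".
  That score is computed the same way: if "b c" is found, use its ngram_prob;
  otherwise add the backoff_prob of "b" to the score of "c" alone.
  If "c" is also not found, use the ngram_prob of the "<unk>" token.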
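  Under the standard ARPA back-off convention, the lookup can be sketched
  recursively as below. The table layout (word tuple mapped to a pair of
  log_10 ngram prob and log_10 backoff prob) and the function name logprob are
  hypothetical, and the sketch assumes the model lists an "<unk>" unigram.

```python
from typing import Dict, Tuple

# Hypothetical layout: word tuple -> (log10 ngram prob, log10 backoff weight)
LM = Dict[Tuple[str, ...], Tuple[float, float]]


def logprob(lm: LM, ngram: Tuple[str, ...]) -> float:
    """log10 P(last word | preceding words), using ARPA-style back-off."""
    if ngram in lm:
        return lm[ngram][0]  # the full n-gram is listed: use its probability
    if len(ngram) == 1:
        return lm[("<unk>",)][0]  # unknown word (assumes "<unk>" is listed)
    # Not listed: add the back-off weight of the context (0.0 if the context
    # itself is unlisted) and retry with the first word dropped.
    context = ngram[:-1]
    backoff = lm[context][1] if context in lm else 0.0
    return backoff + logprob(lm, ngram[1:])
```

  For "a b c", a miss adds the backoff of "a b" and recurses on "b c"; a
  second miss adds the backoff of "b" and recurses on "c".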

  The lm assumes sentences start with "<s>" and end with "</s>".
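  A sketch of scoring a whole sentence under this convention with a trigram
  model. It assumes the common practice that "<s>" serves only as context and
  is never itself scored, while "</s>" is scored like any other word. The names
  ngram_logprob and sentence_logprob and the table layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: word tuple -> (log10 ngram prob, log10 backoff weight)
LM = Dict[Tuple[str, ...], Tuple[float, float]]


def ngram_logprob(lm: LM, ngram: Tuple[str, ...]) -> float:
    """log10 P(last word | history) with ARPA-style back-off."""
    if ngram in lm:
        return lm[ngram][0]
    if len(ngram) == 1:
        return lm[("<unk>",)][0]  # assumes "<unk>" is listed
    context = ngram[:-1]
    backoff = lm[context][1] if context in lm else 0.0
    return backoff + ngram_logprob(lm, ngram[1:])


def sentence_logprob(lm: LM, words: List[str]) -> float:
    """Total log10 probability of a sentence under a trigram LM."""
    padded = ["<s>"] + words + ["</s>"]
    total = 0.0
    for i in range(1, len(padded)):  # "<s>" is context only, never predicted
        history = tuple(padded[max(0, i - 2):i])  # at most two previous words
        total += ngram_logprob(lm, history + (padded[i],))
    return total
```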

The language model and translation model are computed from the data 
in the align directory, using alignments from the Berkeley aligner.