The data directory contains files derived from the Canadian Hansards,
originally aligned by Ulrich Germann:

-input: French sentences to translate.

-tm: a phrase-based translation model, one phrase pair per line:
  French phrase ||| English phrase ||| log_10(translation_prob)

-lm: a trigram language model file in ARPA format:
  log_10(ngram_prob)   ngram   log_10(backoff_prob)

  The backoff (bo) weight is applied when this ngram is the context of a
  longer ngram that is not in the file, so the model backs off to a
  shorter ngram.

  The probability of a trigram (w1,w2,w3) is given according to the
  following recipe:

     trigram probability of p(w3 | w1, w2) :
         p(w3|w1,w2)= if(trigram exists)           p_3(w1,w2,w3)
                      else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(w3|w2)
                      else                         p(w3|w2)

     bigram probability of p(w2 | w1) :
         p(w2|w1)= if(bigram exists)             p_2(w1,w2)
                   else                          bo_wt_1(w1)*p_1(w2)


  To get the probability of an unknown single word, use the "<unk>" token.

  Keep in mind that the values in the file are in log10, while the
  recipe is written in probability space: each product in the recipe
  becomes a sum of the corresponding log10 values.

  The lm assumes sentences start with "<s>" and end with "</s>".
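
The backoff recipe above can be sketched in log10 space as follows. This
is only an illustration, not the code used to build or read the lm file:
TOY_LM and all of its numbers are made up, the function names are
hypothetical, and two choices are assumptions rather than something this
README specifies: unknown words are mapped to "<unk>" before lookup, and
an ngram listed without a backoff value is given log10 weight 0.0.

```python
# Hypothetical toy model: ngram tuple -> (log10 prob, log10 backoff weight).
# Every number here is invented for illustration; none comes from the lm file.
TOY_LM = {
    ("<unk>",): (-2.0, 0.0),
    ("<s>",): (-99.0, -0.5),
    ("</s>",): (-1.0, 0.0),
    ("the",): (-1.0, -0.5),
    ("cat",): (-1.5, -0.4),
    ("sat",): (-1.7, -0.3),
    ("<s>", "the"): (-0.5, -0.1),
    ("the", "cat"): (-0.6, -0.2),
    ("<s>", "the", "cat"): (-0.4, 0.0),
    ("the", "cat", "sat"): (-0.3, 0.0),
}


def normalize(lm, word):
    """Map a word missing from the unigrams to "<unk>" (an assumption)."""
    return word if (word,) in lm else "<unk>"


def log10_unigram(lm, w):
    """log10 p_1(w)."""
    return lm[(normalize(lm, w),)][0]


def log10_bigram(lm, w1, w2):
    """log10 p(w2 | w1): the bigram if present, else bo_wt_1(w1) * p_1(w2)."""
    w1, w2 = normalize(lm, w1), normalize(lm, w2)
    if (w1, w2) in lm:
        return lm[(w1, w2)][0]
    # A product in probability space is a sum in log10 space.
    return lm[(w1,)][1] + log10_unigram(lm, w2)


def log10_trigram(lm, w1, w2, w3):
    """log10 p(w3 | w1, w2), following the three cases of the recipe."""
    w1, w2, w3 = (normalize(lm, w) for w in (w1, w2, w3))
    if (w1, w2, w3) in lm:
        return lm[(w1, w2, w3)][0]
    if (w1, w2) in lm:
        return lm[(w1, w2)][1] + log10_bigram(lm, w2, w3)
    return log10_bigram(lm, w2, w3)


def log10_sentence(lm, words):
    """Score a sentence wrapped in "<s>" and "</s>".

    The first word only has "<s>" as history, so it is scored as a bigram;
    every later token is scored as a trigram.
    """
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = log10_bigram(lm, tokens[0], tokens[1])
    for i in range(2, len(tokens)):
        total += log10_trigram(lm, tokens[i - 2], tokens[i - 1], tokens[i])
    return total


if __name__ == "__main__":
    print(log10_trigram(TOY_LM, "the", "cat", "sat"))
    print(log10_sentence(TOY_LM, ["the", "cat", "sat"]))
```

To turn a returned value x back into a probability, compute 10 ** x.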

The language model and translation model are computed from the data 
in the align directory, using alignments from the Berkeley aligner.
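
A line of the tm file, in the "French phrase ||| English phrase |||
log_10(translation_prob)" layout described above, might be read as in the
sketch below. The function names and the sample lines are hypothetical
(the phrase pairs and scores are made up, not copied from the tm file),
and grouping the entries by French phrase is just one convenient choice.

```python
def parse_tm_line(line):
    """Split one tm line into (french words, english words, log10 prob)."""
    french, english, logprob = (field.strip() for field in line.split("|||"))
    return tuple(french.split()), tuple(english.split()), float(logprob)


def load_tm(lines):
    """Group translation options by French phrase, skipping blank lines."""
    tm = {}
    for line in lines:
        if not line.strip():
            continue
        french, english, logprob = parse_tm_line(line)
        tm.setdefault(french, []).append((english, logprob))
    return tm


# Made-up sample lines in the tm layout, for illustration only.
SAMPLE_LINES = [
    "la maison ||| the house ||| -0.25",
    "la maison ||| the home ||| -1.2",
    "maison ||| house ||| -0.5",
]

if __name__ == "__main__":
    for french, options in load_tm(SAMPLE_LINES).items():
        print(" ".join(french), options)
```

Because the scores are log10 values, the score of a translation built from
several phrase pairs is the sum of their scores, not the product.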