RTE-2 Preprocessed Datasets

We preprocessed the text and hypothesis of each pair in the development and test sets. The preprocessing includes sentence splitting using MXTERMINATOR (Reynar and Ratnaparkhi, 1997) and dependency parsing using MINIPAR (Lin, 1998).

Using the preprocessed data is optional, and you are free to use alternative tools for preprocessing. Note that because the preprocessing was done automatically, it contains some errors. We provide this data as-is and give no warranty on its quality.
The preprocessed format is the same for both text and hypothesis.

Download:
preprocessed development set
preprocessed test set


References:

Jeffrey C. Reynar and Adwait Ratnaparkhi. 1997. A Maximum Entropy Approach to Identifying Sentence Boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, March 31-April 3, 1997, Washington, D.C.

Dekang Lin. 1998. Dependency-based evaluation of MINIPAR. In Proceedings of the Workshop on Evaluation of Parsing Systems at LREC 1998, Granada, Spain.