We preprocessed the text and hypothesis of each pair in the development and test sets. The preprocessing includes sentence splitting using MXTERMINATOR (Reynar and Ratnaparkhi, 1997) and dependency parsing using MINIPAR (Lin, 1998).
Using the preprocessed data is optional; alternative preprocessing tools may of course be used. Note that because the preprocessing is done automatically, it contains some errors. We provide this data as-is and give no warranty on its quality.
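For those who prefer their own preprocessing, the sentence-splitting step could be approximated along the following lines. This is only a minimal sketch: the rule-based splitter below is a stand-in and not MXTERMINATOR (which uses a trained maximum entropy model), the names `split_sentences` and `preprocess_pair` are hypothetical, and the dictionary it returns is not the format of the distributed files. Dependency parsing is not covered.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' when followed by whitespace
# and then an uppercase letter or an opening quote. This heuristic is a
# stand-in for a trained sentence splitter, not a reimplementation of one.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'])')


def split_sentences(text):
    """Split a string into sentences using the heuristic boundary above."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def preprocess_pair(text, hypothesis):
    """Sentence-split both sides of a text/hypothesis pair (hypothetical helper)."""
    return {
        "text": split_sentences(text),
        "hypothesis": split_sentences(hypothesis),
    }


if __name__ == "__main__":
    pair = preprocess_pair(
        "The cat sat. It purred loudly! Was it happy?",
        "A cat purred.",
    )
    print(pair["text"])
    print(pair["hypothesis"])
```

Such a heuristic will also split after abbreviations like "Mr." when a capitalized word follows, which is one of the cases a trained splitter is meant to handle.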
The preprocessed format for both text and hypothesis is the same:
Download: preprocessed development set, preprocessed test set
Jeffrey C. Reynar and Adwait Ratnaparkhi. 1997. A Maximum Entropy Approach to Identifying Sentence Boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, March 31-April 3, 1997, Washington, D.C.
Dekang Lin. 1998. Dependency-based evaluation of MINIPAR. In Proceedings of the Workshop on Evaluation of Parsing Systems at LREC 1998, Granada, Spain.