This dataset is the result of applying Lin and Pantel’s DIRT learning algorithm (Lin and Pantel, 2001), as implemented by the BIU NLP lab, to the Reuters corpus.
The dataset contains two tables: one describing the binary templates and one describing the rules between templates.
- Download the dataset (compressed SQL dump file)
- The format of the template table is [template_id | template_description], where each binary template is assigned a unique ID. Template descriptions follow the string representation used by Lin and Pantel.
- The format of the rule table is [template_id1 | template_id2 | score]; each row gives the similarity score for a pair of templates. Note that this similarity measure is symmetric, so the rules have no directionality.
- References: Dekang Lin and Patrick Pantel. DIRT – Discovery of Inference Rules from Text. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, 2001.
- Contact: Jonathan Berant, jonatha6 @ post.tau.ac.il
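
Because the similarity score is symmetric, a rule row [template_id1 | template_id2 | score] applies in both directions when looking up the templates related to a given one. The following Python sketch shows one possible way to do that lookup once the two tables have been loaded into memory. The function names (`build_symmetric_index`, `similar_templates`), the `top_k` parameter, and the placeholder rows are all hypothetical and are not part of the dataset. The sketch also assumes each pair of templates appears only once in the rule table; if the dump stores both orderings, the mirroring step would produce duplicates and should be dropped.

```python
from collections import defaultdict


def build_symmetric_index(rules):
    """Map each template ID to its (neighbor_id, score) pairs.

    `rules` is an iterable of (template_id1, template_id2, score) rows.
    Each row is indexed under both IDs because the score is symmetric.
    Assumption: every unordered pair appears only once in `rules`.
    """
    index = defaultdict(list)
    for id1, id2, score in rules:
        index[id1].append((id2, score))
        if id1 != id2:
            index[id2].append((id1, score))
    # Order each neighbor list from most to least similar.
    for neighbors in index.values():
        neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return dict(index)


def similar_templates(template_id, index, templates, top_k=5):
    """Return up to `top_k` (description, score) pairs for `template_id`."""
    neighbors = index.get(template_id, [])[:top_k]
    return [(templates.get(nid, "<unknown>"), score) for nid, score in neighbors]


# Placeholder rows for illustration only; they are NOT taken from the dataset.
templates = {1: "template-1", 2: "template-2", 3: "template-3"}
rules = [(1, 2, 0.5), (1, 3, 0.25), (2, 3, 0.125)]

index = build_symmetric_index(rules)
print(similar_templates(2, index, templates))
```

Template 2 appears as `template_id2` in one placeholder row and as `template_id1` in another, yet both related templates are returned, which is the point of indexing every row under both IDs.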