of Computer Science
Bar Ilan University
Ramat Gan, 52900, Israel
E-mail: lili.dav @ gmail.com
My research field is Natural Language Processing. I focus on lexical entailment (my M.Sc. research) and entailment-based text exploration (my Ph.D. research).
Talks and presentations (peer-reviewed)
· Clustering Moderate-size Collections of Short Texts. The Israeli Seminar on Computational Linguistics (ISCOL), Haifa University. 2014.
· Textual Entailment Graphs. The Israeli Seminar on Computational Linguistics (ISCOL), Haifa University. 2014.
· Sentence Clustering via Projection over Term Clusters. The Israeli Seminar on Computational Linguistics (ISCOL), Ben-Gurion University. 2013.
· ParaQuery: Making Sense of Paraphrase Collections. Bar-Ilan Symposium on the Foundations of Artificial Intelligence (BISFAI), Bar Ilan University. 2013.
· Sentence Clustering via Projection over Term Clusters. Bar-Ilan Symposium on the Foundations of Artificial Intelligence (BISFAI), Bar Ilan University. 2013.
· Deriving Target-Domain Taxonomies from Wikipedia Category Hierarchy. The Israeli Seminar on Computational Linguistics (ISCOL), Bar Ilan University. 2011.
· Directional Semantic Similarity. IBM Machine Learning Seminar, Haifa University. 2009.
· Corpus-Based Distributional Learning of Lexical Entailment. The Israeli Seminar on Computational Linguistics (ISCOL), Bar Ilan University. 2008.
- “Statistical Methods in Computer Science Research”, Bar Ilan University (since 2011).
- “Introduction to Natural Language Processing” and “Information Retrieval”, Bar Ilan University (since 2006).
- “Introduction to Cryptography and Network Security”, The College of Management (2005-2006).
Awards and scholarships
· DIRECT: Directional Distributional Term-Similarity Resource.
The resource contains directional distributional term-similarity rules automatically extracted as described in (Kotlerman et al., JNLE-DLS 2010). Most of the rules are lexical entailment rules, where the meaning of the rule's left-hand side implies the meaning of its right-hand side.
For instance: koala-->animal, bread-->food, imprisonment-->arrest, wedding-->marriage.
· Twitter dataset for sentence clustering (banking domain).
The dataset of tweets used in Kotlerman et al. (2012), together with the output of the four sentence clustering methods compared on this dataset. The data includes:
1. Gold-standard dataset of 194 sentences crawled from Twitter, expressing reasons for customer dissatisfaction with Citibank. The sentences were gathered automatically by a rule-based extraction algorithm and manually grouped into clusters according to the reasons stated in them.
2. A corpus of 31,898 tweets from the banking domain.
3. Output produced by the novel method proposed in the paper and by the three baseline methods.
· IMDB dataset for text categorization.
Gold-standard dataset presented in (Liebeskind et al., LRE 2015). The taxonomy and the annotation guidelines are published in the paper. The dataset contains 1970 movie titles and the corresponding categories assigned by the annotator. For legal reasons we cannot publish the texts of the descriptions; they can be found on the IMDB website.
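
To illustrate what "directional" means for the rules in the DIRECT resource listed above, here is a minimal Python sketch. It uses only the four example rules quoted on this page. The "lhs-->rhs" string form is the notation used here and is assumed for illustration; it is not necessarily the file format of the resource itself. The function names `build_rule_index` and `entails` are hypothetical.

```python
from collections import defaultdict

# Example rules copied from this page. The "lhs-->rhs" notation is an
# assumption for illustration, not the resource's documented file format.
RULES = [
    "koala-->animal",
    "bread-->food",
    "imprisonment-->arrest",
    "wedding-->marriage",
]


def build_rule_index(rules):
    """Map each left-hand-side term to the set of terms it entails."""
    index = defaultdict(set)
    for rule in rules:
        lhs, rhs = rule.split("-->")
        index[lhs.strip()].add(rhs.strip())
    return index


def entails(index, lhs, rhs):
    """Return True only if a rule lhs-->rhs is present; direction matters."""
    return rhs in index.get(lhs, set())


index = build_rule_index(RULES)
print(entails(index, "koala", "animal"))  # rule present in this direction
print(entails(index, "animal", "koala"))  # reverse direction is not a rule
```

The lookup is deliberately one-way: a rule koala-->animal says nothing about animal-->koala, which is the difference between directional similarity and a symmetric similarity measure.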