Tuesday, Nov 18


Registration and Coffee


Welcome and Opening Remarks [video]
    Rabbi Prof. Daniel Hershkowitz, President of Bar-Ilan University
    Prof. Ido Dagan, Department of Computer Science, Bar-Ilan University


Closed Set Extraction [20m, slides, video]
    Enav Weinreb, Algorithms Group Manager, Text Metadata Services, Thomson Reuters
    Abstract: Traditional named entity extraction techniques take a text document as input and output a list of the entities that appear in it. Specifically, the algorithm is expected to identify instances of well-known entities as well as to discover newly emerging ones. In this talk we will focus on the following question: how much better can one do if we only need to identify instances of a closed set of individual entities?

Applied Tutorial: A practical introduction to distributional and neural methods for semantics [70m, slides, video]
    Marco Baroni, The Center for Mind/Brain Sciences of the University of Trento
    Yoav Goldberg, Department of Computer Science, Bar-Ilan University
    Abstract: Distributional semantic models exploit the intuition that similar words occur in similar contexts to automatically induce vector-based word meaning representations on a large scale from text corpora. Distributional semantic models have been shown to correlate with human intuitions about word similarity, and they are of practical use in applications ranging from recognizing textual entailment to machine translation. In this tutorial, we will introduce distributional semantic models, covering both the traditional approach, in which words are directly represented by context-recording vectors, and newer methods relying on neural-network-based representation learning techniques. The tutorial will focus on the practical aspects of the models, and we will present concrete examples of their performance in an interactive setting.


Coffee break


Overview Talk: Visualizing and Navigating Large-Scale Document Networks: Patent Collection Case Study [30m, video]
    Ron Bekkerman, Faculty of Management, Haifa University, Israel
    Abstract: Multi-million-document collections have become common assets in many domains, such as medicine, law, science, and commerce. While processing such collections is nowadays a routine task, not all problems have been solved. Some of the open issues are actually fairly basic, for example, how to figure out what the collection is all about, that is, how to create a holistic view of the collection. Typical solutions are search, which offers only a narrow view of the collection from the perspective of a single query, and classification or clustering of the entire collection, which is often too general and is also technologically hard.
    We view a document collection as a graph whose nodes are documents and whose edges are semantic relationships between the documents. This allows exploration of any topical region, as well as zooming in or out to analyze the density and topology of various regions. This approach does not solve the fundamental problem, since a holistic view of the entire collection remains unachievable; however, it provides visualization and exploration capabilities that cannot be obtained elsewhere. Using the collection of all patents issued in the US over the last 39 years as an example, we demonstrate the advantages of our approach and review a variety of visualization schemes before converging on the one that appears most useful.

Reading Between the Lines: Mining Domain-Specific Informal Texts with Mixed Registers and Languages [20m, slides, video]
    Shmuel Bar, Founder and CEO, IntuView

IBM Debating Technologies, IBM Haifa Research Lab, Cognitive Analytics Department [40m]
    Introduction [video]
        Noam Slonim
    Show Me Your Evidence: An Automatic Method for Context-Dependent Evidence Detection [video]
        Ruty Rinott
    Pro or Con? Identifying the Polarity of Claims in a Given Context [video]
        Roy Bar-Haim
    Summary and Demo [video]
        Noam Slonim



14:10-15:50 The EXCITEMENT Project

Overview Talk: Textual inference: Methods, open source platform and applications [40m, slides, video]
    Ido Dagan, Department of Computer Science, Bar-Ilan University
    Bernardo Magnini, Fondazione Bruno Kessler, Trento
    Guenter Neumann, German Research Center for Artificial Intelligence, Saarbrücken
    (Joint work with Sebastian Pado, University of Heidelberg)
    Abstract: Most semantic text processing applications, such as semantic search, question answering, information extraction and opinion mining, share a common underlying need for semantic matching between different texts. This commonly required inference was formulated under the Textual Entailment paradigm, which has attracted substantial attention within the natural language processing research community in recent years. The Recognizing Textual Entailment (RTE) task requires deciding, given two text fragments, whether or not the meaning of one of them can be inferred from the other; identical meanings correspond to mutual (bidirectional) entailment. In this talk we will first review this task, its applications, and algorithms for addressing it. Next, we will describe the Excitement Open Platform (EOP), developed within the EC-funded EXCITEMENT project, which provides a flexible open-source toolbox with state-of-the-art semantic technology for recognizing textual entailment. Lastly, we will describe the notion of entailment graphs, which provide the basis for effective text exploration. The EOP, and its further use for constructing entailment graphs, provide the basis for the industrial applications within the EXCITEMENT project, as described in the subsequent industrial talks of this session.

What Are Customers Complaining About? Understanding the Reasons for Customer Dissatisfaction [20m, slides, video]
    Gennadi Lembersky, Text Analytics Researcher, NICE Systems, Israel

Using the EXCITEMENT Textual Inference Platform to Extend Almawave’s Product Capabilities [20m, slides, video]
    Adriana Farina, Researcher and Analyst Programmer, Iride Lab, Almawave, Rome

Improving customer service quality for the OMQ service tools using the EXCITEMENT Textual Inference Platform [20m, slides, video]
    Matthias Meisdrock, CEO, OMQ, Berlin
    Abstract: OMQ is a Berlin-based company that develops software solutions to make the support process more efficient for companies with a high volume of customer requests, by answering those requests automatically. The technology behind its products is a keyword-based, fault-tolerant, self-learning search. The results of the EXCITEMENT project will be used to improve this core technology: by using the new EXCITEMENT component, the OMQ products should find more relevant results and discard irrelevant ones.


Coffee break


Lightly Supervised Content Modeling [20m, slides, video]
    Raphael Cohen, Principal Data Scientist, Beer Sheva COE, Data Services, Corp IT, EMC
    Abstract: Service organizations store large amounts of data describing customer interactions. These data usually contain some structured information, such as problem codes, but much of the information is in free text. I will describe a system that combines topic modeling, strong preprocessing modules for synonym detection and redundancy reduction, and light supervision in order to add domain-specific content tags.

Extending Taykey's Ontology: Term Discovery and Categorization [20m, slides, video]
    Tzvika Marx, Taykey
    Abstract: At Taykey, we identify in real time the topics and trends that are currently hot in social network chatter. A discovered trend is characterized by a particular term, or group of terms, whose frequency of occurrence peaks within some time window.
    Our presentation will focus on the term system used at Taykey. We will introduce the algorithms behind new-term discovery and term categorization (e.g., mapping terms onto the standard advertising-industry 'verticals').

Keynote Talk: Contextualized Text Processing for Information Access Applications: Incorporating Behavioral, Temporal, and Social Data [45m]
    Eugene Agichtein, Emory University and Yahoo Research
    Abstract: Online text is not created in a vacuum: much of it is contributed by individuals or communities. This talk will give an overview of our efforts to analyze the contextual signals associated with the creation of, and interaction with, text, and how to use them to improve text processing applications. Particularly for user-generated content, metadata such as authorship and social feedback can be incorporated into data mining models to help filter, categorize, and analyze text. Once created, online text continues to evolve, providing additional signals about the importance of words, concepts, and topics that can be helpful for text analysis and retrieval. Finally, the applications that allow users to access text often track user behavior, providing signals on how people and apps interact with the content. These behavior signals contain information helpful for tasks ranging from ranking and recommendation to summarization and system evaluation. This line of work requires people to contribute and interact with text. If time permits, I will discuss some of the challenges involved, as well as the instrumentation and crowdsourcing methodologies developed to enable this research.

Wednesday, Nov 19


Registration and Coffee


Position Talk: Going Beyond the Document-Query Lexical Match [40m, slides, video]
    Oren Kurland, Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology
    Abstract: The ad hoc retrieval task is to rank documents in a corpus by their relevance to the information need expressed by a query. The most common ranking approach is based on devising representations for the document and the query and comparing them using some similarity function. We will survey a few of the most effective query and document representation approaches, which rely on different text analysis techniques. We will then argue for the importance of further pursuing this line of research, drawing on a few examples of long-standing open problems.

The Adrenaline Factor: Finding Drama in Ordinary Life [20m, slides, video]
    Eli Katz, Director of Algorithms, Viaccess Orca

Yahoo Labs Israel
    Entity Intent Representation by Community-based Questions [20m]
        David Carmel, Principal Research Scientist
        Abstract: This talk deals with the question of how to identify user search intents that are specifically related to a given entity. We focus on community question answering sites, where any object containing a question about an entity, answered by several community members, represents a coherent and meaningful entity search intent (ESI) that can be easily identified. By extracting entity-related objects from the site archive and clustering them into comprehensible groups, we treat each group of similar objects as a specific ESI related to the entity. We show that this set of ESIs can successfully represent the community's search intents related to the entity, i.e., what people have asked about it.
        Next, we show that this representation can also be used to estimate semantic relatedness between entities, measured as the similarity between their ESI-based representations. We evaluate this relatedness measure by showing a high correlation between the intent-based similarity and the average relatedness score given by human annotators. Finally, we will discuss several other entity search applications that could also benefit greatly from ESI-based representations.
    Novelty-targeted Ranking of Community-based Answers [20m]
        Idan Szpektor, Senior Research Scientist
        Abstract: Questions and their corresponding answers within a Community Question Answering (CQA) site are frequently presented as top search results for Web search queries and are viewed by millions of searchers daily. The number of answers to CQA questions ranges from a handful to dozens, and for questions that ask for recommendations or opinions, a reader would be interested in different views and suggestions. Yet, especially when many answers are provided, the reader may not want to sift through all of them and may prefer to read only the top ones. Prior work on answer ranking in CQA assessed the quality of each answer separately, mainly whether it should serve as the best answer.
        We propose to promote CQA answers not only by their relevance to the question but also by the diversity and novelty they offer compared to other answers. Specifically, we aim to rank answers by the number of new aspects they introduce with respect to higher-ranked answers (novelty), on top of their estimated relevance. This approach is common in Web search and document retrieval, yet it has not been addressed before in the CQA setting, which is quite different from classic IR. We draw parallels between the proposed task and query-focused multi-document summarization, and propose a novel answer ranking algorithm that borrows ideas from summarization research and adapts them to our scenario. Specifically, our method treats syntactic propositions as the atomic text units. It measures the similarity between propositions and builds a hierarchical clustering of them across all answers. Finally, answers are ranked greedily, taking into account both their relevance to the question and their dissimilarity to higher-ranked answers.
        A manual gold-standard evaluation over a collection of health questions, together with a comparative user study, shows that considering novelty in answer ranking improves the quality of the ranked answer list.


Coffee break


Extracting Insights from Online Patient Conversations Using Natural Language Processing [20m, video]
    Roee Sa'adon, VP Technology, Treato
    Abstract: Patients are increasingly taking an active role in their healthcare. The Internet has given consumers and patients not only a place to get information and conduct research, but also a place to tell their stories.
    Treato automatically collects, indexes, and analyzes the massive amount of content that patients and caregivers generate online in order to extract relevant information, connect the dots, and create the big picture of what they are saying about their personal treatment- and condition-related experiences. The result is what Treato describes as the world’s largest source of patient insights, gathered from billions of online conversations across the social web. We call it the patient voice.

Semantic Processing for Social Media Advertising Optimization [20m]
    Kfir Bar, Comprendi Inc.
    Abstract: Comprendi turns textual big data into actionable marketing insights. Our proprietary analytics platform extracts user interests and sentiment towards products by semantically analyzing extremely high volumes of textual data from a variety of media. This enables our customers to better understand their target audience and to tailor the right offer to the right user at the right time.
    In our presentation we will discuss how our Text2Insight platform allows TV networks, broadcasters, and advertisers to gain deep insights from TV-related chatter on social media.

Optimizing Language Models for Speech Systems in the Automotive Environment [20m, slides, video]
    Ute Winter, User Experience Technologies, General Motors Advanced Technical Center, Israel
    (Joint work with Ming Sun and Alexander I. Rudnicky, Department of Computer Science, Carnegie Mellon University)
    Abstract: Cloud-based speech recognizers have become a popular service used in many applications and on mobile phones. In vehicles, however, they do not perform to their full extent, because they usually do not adapt to specific in-car use cases, drivers, or driving situations. On the other hand, traditional in-vehicle embedded speech systems are limited in content and processing power, among other things. One speech system component that suffers from these limitations is the recognizer’s language model: it is either kept generic and non-adaptable, in cloud-based recognizers, or adapted to users and their in-car use cases, in local recognizers with limited local content. We have combined a local recognizer capable of language model adaptation with a cloud-based service, and used a simple hypothesis classification to combine the recognition results. Results for typical use cases, such as phone call requests, are encouraging and show that this hybrid approach outperforms each of the individual speech recognizers.

Conversational Speech Understanding [20m, video]
    Moshe Wasserblat, Senior NLU Architect, Intel Israel


Lunch, Reception and Desk Session (posters and demonstrations)


Semantic Challenges in Content Discovery Services [25m, slides, video]
    Ronny Lempel, VP Recommendations Group, Outbrain
    Abstract: Modern content discovery and recommendation services serve billions of recommendations per day across thousands of sites, reaching hundreds of millions of unique users per month. While many aspects of such services are addressed by recent developments in recommender system technology, some challenges are under-explored in the literature. This talk will present two such challenges, which stem from gaps in the semantic understanding of content. The first challenge involves distinguishing ephemeral content, whose relevance decays rapidly and even abruptly, from 'evergreen' content, which may remain a valid recommendation for years. The second challenge involves identifying cases where a content item, while suitable as a recommendation in most contexts, is deemed 'insensitive' when served as a recommendation in the context of specific stories.

Search from Google's R&D Center in Israel [55m]
    Ziv Bar-Yossef, Google Israel
    Abstract: This talk will give an overview of some of the Search projects done at Google's R&D center in Israel, including Google Suggest, Live Results, and Google Trends.