Topics in Text Categorization
******************************
Assignment: Using this blog corpus, do the following. Choose 100 blogs, each
with at least 1000 words. Label the first 500 words of blog i as begin_i and
the last 500 words as end_i. Your task is to learn to distinguish pairs
<begin_i, end_i> from pairs <begin_i, end_j>, where i != j. You may use any
method you like, and you may use other blogs in the corpus as background
information. To test the effectiveness of your method, generate similar pairs
from different blogs (do not use any blog that you used in development in any
fashion).
******************************
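The pair construction described in the assignment could be sketched roughly as follows. This is only an illustrative sketch, not a prescribed solution: the function names split_blog and make_pairs, the whitespace tokenization, and the choice of one randomly drawn negative pair per blog are all assumptions that the assignment does not specify.

```python
import random


def split_blog(text, n=500):
    """Return (first n words, last n words), or None if the blog is too short.

    Assumption: "words" are whitespace-separated tokens.
    """
    words = text.split()
    if len(words) < 2 * n:
        return None
    return " ".join(words[:n]), " ".join(words[-n:])


def make_pairs(blogs, n_blogs=100, n=500, seed=0):
    """Build labeled (begin, end, label) triples from a list of blog texts.

    label 1 marks a true pair <begin_i, end_i>;
    label 0 marks a mismatched pair <begin_i, end_j> with i != j.
    Assumption: one randomly chosen negative pair per blog.
    """
    rng = random.Random(seed)
    # Keep only blogs long enough to split, then take the first n_blogs.
    halves = [h for h in (split_blog(b, n) for b in blogs) if h is not None]
    halves = halves[:n_blogs]
    if len(halves) < 2:
        raise ValueError("need at least two sufficiently long blogs")

    pairs = []
    for i, (begin, end) in enumerate(halves):
        pairs.append((begin, end, 1))
        # Pick the ending of a different blog for the negative example.
        j = rng.choice([k for k in range(len(halves)) if k != i])
        pairs.append((begin, halves[j][1], 0))
    return pairs
```

To respect the requirement that test blogs are never used in development, one option is to split the list of blogs into two disjoint lists first and call make_pairs on each list separately.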
Lecture 1. General introduction; comparison of learning algorithms
Slides: Lecture1
Readings:
Machine learning in automated text categorization
F Sebastiani
Inductive Learning Algorithms and Representations for Text Categorization
S Dumais, J Platt, M Sahami, D Heckerman
A re-examination of text categorization methods
Y Yang, X Liu
Text Categorization with Support Vector Machines
T Joachims
Lecture 2. Naïve Bayes; feature selection
Slides: Lecture2
Readings:
A comparison of event models for Naïve Bayes text classification
A McCallum, K Nigam
A comparative study on feature selection in text categorization
Y Yang, JO Pedersen
An extensive empirical study of feature selection metrics for text classification
G Forman
Lecture 3. Authorship attribution
Slides: Lecture3
Readings:
Computational Methods in Authorship Attribution
M. Koppel, J. Schler, S. Argamon
Lecture 4. Authorship verification
Slides: Lecture4
Readings:
Measuring Differentiability: Unmasking Pseudonymous Authors
M. Koppel, J. Schler, E. Bonchek-Dokow
Authorship Attribution in the Wild
M. Koppel, J. Schler, S. Argamon
Lecture 5. Author profiling
Slides: Lecture5
Readings:
Determining an Author's Native Language by Mining a Text for Errors
M. Koppel, J. Schler, K. Zigdon
Automatically Profiling the Author of an Anonymous Text
S. Argamon, M. Koppel, J. Pennebaker and J. Schler
Lecture 6. Bottom-up sentiment analysis
Slides: Lecture6
Readings:
Predicting the semantic orientation of adjectives
V Hatzivassiloglou, KR McKeown
Thumbs Up or Thumbs Down?
P Turney
Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis
T Wilson, J Wiebe, P Hoffmann
Lecture 7. Top-down sentiment analysis
Slides: Lecture7
Readings:
Opinion Mining and Sentiment Analysis
Mining the peanut gallery: opinion extraction and semantic classification of product reviews
K Dave, S Lawrence, DM Pennock
The Importance of Neutral Examples for Learning Sentiment
M Koppel, J Schler
Lecture 8. Spam filtering
Slides: Lecture8
Lecture 9. Text clustering
Slides: Lecture9
Readings:
Introduction to Information Retrieval (Chapter 16)
C Manning, P Raghavan, H Schutze
On Spectral Clustering: Analysis and an Algorithm
Lecture 10. Latent semantic analysis
Slides: Lecture10
Readings:
Introduction to Information Retrieval (Chapter 18)
C Manning, P Raghavan, H Schutze
Indexing by latent semantic analysis
S Deerwester, ST Dumais, GW Furnas, TK Landauer