Gal A. Kaminka: Publications


REEF: Resolving Length Bias in Frequency Sequence Mining

Ariella Richardson, Gal A. Kaminka, and Sarit Kraus. REEF: Resolving Length Bias in Frequency Sequence Mining. In The Third International Conference on Advances in Information Mining and Management (IMMM-2013), 2013. Winner: Best paper award.

Abstract

Classic support-based approaches address frequent sequence mining efficiently. However, support-based mining has been shown to be biased towards short sequences. In this paper, we propose a method that resolves this bias when mining the most frequent sequences. To resolve the length bias, we define norm-frequency, based on the statistical z-score of support, and use it in place of support-based frequency. Our approach mines the subsequences that are frequent relative to other subsequences of the same length. Unfortunately, naive use of norm-frequency hinders mining scalability: norm-frequency breaks the anti-monotonic property of support, which is essential for pruning large sets of candidate sequences. We describe a bound that enables pruning and thus provides scalability. Experimental results on textual data and computer user input data establish that we successfully overcome the short-sequence bias and illustrate that our mining algorithm produces meaningful sequences.
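The following Python sketch illustrates the norm-frequency idea described in the abstract. It is not the authors' implementation. It rests on several assumptions that the paper's exact definitions may not share. Patterns are treated as contiguous subsequences. Support is the number of database sequences that contain a pattern. The z-score is taken over all observed candidates of the same length, using the population standard deviation. The function names `contiguous_patterns`, `support_counts`, `norm_frequency`, and `top_k` are hypothetical. The sketch also omits the paper's pruning bound, which the abstract does not specify, and simply enumerates every candidate up to `max_len`.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def contiguous_patterns(seq, max_len):
    """Return the set of distinct contiguous subsequences of seq (lengths 1..max_len).

    Assumption: patterns are contiguous; the paper may use general subsequences.
    """
    patterns = set()
    for length in range(1, max_len + 1):
        for start in range(len(seq) - length + 1):
            patterns.add(tuple(seq[start:start + length]))
    return patterns


def support_counts(database, max_len):
    """Support of a pattern = number of database sequences that contain it."""
    support = Counter()
    for seq in database:
        # A set, so each sequence adds at most 1 to a pattern's support.
        support.update(contiguous_patterns(seq, max_len))
    return support


def norm_frequency(support):
    """Z-score of support, computed separately among patterns of each length."""
    counts_by_length = defaultdict(list)
    for pattern, count in support.items():
        counts_by_length[len(pattern)].append(count)
    # Population mean and standard deviation of support per pattern length.
    stats = {n: (mean(c), pstdev(c)) for n, c in counts_by_length.items()}
    norm = {}
    for pattern, count in support.items():
        mu, sigma = stats[len(pattern)]
        # If every pattern of this length has the same support, none stands out.
        norm[pattern] = (count - mu) / sigma if sigma > 0 else 0.0
    return norm


def top_k(scores, k):
    """The k highest-scoring patterns (ties broken by pattern order)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    database = ["ab", "ab", "ab", "cd", "ce", "cf"]
    support = support_counts(database, max_len=2)
    norm = norm_frequency(support)
    # Both patterns have support 3, but ('a', 'b') gets the higher norm-frequency.
    for pattern in [("a",), ("a", "b")]:
        print("".join(pattern), support[pattern], round(norm[pattern], 3))
    print(top_k(norm, 3))
```

In this toy database, raw support cannot separate `('a',)` from `('a', 'b')`. Norm-frequency compares each pattern only with patterns of its own length, so it ranks the longer pattern higher. The same example also shows the loss of anti-monotonicity mentioned above: a super-pattern can outscore its own sub-pattern. A miner built on this score therefore cannot prune with the plain Apriori rule and needs an additional bound, such as the one the paper proposes (not reproduced here).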

BibTeX

@InProceedings{immm13,
 author = {Ariella Richardson and Gal A. Kaminka and Sarit Kraus},
 title =  {{REEF}: Resolving Length Bias in Frequency Sequence Mining},
 booktitle = {The Third International Conference on Advances in Information Mining and Management ({IMMM}-2013)},
 year = {2013},
 abstract = {Classic support-based approaches address frequent sequence mining efficiently.
However, support-based mining has been shown to be biased towards short sequences.
In this paper, we propose a method that resolves this bias when mining the most frequent
sequences. To resolve the length bias, we define norm-frequency, based on the
statistical z-score of support, and use it in place of support-based frequency.
Our approach mines the subsequences that are frequent relative
to other subsequences of the same length. Unfortunately, naive use of norm-frequency
hinders mining scalability: norm-frequency breaks the anti-monotonic property of support,
which is essential for pruning large sets of candidate sequences.
We describe a bound that enables pruning and thus provides
scalability. Experimental results on textual data and computer user
input data establish that we successfully overcome the short-sequence
bias and illustrate that our mining algorithm produces
meaningful sequences.},
 note = {{\bf Winner: Best paper award.}},
}

Generated by bib2html.pl (written by Patrick Riley) on Fri Aug 30, 2024 17:29:52