
Java Number Normalizer (Beta)
-
The main functionality of the Bar-Ilan University Normalizer is number normalization, i.e., the conversion of textual representations of numbers into a standard numerical string.
For instance, "three thousand" is normalized to "3000" and "$134M" to "134000000 dollars" (see more examples below).
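To illustrate the kind of conversion involved, here is a minimal Java sketch of word-to-number normalization. The class `WordNumberSketch` and its `normalize` method are hypothetical and are not the BIU Normalizer's API. The sketch assumes whole English numbers below one trillion, skips "and", and accepts a leading "minus". It builds up the current group (for example 315) and folds it into a running total whenever "thousand", "million" or "billion" appears.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of English number-word normalization; NOT the BIU
 * Normalizer's API. Handles whole numbers below one trillion only.
 */
final class WordNumberSketch {
    private static final Map<String, Long> UNITS = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] small = {"zero", "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < small.length; i++) {
            UNITS.put(small[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            UNITS.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    private WordNumberSketch() {}

    /** Returns the numeric string for a phrase such as "fifteen hundred". */
    static String normalize(String text) {
        long total = 0;    // completed groups, e.g. the "756 million" part
        long current = 0;  // group still being built, e.g. 315 before "thousand"
        boolean negative = false;
        for (String token : text.toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            if (token.isEmpty() || token.equals("and")) {
                continue; // "and" is filler, as in "two hundred and fifty"
            } else if (token.equals("minus")) {
                negative = true;
            } else if (token.matches("\\d+")) {
                current += Long.parseLong(token); // digits, as in "minus 1"
            } else if (UNITS.containsKey(token)) {
                current += UNITS.get(token);
            } else if (token.equals("hundred")) {
                current = (current == 0 ? 1 : current) * 100;
            } else if (SCALES.containsKey(token)) {
                total += (current == 0 ? 1 : current) * SCALES.get(token);
                current = 0;
            } else {
                throw new IllegalArgumentException("Unknown number word: " + token);
            }
        }
        long value = total + current;
        return Long.toString(negative ? -value : value);
    }

    public static void main(String[] args) {
        System.out.println(normalize("three thousand"));   // 3000
        System.out.println(normalize("fifteen hundred"));  // 1500
    }
}
```

Because "hundred" multiplies the group built so far while "thousand", "million" and "billion" close it, the long example in the list below also comes out as "756315255" under this sketch.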
The tool also includes:
- String normalization: a simple string replacement per user-defined rules (e.g. "'ll" → "will").
- Date & time normalization (e.g. "October 1, 2002" → "1/10/2002", in day/month/year order).
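The two non-numeric features listed above can be sketched in a few lines of Java. The class `RuleAndDateSketch`, its methods, and the accepted date layouts are hypothetical and are not the BIU Normalizer's API. The rule replacement assumes already-tokenized input (where "I'll" appears as "I 'll") and replaces whole tokens only; the date part uses `java.time` to parse two English layouts and print day/month/year without zero padding, as in the examples on this page.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, NOT the BIU Normalizer's API. */
final class RuleAndDateSketch {
    // Accepted input layouts, tried in order (an assumption for illustration).
    private static final List<DateTimeFormatter> DATE_INPUTS = List.of(
        DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH),  // October 1, 2002
        DateTimeFormatter.ofPattern("d MMMM, uuuu", Locale.ENGLISH)); // 23 May, 2011

    // Day/month/year without zero padding, matching "1/10/2002".
    private static final DateTimeFormatter DATE_OUTPUT = DateTimeFormatter.ofPattern("d/M/uuuu");

    private RuleAndDateSketch() {}

    /** Replaces whole tokens of already-tokenized text using user-defined rules. */
    static String applyRules(String tokenizedText, Map<String, String> rules) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenizedText.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(rules.getOrDefault(token, token));
        }
        return out.toString();
    }

    /** Rewrites a recognized date as d/M/yyyy; returns any other input unchanged. */
    static String normalizeDate(String text) {
        for (DateTimeFormatter input : DATE_INPUTS) {
            try {
                return LocalDate.parse(text.trim(), input).format(DATE_OUTPUT);
            } catch (DateTimeParseException e) {
                // Not this layout; try the next one.
            }
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(applyRules("I 'll call", Map.of("'ll", "will")));
        System.out.println(normalizeDate("October 1, 2002"));
    }
}
```

Matching whole tokens rather than raw substrings keeps a rule like "'ll" → "will" from firing inside unrelated words; the trade-off is that the input must be tokenized first.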
Normalization may help text-processing tools such as parsers and coreference resolvers.
This normalizer is used as the first preprocessing step in Bar-Ilan University's Textual Entailment engine (BiuTee), and has also been used in a couple of other works.
-
Download the latest version (v0.6.1, Aug 31, 2011)
-
License
- Contact: For bug reports, suggestions, and questions, please write to:
Shachar Mirkin, shacharmirkin @ gmail.com
- More normalization examples:
- 23 May, 2011 → 23/5/2011
- 12.00 → 12
- fifteen hundred → 1500
- minus 1 → -1
- $.5 million → 500000 dollars
- $5.54 billion (€4.39 billion) → 5540000000 dollars (4390000000 euros)
- seven hundred and fifty six million three hundred and fifteen thousand two hundred and fifty five → 756315255
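Amounts with currency symbols and magnitude words, like several of the examples above, can be handled by a regular expression plus exact decimal arithmetic. The following is a minimal sketch under stated assumptions: `MoneySketch` is a hypothetical class, not the BIU Normalizer's API; it recognizes only "$" and "€", the suffixes K/M/B and the words thousand/million/billion, and always emits a plural unit.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of currency-amount normalization, NOT the BIU Normalizer's API. */
final class MoneySketch {
    // Currency symbol, a number such as "5.54" or ".5", then an optional magnitude word or suffix.
    private static final Pattern AMOUNT = Pattern.compile(
        "([$\u20AC])\\s?(\\d*\\.?\\d+)(?:\\s*(thousand|million|billion|[KMB])\\b)?");

    private static final Map<String, BigDecimal> MAGNITUDES = Map.of(
        "thousand", BigDecimal.valueOf(1_000L), "K", BigDecimal.valueOf(1_000L),
        "million", BigDecimal.valueOf(1_000_000L), "M", BigDecimal.valueOf(1_000_000L),
        "billion", BigDecimal.valueOf(1_000_000_000L), "B", BigDecimal.valueOf(1_000_000_000L));

    // Assumed symbol-to-unit mapping; the unit is always plural in this sketch.
    private static final Map<String, String> CURRENCIES = Map.of("$", "dollars", "\u20AC", "euros");

    private MoneySketch() {}

    static String normalize(String text) {
        Matcher m = AMOUNT.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // BigDecimal avoids binary floating-point error, e.g. in 5.54 * 10^9.
            BigDecimal value = new BigDecimal(m.group(2));
            if (m.group(3) != null) {
                value = value.multiply(MAGNITUDES.get(m.group(3)));
            }
            String amount = value.stripTrailingZeros().toPlainString();
            String unit = CURRENCIES.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(amount + " " + unit));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("$5.54 billion (\u20AC4.39 billion)"));
    }
}
```

Using `BigDecimal` instead of `double` keeps "$5.54 billion" exact at 5540000000, and `stripTrailingZeros().toPlainString()` also reduces a value such as "$12.00" to "12 dollars".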