Java Number Normalizer (Beta)


  • The main functionality of the Bar-Ilan University Normalizer is number normalization, i.e. converting textual representations of numbers into a standard numerical string. For instance, "three thousand" is normalized to "3000" and "$134M" to "134000000 dollars" (see more examples below).

    The tool also includes:
    • String normalization: a simple string replacement per user-defined rules (e.g. "'ll" → "will").
    • Date & time normalization (e.g. "October 1, 2002" → "1/10/2002").
    Normalization can help (pre)processing tools such as parsers or coreference resolvers. This normalizer is used as the first preprocessing step in Bar-Ilan University's Textual Entailment engine (BiuTee), and has been used in a couple of additional works.
  • Download the latest version (v0.6.1, Aug 31, 2011)
  • License
  • Contact: For bug reports, suggestions and questions, please contact:
    Shachar Mirkin, shacharmirkin @ gmail.com

  • More normalization examples:
    23 May, 2011 → 23/5/2011
    12.00 → 12
    fifteen hundred → 1500
    minus 1 → -1
    $.5 million → 500000 dollars
    $5.54 billion (€4.39 billion) → 5540000000 dollars (4390000000 euros)
    seven hundred and fifty six million three hundred and fifteen thousand two hundred and fifty five → 756315255
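
To make the word-to-number examples above concrete, here is a minimal Java sketch of one common way to convert spelled-out English numbers into digits. It is not the normalizer's actual code or API: the class name NumberWordsSketch and the method parse are hypothetical, and the sketch assumes input made up only of number words (optionally preceded by "minus" and joined by "and" or hyphens). It does not handle digits, decimals, currencies, or dates, all of which the real tool covers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not the Bar-Ilan normalizer's code or API. */
final class NumberWordsSketch {
    // Words worth 0-19 and the tens (twenty ... ninety).
    private static final Map<String, Long> SMALL = new HashMap<>();
    // Scale words that close a three-digit group.
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six",
            "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
            "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) {
            SMALL.put(units[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            SMALL.put(tens[i], (i + 2) * 10L);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
        SCALES.put("billion", 1_000_000_000L);
    }

    private NumberWordsSketch() { }

    /** Converts spelled-out English number words to a long value. */
    static long parse(String text) {
        long total = 0;          // sum of groups already closed by a scale word
        long group = 0;          // value of the group currently being built
        boolean negative = false;
        boolean sawNumber = false;

        // Split on whitespace and hyphens so "fifty-six" becomes two tokens.
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("[\\s-]+");
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.equals("and")) {
                continue;        // "and" carries no numeric value
            }
            if (i == 0 && token.equals("minus")) {
                negative = true;
                continue;
            }
            if (SMALL.containsKey(token)) {
                group += SMALL.get(token);
            } else if (token.equals("hundred")) {
                // A bare "hundred" counts as "one hundred".
                group = (group == 0 ? 1 : group) * 100;
            } else if (SCALES.containsKey(token)) {
                // A scale word multiplies the current group and closes it.
                total += (group == 0 ? 1 : group) * SCALES.get(token);
                group = 0;
            } else {
                throw new IllegalArgumentException("Unrecognized number word: " + token);
            }
            sawNumber = true;
        }
        if (!sawNumber) {
            throw new IllegalArgumentException("No number words found in: " + text);
        }
        long value = total + group;
        return negative ? -value : value;
    }
}
```

Under these assumptions, NumberWordsSketch.parse("fifteen hundred") returns 1500 and NumberWordsSketch.parse("minus one") returns -1. The running-group approach is chosen because it handles both "fifteen hundred" and long forms such as the 756315255 example with the same few rules.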