ca.uottawa.balie
Class Canonizer

java.lang.Object
  extended by ca.uottawa.balie.Canonizer

public class Canonizer
extends java.lang.Object

Offers a static function that converts a word into its canonical form.

Author:
nadeaud

Field Summary
static int RULE_EXPAND_LIGATURES
          Expands ligature characters into their multi-letter equivalents.
static int RULE_LOWERCASE
          Converts every letter to lowercase.
static int RULE_NORMALIZE_PUNCT
          Normalizes punctuation (e.g., all Unicode quotes are resolved to ").
static int RULE_REMOVE_INTERNAL_PUNCT
          Removes internal punctuation, that is, punctuation after the first letter and before the last letter.
static int RULE_STRIP_ACCENTS
          Strips accents.
 
Constructor Summary
Canonizer()
           
 
Method Summary
static java.lang.String CanonForm(java.lang.String pi_Raw, int pi_Rules, PunctLookup pi_PunctLookup, LigatureLookup pi_LigatureLookup, AccentLookup pi_AccentLookup)
          Transforms a word into its canonical form.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

RULE_LOWERCASE

public static final int RULE_LOWERCASE
Converts every letter to lowercase.

See Also:
Constant Field Values

RULE_NORMALIZE_PUNCT

public static final int RULE_NORMALIZE_PUNCT
Normalizes punctuation (e.g., all Unicode quotes are resolved to ").

See Also:
Constant Field Values

RULE_REMOVE_INTERNAL_PUNCT

public static final int RULE_REMOVE_INTERNAL_PUNCT
Removes internal punctuation, that is, punctuation after the first letter and before the last letter. Punctuation characters are defined in the PunctLookup.

See Also:
PunctLookup, Constant Field Values

RULE_EXPAND_LIGATURES

public static final int RULE_EXPAND_LIGATURES
Expands ligature characters into their multi-letter equivalents.

See Also:
Constant Field Values

RULE_STRIP_ACCENTS

public static final int RULE_STRIP_ACCENTS
Strips accents.

See Also:
Constant Field Values
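
The rule constants are meant to be combined into a single int with a bitwise OR. The sketch below illustrates that pattern only. The numeric values are assumptions chosen for illustration (the real ones are listed under Constant Field Values), and RuleFlagsSketch and isSet are hypothetical names, not part of ca.uottawa.balie.

```java
class RuleFlagsSketch {
    // Hypothetical values: each rule occupies its own bit so rules can be OR-ed together.
    // The actual values of the Canonizer constants are not given on this page.
    static final int RULE_LOWERCASE = 1;
    static final int RULE_NORMALIZE_PUNCT = 1 << 1;
    static final int RULE_REMOVE_INTERNAL_PUNCT = 1 << 2;
    static final int RULE_EXPAND_LIGATURES = 1 << 3;
    static final int RULE_STRIP_ACCENTS = 1 << 4;

    // Returns true when the given rule bit is present in the combined rule set.
    static boolean isSet(int rules, int rule) {
        return (rules & rule) != 0;
    }

    public static void main(String[] args) {
        int rules = RULE_EXPAND_LIGATURES | RULE_REMOVE_INTERNAL_PUNCT;
        System.out.println(isSet(rules, RULE_EXPAND_LIGATURES)); // true
        System.out.println(isSet(rules, RULE_LOWERCASE));        // false
    }
}
```

Because every rule maps to a distinct bit, any subset of rules fits in one int argument such as pi_Rules.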

Constructor Detail

Canonizer

public Canonizer()

Method Detail

CanonForm

public static java.lang.String CanonForm(java.lang.String pi_Raw,
                                         int pi_Rules,
                                         PunctLookup pi_PunctLookup,
                                         LigatureLookup pi_LigatureLookup,
                                         AccentLookup pi_AccentLookup)
Transforms a word into its canonical form.

Parameters:
pi_Raw - The raw version of the word
pi_Rules - The rules to apply, combined with a bitwise OR (e.g., RULE_EXPAND_LIGATURES | RULE_REMOVE_INTERNAL_PUNCT)
pi_PunctLookup - The punctuation lookup
pi_LigatureLookup - The ligature lookup
pi_AccentLookup - The accent lookup
Returns:
The canonical form of the word, after applying the requested rules
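
To make the five rules concrete, here is a minimal, self-contained sketch of what such a canonicalization might look like using only the JDK. It is not the Canonizer implementation: the class CanonFormSketch, the method canonForm, the flag values, the small hard-coded ligature and quote tables, and the order in which rules are applied are all assumptions. The real method takes PunctLookup, LigatureLookup and AccentLookup objects, whose construction is not documented on this page, so the sketch substitutes java.text.Normalizer and regular expressions for them.

```java
import java.text.Normalizer;
import java.util.Locale;

class CanonFormSketch {
    // Hypothetical flag values, one bit per rule.
    static final int RULE_LOWERCASE = 1;
    static final int RULE_NORMALIZE_PUNCT = 2;
    static final int RULE_REMOVE_INTERNAL_PUNCT = 4;
    static final int RULE_EXPAND_LIGATURES = 8;
    static final int RULE_STRIP_ACCENTS = 16;

    static String canonForm(String raw, int rules) {
        String s = raw;
        if ((rules & RULE_EXPAND_LIGATURES) != 0) {
            // Tiny illustrative table; a real lookup would cover more ligatures.
            s = s.replace("\u0153", "oe").replace("\u0152", "OE")
                 .replace("\u00E6", "ae").replace("\u00C6", "AE")
                 .replace("\uFB01", "fi").replace("\uFB02", "fl");
        }
        if ((rules & RULE_STRIP_ACCENTS) != 0) {
            // Decompose accented letters, then drop the combining marks.
            s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if ((rules & RULE_NORMALIZE_PUNCT) != 0) {
            // Resolve a few Unicode quotation marks to their ASCII equivalents.
            s = s.replaceAll("[\u201C\u201D\u201E\u00AB\u00BB]", "\"")
                 .replaceAll("[\u2018\u2019]", "'");
        }
        if ((rules & RULE_REMOVE_INTERNAL_PUNCT) != 0) {
            // Locate the first and last letters, then strip ASCII punctuation between them.
            int first = 0;
            while (first < s.length() && !Character.isLetter(s.charAt(first))) first++;
            int last = s.length() - 1;
            while (last > first && !Character.isLetter(s.charAt(last))) last--;
            if (first < last) {
                s = s.substring(0, first + 1)
                  + s.substring(first + 1, last).replaceAll("\\p{Punct}", "")
                  + s.substring(last);
            }
        }
        if ((rules & RULE_LOWERCASE) != 0) {
            s = s.toLowerCase(Locale.ROOT);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(canonForm("C\u0153ur", RULE_LOWERCASE | RULE_EXPAND_LIGATURES)); // coeur
        System.out.println(canonForm("\u00C9l\u00E9phant", RULE_STRIP_ACCENTS));            // Elephant
        System.out.println(canonForm("U.S.A", RULE_REMOVE_INTERNAL_PUNCT));                 // USA
    }
}
```

Passing 0 as the rule set returns the input unchanged, which mirrors the idea that only the requested rules are applied.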