ca.uottawa.balie
Class CharacterNGram

java.lang.Object
  extended by ca.uottawa.balie.CharacterNGram

public class CharacterNGram
extends java.lang.Object

Methods to collect and handle character n-grams. A character n-gram is a sequence of n consecutive chars. For instance, the word WORD contains 3 bigrams: WO, OR, and RD.
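The sliding-window definition above can be sketched in plain Java. This is an illustration only, not Balie's implementation. The class `NGramExtractionSketch` and its method `extract` are hypothetical names, and the sketch assumes overlapping n-grams taken over the raw string with no case folding or whitespace handling. Like the description above, it works on Java chars (UTF-16 code units).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of the Balie API.
public class NGramExtractionSketch {

    // Returns every overlapping run of n consecutive chars in text, in order
    // of appearance. A string of length L yields L - n + 1 n-grams, or none
    // when L < n.
    public static List<String> extract(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(extract("WORD", 2)); // prints [WO, OR, RD]
    }
}
```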

Author:
nadeaud

Constructor Summary
CharacterNGram(int pi_NGramSize)
          Creates a new n-gram handler.
 
Method Summary
 void Feed(java.lang.String pi_InString)
          Feeds text to the n-gram handler.
 java.lang.Double[] Instance(java.lang.String[] pi_RefNGrams)
          Creates an instance made of n-gram relative frequencies for a given set of reference n-grams.
 java.util.Hashtable<java.lang.String,java.lang.Integer> NGramFrequency()
          Get the table that associates each n-gram to its frequency.
 java.util.Hashtable<java.lang.String,java.lang.Integer> UNIGramFrequency()
          Get the table that associates each unigram to its frequency.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharacterNGram

public CharacterNGram(int pi_NGramSize)
Creates a new n-gram handler.

Parameters:
pi_NGramSize - The value of N (must be at least 2; N = 2 extracts bigrams)

Method Detail

Feed

public void Feed(java.lang.String pi_InString)
Feeds text to the n-gram handler. The handler reads the text, finds its n-grams, and computes frequency statistics.

Parameters:
pi_InString - The text to split into n-grams
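A feed-and-count handler of this kind can be sketched with two `Hashtable` counters, one for n-grams and one for single characters. This is not Balie's source. The class `NGramCounterSketch` and its methods are hypothetical, and the sketch assumes that repeated calls to the feed method accumulate counts, that unigrams are individual chars, and that the text is not normalized. The documentation does not confirm any of these details.

```java
import java.util.Hashtable;

// Hypothetical sketch of a feed-and-count handler; NOT Balie's CharacterNGram
// source, only an illustration of the documented behaviour.
public class NGramCounterSketch {

    private final int n;
    private final Hashtable<String, Integer> ngrams = new Hashtable<>();
    private final Hashtable<String, Integer> unigrams = new Hashtable<>();

    public NGramCounterSketch(int n) {
        // Mirrors the documented constraint: N must be at least 2.
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        this.n = n;
    }

    // Splits the text into overlapping n-grams and single chars and adds
    // their counts to the running tables (assumed: repeated calls accumulate).
    public void feed(String text) {
        for (int i = 0; i + n <= text.length(); i++) {
            ngrams.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        for (int i = 0; i < text.length(); i++) {
            unigrams.merge(String.valueOf(text.charAt(i)), 1, Integer::sum);
        }
    }

    public Hashtable<String, Integer> ngramFrequency() {
        return ngrams;
    }

    public Hashtable<String, Integer> unigramFrequency() {
        return unigrams;
    }

    public static void main(String[] args) {
        NGramCounterSketch counter = new NGramCounterSketch(2);
        counter.feed("BANANA");
        System.out.println(counter.ngramFrequency().get("AN"));  // prints 2
        System.out.println(counter.unigramFrequency().get("A")); // prints 3
    }
}
```

`Hashtable.merge` inserts a count of 1 for a new key and otherwise adds 1 to the existing count, which keeps the counting loop to one line.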

Instance

public java.lang.Double[] Instance(java.lang.String[] pi_RefNGrams)
Creates an instance made of n-gram relative frequencies for a given set of reference n-grams. The relative frequency of an n-gram is its frequency divided by the total number of n-grams. The reference n-gram list is a subset, proper or not, of the entire n-gram list.

Parameters:
pi_RefNGrams - Reference n-grams for which the statistics are required.
Returns:
A parallel array containing the relative frequency of each reference n-gram.
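The relative-frequency computation described above can be sketched as follows. This is a hypothetical illustration, not Balie's code. The names `RelativeFrequencySketch` and `instance` are invented for the example. The sketch assumes that "total number of n-grams" means the sum of all counts in the frequency table (repeats included) and that a reference n-gram absent from the table gets 0.0. The documentation does not state either point.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical sketch: computes relative frequencies as the Instance
// documentation describes; not taken from Balie's source.
public class RelativeFrequencySketch {

    // Returns an array parallel to refs: element i is the count of refs[i]
    // divided by the total of all counts in freq. Missing n-grams, or an
    // empty table, give 0.0 (an assumption).
    public static Double[] instance(Hashtable<String, Integer> freq, String[] refs) {
        long total = 0;
        for (int count : freq.values()) {
            total += count;
        }
        Double[] result = new Double[refs.length];
        for (int i = 0; i < refs.length; i++) {
            Integer count = freq.get(refs[i]);
            result[i] = (total == 0 || count == null) ? 0.0 : count.doubleValue() / total;
        }
        return result;
    }

    public static void main(String[] args) {
        // Bigram counts for "BANANA": AN=2, NA=2, BA=1 (5 bigrams in total).
        Hashtable<String, Integer> freq = new Hashtable<>();
        freq.put("AN", 2);
        freq.put("NA", 2);
        freq.put("BA", 1);
        Double[] v = instance(freq, new String[] {"AN", "BA", "XY"});
        System.out.println(Arrays.toString(v)); // prints [0.4, 0.2, 0.0]
    }
}
```

Because the output array is parallel to the reference list, instances built from different texts with the same reference list line up position by position.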

NGramFrequency

public java.util.Hashtable<java.lang.String,java.lang.Integer> NGramFrequency()
Get the table that associates each n-gram to its frequency.

Returns:
Hashtable whose keys are n-grams and whose values are their frequencies
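A table of this shape can be inspected with ordinary `java.util` code. For example, a caller might sum its values to get the total n-gram count used as the denominator in Instance, or list the most frequent n-grams as candidate reference n-grams. The helper class `FrequencyTableSketch` and its methods are hypothetical and not part of Balie. Choosing reference n-grams by frequency is only one possible approach, shown here as an assumption.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

// Hypothetical helper: inspects a table shaped like the one returned by
// NGramFrequency(). Not part of the Balie API.
public class FrequencyTableSketch {

    // Sum of all counts, i.e. the total number of n-grams seen.
    public static long total(Hashtable<String, Integer> freq) {
        long sum = 0;
        for (int count : freq.values()) {
            sum += count;
        }
        return sum;
    }

    // The k most frequent keys, highest count first. Ties are broken
    // alphabetically so the result is deterministic, since Hashtable
    // iteration order is unspecified.
    public static List<String> topK(Hashtable<String, Integer> freq, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            keys.add(entries.get(i).getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> freq = new Hashtable<>();
        freq.put("AN", 2);
        freq.put("NA", 2);
        freq.put("BA", 1);
        System.out.println(total(freq));   // prints 5
        System.out.println(topK(freq, 2)); // prints [AN, NA]
    }
}
```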

UNIGramFrequency

public java.util.Hashtable<java.lang.String,java.lang.Integer> UNIGramFrequency()
Get the table that associates each unigram to its frequency.

Returns:
Hashtable whose keys are unigrams and whose values are their frequencies