ca.uottawa.balie
Class Token

java.lang.Object
  extended by ca.uottawa.balie.Token
All Implemented Interfaces:
java.io.Serializable

public class Token
extends java.lang.Object
implements java.io.Serializable

Tokens are the unit element of Balie. A text is represented as a list of consecutive tokens (called a TokenList).
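
The idea of a text as a list of consecutive tokens can be illustrated without Balie itself. The following is a minimal sketch, not Balie's implementation: `SketchToken` and `TokenListSketch` are hypothetical stand-ins, the splitting rule (runs of letters or digits, single punctuation marks) is an assumption, and lower-casing is only a placeholder for whatever canonicalization Balie applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for ca.uottawa.balie.Token; not the real class.
class SketchToken {
    final String raw;          // the word as it appears in the text
    final String canon;        // assumed canonical form: lower-cased raw text
    final int position;        // index of the token from the start of the text
    final int numWhiteBefore;  // white-space characters preceding the token

    SketchToken(String raw, int position, int numWhiteBefore) {
        this.raw = raw;
        this.canon = raw.toLowerCase(Locale.ROOT);
        this.position = position;
        this.numWhiteBefore = numWhiteBefore;
    }
}

class TokenListSketch {
    // Splits text into runs of letters/digits and single punctuation marks.
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> tokens = new ArrayList<>();
        int i = 0;
        int white = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                white++;
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
            } else {
                i++; // a punctuation mark is a token of its own
            }
            tokens.add(new SketchToken(text.substring(start, i), tokens.size(), white));
            white = 0;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (SketchToken t : tokenize("Hello,  World!")) {
            System.out.println(t.position + " [" + t.raw + "] canon=" + t.canon
                    + " whiteBefore=" + t.numWhiteBefore);
        }
    }
}
```

Each entry keeps both the raw and the canonical literal, which mirrors the Raw() and Canon() accessors listed below.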

Author:
nadeaud
See Also:
Serialized Form

Constructor Summary
Token(java.lang.String pi_RawLiteral, java.lang.String pi_CanonLiteral, int pi_Type, PunctLookup pi_PunctLookup, AccentLookup pi_AccentLookup, int pi_Position, int pi_Sentence, int pi_NumWhiteBefore, int pi_NextStart, int pi_NETagSetSize)
          Creates a new token with all the required information.
 
Method Summary
 java.lang.String Canon()
          Gets the canonical version of the token.
 int EndPos()
           
 NamedEntityType EntityType()
          Get the entity type of this token. See NamedEntityType for the enumeration of types.
 void EntityType(NamedEntityType pi_Type)
          Set the entity type. See TokenConsts for the enumeration of types.
 boolean equals(java.lang.Object pi_Obj)
           
 TokenFeature Features()
          Get features for this token
 void FlagAsAllCapSentence()
           
 void FlagAsSentenceStart()
           
 int hashCode()
           
 void IncrementSentenceNumber()
          Increments the sentence number of a token.
 boolean IsAllCapSentence()
           
 boolean IsSentenceStart()
           
 int Length()
          Gets the length of a token in number of chars.
 int NamedEntityAlias()
          Get the alias group (integer ID) for this token
 void NamedEntityAlias(int pi_ID)
          Set alias group ID for this token
 int NumWhiteBefore()
          Get the number of white spaces that precede this token in the text
 int PartOfSpeech()
          Gets the part-of-speech of the token.
 long Position()
          Gets the token position.
 java.lang.String Raw()
          Gets the raw version of the token.
 int SentenceNumber()
          Gets the sentence number.
 void setPosition(int numPosition)
          Sets the token position.
 void setSentenceNumber(int numSentence)
          Sets the sentence number.
 int StartPos()
           
 java.lang.String toString()
          A canonical string representation of this token.
 java.lang.StringBuffer ToXML()
          Gets the XML representation of the token.
 int Type()
          Gets the type of the token (word or punctuation).
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Token

public Token(java.lang.String pi_RawLiteral,
             java.lang.String pi_CanonLiteral,
             int pi_Type,
             PunctLookup pi_PunctLookup,
             AccentLookup pi_AccentLookup,
             int pi_Position,
             int pi_Sentence,
             int pi_NumWhiteBefore,
             int pi_NextStart,
             int pi_NETagSetSize)
Creates a new token with all the required information.

Parameters:
pi_RawLiteral - The word as it appears in the text
pi_CanonLiteral - The canonical version of the word
pi_Type - The type (punctuation or word); see TokenConsts for details
pi_PunctLookup - The lookup table for punctuation types
pi_Position - The position of the token, in number of words from the text beginning
pi_Sentence - The sentence number
pi_NumWhiteBefore - Number of white chars prior to this token
pi_NextStart - Start position (in chars) of this token (including white)
Method Detail

Raw

public java.lang.String Raw()
Gets the raw version of the token.

Returns:
String

Canon

public java.lang.String Canon()
Gets the canonical version of the token.

Returns:
String

Type

public int Type()
Gets the type of the token (word or punctuation). See TokenConsts for the enumeration.

Returns:
The type of the token
See Also:
TokenConsts

PartOfSpeech

public int PartOfSpeech()
Gets the part-of-speech of the token. Both words and punctuation marks have a POS. See TokenConsts for the enumeration of both.

Returns:
the POS
See Also:
TokenConsts

equals

public boolean equals(java.lang.Object pi_Obj)
Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

NumWhiteBefore

public int NumWhiteBefore()
Get the number of white spaces that precede this token in the text

Returns:
num white chars

EntityType

public NamedEntityType EntityType()
Get the entity type of this token. See NamedEntityType for the enumeration of types.

Returns:
entity type
See Also:
TokenConsts

EntityType

public void EntityType(NamedEntityType pi_Type)
Set the entity type. See TokenConsts for the enumeration of types.

Parameters:
pi_Type -
See Also:
TokenConsts

NamedEntityAlias

public int NamedEntityAlias()
Get the alias group (integer ID) for this token

Returns:
group ID (-1 if the token does not belong to an alias group)

NamedEntityAlias

public void NamedEntityAlias(int pi_ID)
Set alias group ID for this token

Parameters:
pi_ID - alias group ID
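
As an illustration of how alias group IDs might be consumed, the sketch below collects token indexes by group and skips tokens carrying the documented value -1. It works on a plain `int[]` of IDs rather than on real Token objects; `AliasGroupSketch` and `groupByAlias` are hypothetical names, and reading an alias group as a set of mentions of the same entity is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AliasGroupSketch {
    // Value documented for tokens that do not belong to any alias group.
    static final int NO_ALIAS = -1;

    // Maps each alias group ID to the indexes of the tokens carrying it.
    static Map<Integer, List<Integer>> groupByAlias(int[] aliasIds) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < aliasIds.length; i++) {
            if (aliasIds[i] == NO_ALIAS) {
                continue; // token is outside every alias group
            }
            groups.computeIfAbsent(aliasIds[i], k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // One ID per token, as NamedEntityAlias() would return it.
        int[] aliasIds = {0, 0, -1, -1, 0, -1, 1};
        System.out.println(groupByAlias(aliasIds));
    }
}
```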

SentenceNumber

public int SentenceNumber()
Gets the sentence number.

Returns:
The sentence number

setSentenceNumber

public void setSentenceNumber(int numSentence)
Sets the sentence number.

Parameters:
numSentence - the new sentence number

IncrementSentenceNumber

public void IncrementSentenceNumber()
Increments the sentence number of a token. Useful with the SBR module, which can identify sentence breaks later on, after tokens have already been numbered.
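
One way to picture a late sentence break: once a break is found in front of some token, that token and every token after it move up by one sentence. The sketch below shows this on a plain array of sentence numbers. `LateSentenceBreakSketch` and `applyLateBreak` are hypothetical names, and whether the SBR module proceeds exactly this way is an assumption.

```java
import java.util.Arrays;

class LateSentenceBreakSketch {
    // Shifts the sentence number of every token from breakIndex onward by one.
    static void applyLateBreak(int[] sentenceNumbers, int breakIndex) {
        for (int i = breakIndex; i < sentenceNumbers.length; i++) {
            sentenceNumbers[i]++; // plays the role of IncrementSentenceNumber()
        }
    }

    public static void main(String[] args) {
        // Six tokens: four in sentence 0, two in sentence 1.
        int[] sentenceNumbers = {0, 0, 0, 0, 1, 1};
        // A break is detected afterwards, in front of the third token.
        applyLateBreak(sentenceNumbers, 2);
        System.out.println(Arrays.toString(sentenceNumbers));
    }
}
```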


Position

public long Position()
Gets the token position.

Returns:
The token position.

setPosition

public void setPosition(int numPosition)
Sets the token position.

Parameters:
numPosition - the new token position

Length

public int Length()
Gets the length of a token in number of chars.

Returns:
Token length

ToXML

public java.lang.StringBuffer ToXML()
Gets the XML representation of the token.

Returns:
An XML representation in a StringBuffer.

StartPos

public int StartPos()
Returns:
the start position in number of characters relative to document start

EndPos

public int EndPos()
Returns:
the end position in number of characters relative to document start
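
The relationship between character offsets, token length, and preceding white space can be sketched as follows. This is an illustration under stated assumptions, not Balie's code: offsets are taken to be 0-based, the end position is taken to be exclusive (the documentation above does not say), and `OffsetSketch` and `offsets` are hypothetical names.

```java
class OffsetSketch {
    // Returns one {start, end} pair per token; end is exclusive in this sketch.
    static int[][] offsets(String[] raws, int[] numWhiteBefore) {
        int[][] spans = new int[raws.length][2];
        int cursor = 0; // end offset of the previous token
        for (int i = 0; i < raws.length; i++) {
            int start = cursor + numWhiteBefore[i];
            int end = start + raws[i].length();
            spans[i][0] = start;
            spans[i][1] = end;
            cursor = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hello,  World!";
        String[] raws = {"Hello", ",", "World", "!"};
        int[] numWhiteBefore = {0, 0, 2, 0};
        int[][] spans = offsets(raws, numWhiteBefore);
        for (int i = 0; i < raws.length; i++) {
            // With an exclusive end, substring(start, end) recovers the raw literal.
            System.out.println(spans[i][0] + "-" + spans[i][1] + " "
                    + text.substring(spans[i][0], spans[i][1]));
        }
    }
}
```

With these conventions, the end position minus the start position equals the token length, and the gap between one token's end and the next token's start equals the number of white-space characters before the latter.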

FlagAsSentenceStart

public void FlagAsSentenceStart()

FlagAsAllCapSentence

public void FlagAsAllCapSentence()

IsSentenceStart

public boolean IsSentenceStart()
Returns:
true if this token starts a sentence

IsAllCapSentence

public boolean IsAllCapSentence()
Returns:
true if this token is inside an all-capitalized sentence

toString

public java.lang.String toString()
A canonical string representation of this token.

Overrides:
toString in class java.lang.Object
See Also:
Object.toString()

Features

public TokenFeature Features()
Get features for this token

Returns:
the features