java.lang.Object
  ca.uottawa.balie.TokenList
public class TokenList
A list of Tokens that represents a text. Provides methods for manipulating the list and for producing an XML representation.
| Constructor Summary | |
|---|---|
| TokenList(boolean pi_DetectSentenceBoundaries, NamedEntityTypeEnumI[] pi_Types) | Construct an empty TokenList. |
| Method Summary | | |
|---|---|---|
| boolean | Add(Token pi_Token, SentenceBoundariesRecognition pi_SBR, WekaLearner pi_SBRModel) | Add a token at the end of the TokenList. |
| boolean | equals(java.lang.Object pi_Obj) | |
| Token | Get(int pi_Index) | Gets the token at the given index. |
| int | getSentenceCount() | Gets the number of sentences found. |
| java.util.Hashtable | HashAccess() | Gets the index-to-token map. |
| int | hashCode() | |
| TokenListIterator | Iterator() | Gets an iterator for the TokenList. |
| void | MapNewNETypes(NamedEntityTypeEnumI[] pi_Mapping) | Map new NE types. |
| NamedEntityTypeEnumI[] | NETagSet() | Gets the current NE tag set. |
| java.lang.String | SentenceText(int pi_Index, boolean pi_Canonic, boolean pi_PrintNewLines) | Gets the text version of the sentence at the given index. |
| void | SetEntityType(int pi_Index, NamedEntityType pi_Type) | Sets the type of an entity (deep copy). |
| void | SetPOS(int pi_Index, int pi_POS) | Sets the part-of-speech of the token at the given index. |
| int | Size() | Gets the size (number of tokens) of the TokenList. |
| java.util.Hashtable<java.lang.String,java.lang.Double> | TermFrequencyTable() | Gets the TF table. |
| java.lang.String | TokenRangeText(int pi_Start, int pi_Stop, boolean pi_Canonic, boolean pi_PrintNewLines, boolean pi_TagEntities, boolean pi_AddAlias, boolean pi_AddExplanation, boolean pi_EscapeXML) | Gets the String representation of part of the TokenList. |
| java.lang.StringBuffer | ToXML() | Gets the TokenList in XML format. |
| java.util.ArrayList<java.lang.String> | WordList() | Gets the (ordered) list of words in this TokenList. |
| Methods inherited from class java.lang.Object |
|---|
| getClass, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public TokenList(boolean pi_DetectSentenceBoundaries,
NamedEntityTypeEnumI[] pi_Types)
pi_DetectSentenceBoundaries - True if sentence boundaries must be detected
| Method Detail |
|---|
public boolean Add(Token pi_Token,
SentenceBoundariesRecognition pi_SBR,
WekaLearner pi_SBRModel)
pi_Token - A new token
pi_SBR - The SBR object
pi_SBRModel - The learned SBR model
public int Size()
public Token Get(int pi_Index)
pi_Index - Index of the token to get.
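The Size() and Get(int) pair suggests plain index-based traversal. The following is a minimal sketch of that pattern, not Balie code: IndexedTokensSketch is a hypothetical stand-in that stores strings instead of Token objects, its one-argument Add is simplified, and zero-based indexing is an assumption that this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors only the Size()/Get(int) shape documented above.
// It is NOT Balie's TokenList: it holds plain strings and its Add takes one argument.
final class IndexedTokensSketch {
    private final List<String> tokens = new ArrayList<>();

    // Appends a token at the end of the list.
    boolean Add(String token) {
        return tokens.add(token);
    }

    // Number of tokens currently stored.
    int Size() {
        return tokens.size();
    }

    // Token at the given index (assumed zero-based here).
    String Get(int index) {
        return tokens.get(index);
    }

    // Walks the list by index from 0 up to, but not including, Size().
    static String joinAll(IndexedTokensSketch list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.Size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(list.Get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        IndexedTokensSketch list = new IndexedTokensSketch();
        list.Add("Hello");
        list.Add("world");
        System.out.println(joinAll(list));
    }
}
```

With the real class, the same loop shape would apply to Token objects returned by Get(int), provided the indexing assumption holds.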
public boolean equals(java.lang.Object pi_Obj)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
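Because both equals and hashCode are overridden, the general java.lang.Object contract applies: objects that compare equal must return the same hash code. This page does not say which fields TokenList compares, so the sketch below uses a hypothetical class, EqualityHashSketch, that bases both methods on its token contents purely to illustrate the contract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, not Balie code: shows equals and hashCode derived
// from the same state so that equal objects always share a hash code.
final class EqualityHashSketch {
    private final List<String> tokens = new ArrayList<>();

    void add(String token) {
        tokens.add(token);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof EqualityHashSketch)) {
            return false;
        }
        // Two sketches are equal when they hold the same tokens in the same order.
        return tokens.equals(((EqualityHashSketch) obj).tokens);
    }

    @Override
    public int hashCode() {
        // Uses the same field as equals, which keeps the two methods consistent.
        return tokens.hashCode();
    }

    public static void main(String[] args) {
        EqualityHashSketch a = new EqualityHashSketch();
        EqualityHashSketch b = new EqualityHashSketch();
        a.add("x");
        b.add("x");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```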
public java.lang.String SentenceText(int pi_Index,
boolean pi_Canonic,
boolean pi_PrintNewLines)
pi_Index - Index of the sentence to get (in number of sentences)
pi_Canonic - True if the text must be returned in its canonical version
pi_PrintNewLines - Print \n characters
public java.lang.String TokenRangeText(int pi_Start,
int pi_Stop,
boolean pi_Canonic,
boolean pi_PrintNewLines,
boolean pi_TagEntities,
boolean pi_AddAlias,
boolean pi_AddExplanation,
boolean pi_EscapeXML)
pi_Start - Start token number (inclusive)
pi_Stop - End token number (exclusive)
pi_Canonic - Print in canonical (lowercased, etc.) form
pi_PrintNewLines - Print \n characters
pi_TagEntities - Add XML tags around named entities
pi_AddAlias - Add alias network information in the XML tag
pi_AddExplanation - Add explanation information in the XML tag
pi_EscapeXML - Escape XML reserved characters in the text (so that the output is valid XML)
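The half-open range (start inclusive, stop exclusive) and the XML escaping option can be illustrated with a small standalone sketch. It is not the Balie implementation: TokenRangeSketch, rangeText, and escapeXml are hypothetical names, tokens are plain strings joined by single spaces, and canonical form is reduced to lowercasing, which is only part of what the parameter description implies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not Balie code: models only the range, canonical,
// and escape options of a TokenRangeText-style call.
final class TokenRangeSketch {

    // Escapes the five XML reserved characters; '&' goes first so that
    // entities produced by the later replacements are not escaped twice.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Joins tokens in [start, stop) with single spaces (an assumption).
    static String rangeText(List<String> tokens, int start, int stop,
                            boolean canonic, boolean escape) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < stop; i++) {
            String t = tokens.get(i);
            if (canonic) {
                t = t.toLowerCase(Locale.ROOT); // simplified canonical form
            }
            if (escape) {
                t = escapeXml(t);
            }
            if (i > start) {
                sb.append(' ');
            }
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("AT&T", "Labs", "<Research>", "Inc");
        // The token at index 3 is excluded because stop is exclusive.
        System.out.println(rangeText(tokens, 0, 3, true, true));
    }
}
```

A consequence of the half-open convention is that equal start and stop values select no tokens, and `stop - start` gives the number of tokens selected.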
public void MapNewNETypes(NamedEntityTypeEnumI[] pi_Mapping)
pi_Mapping -
public NamedEntityTypeEnumI[] NETagSet()
public int getSentenceCount()
public java.util.Hashtable<java.lang.String,java.lang.Double> TermFrequencyTable()
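The return type Hashtable<String, Double> maps each term to a numeric frequency. This page does not say whether the values are raw counts or normalized weights, so the sketch below assumes raw counts stored as doubles. TermFrequencySketch and termFrequencies are hypothetical names, not part of Balie.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Hypothetical helper, not Balie code: builds a term-to-frequency table
// with the same shape as the documented return type.
final class TermFrequencySketch {

    // Counts occurrences of each word; raw counts are an assumption,
    // the real table may be normalized or weighted differently.
    static Hashtable<String, Double> termFrequencies(List<String> words) {
        Hashtable<String, Double> tf = new Hashtable<>();
        for (String word : words) {
            // merge inserts 1.0 for a new key, or adds 1.0 to the existing value.
            tf.merge(word, 1.0, Double::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        Hashtable<String, Double> tf =
                termFrequencies(Arrays.asList("the", "cat", "saw", "the", "dog"));
        System.out.println(tf.get("the"));
    }
}
```

Note that Hashtable.get returns null for a term that never occurred, so callers should check for null before unboxing the value.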
public java.util.Hashtable HashAccess()
public java.util.ArrayList<java.lang.String> WordList()
public void SetPOS(int pi_Index,
int pi_POS)
pi_Index - Index of the token to update
pi_POS - Part-of-speech of this token (see TokenConsts for the enumeration)
See also: TokenConsts
public void SetEntityType(int pi_Index,
NamedEntityType pi_Type)
pi_Index - Index of the entity
pi_Type - Type to set
public java.lang.StringBuffer ToXML()
public TokenListIterator Iterator()
See also: TokenListIterator