java.lang.Object
  ca.uottawa.balie.TokenList
public class TokenList
A list of Tokens that represents a text. Provides a set of manipulation methods and an XML representation.
Constructor Summary | |
---|---|
TokenList(boolean pi_DetectSentenceBoundaries,
NamedEntityTypeEnumI[] pi_Types)
Construct an empty TokenList. |
Method Summary | |
---|---|
boolean |
Add(Token pi_Token,
SentenceBoundariesRecognition pi_SBR,
WekaLearner pi_SBRModel)
Adds a token at the end of the TokenList. |
boolean |
equals(java.lang.Object pi_Obj)
|
Token |
Get(int pi_Index)
Gets the token at the given index. |
int |
getSentenceCount()
Gets the number of sentences found. |
java.util.Hashtable |
HashAccess()
Gets the index-to-token map. |
int |
hashCode()
|
TokenListIterator |
Iterator()
Gets an iterator for the tokenList |
void |
MapNewNETypes(NamedEntityTypeEnumI[] pi_Mapping)
Map new NE types. |
NamedEntityTypeEnumI[] |
NETagSet()
Get the current NE tag set |
java.lang.String |
SentenceText(int pi_Index,
boolean pi_Canonic,
boolean pi_PrintNewLines)
Gets the text version of the sentence at the given index. |
void |
SetEntityType(int pi_Index,
NamedEntityType pi_Type)
Set the type of an entity (deep copy) |
void |
SetPOS(int pi_Index,
int pi_POS)
Sets the Part-of-speech of the token at the given index. |
int |
Size()
Gets the size (number of tokens) of the TokenList. |
java.util.Hashtable<java.lang.String,java.lang.Double> |
TermFrequencyTable()
Gets the TF table. |
java.lang.String |
TokenRangeText(int pi_Start,
int pi_Stop,
boolean pi_Canonic,
boolean pi_PrintNewLines,
boolean pi_TagEntities,
boolean pi_AddAlias,
boolean pi_AddExplanation,
boolean pi_EscapeXML)
Get String representation of a part of the tokenlist |
java.lang.StringBuffer |
ToXML()
Gets the tokenlist in XML format |
java.util.ArrayList<java.lang.String> |
WordList()
Get the (ordered) list of words in this tokenlist |
Methods inherited from class java.lang.Object |
---|
getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TokenList(boolean pi_DetectSentenceBoundaries, NamedEntityTypeEnumI[] pi_Types)
pi_DetectSentenceBoundaries
- True if sentence boundaries must be detected
Method Detail |
---|
public boolean Add(Token pi_Token, SentenceBoundariesRecognition pi_SBR, WekaLearner pi_SBRModel)
pi_Token
- A new token
pi_SBR
- The SBR object
pi_SBRModel
- The learned SBR model
public int Size()
public Token Get(int pi_Index)
pi_Index
- Index of the token to get.
public boolean equals(java.lang.Object pi_Obj)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public java.lang.String SentenceText(int pi_Index, boolean pi_Canonic, boolean pi_PrintNewLines)
pi_Index
- Index of the sentence to get (in number of sentences)
pi_Canonic
- True if the text must be returned in its canonical version
pi_PrintNewLines
- Print \n characters
public java.lang.String TokenRangeText(int pi_Start, int pi_Stop, boolean pi_Canonic, boolean pi_PrintNewLines, boolean pi_TagEntities, boolean pi_AddAlias, boolean pi_AddExplanation, boolean pi_EscapeXML)
pi_Start
- Start token number (inclusive)
pi_Stop
- End token number (exclusive)
pi_Canonic
- Print in canonical (lowercased, etc.) form
pi_PrintNewLines
- Print \n characters
pi_TagEntities
- Add XML tags around named entities
pi_AddAlias
- Add alias network information in the XML tag
pi_AddExplanation
- Add explanation information in the XML tag
pi_EscapeXML
- Escape XML reserved characters in the text (so that the output is valid XML)
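The parameters above describe a half-open token range (pi_Start inclusive, pi_Stop exclusive) with optional canonical form and XML escaping. The following is a minimal sketch of those semantics only, not Balie's implementation. The class TokenRangeSketch and its methods are hypothetical, tokens are modelled as plain strings joined by single spaces, and "canonical" is assumed to mean lowercasing only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in: illustrates the range and escaping semantics, not Balie's code.
public class TokenRangeSketch {

    // Replace the five XML reserved characters with their predefined entities.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Join tokens in [start, stop): start is included, stop is excluded.
    static String rangeText(List<String> tokens, int start, int stop,
                            boolean canonic, boolean escapeXml) {
        StringBuilder out = new StringBuilder();
        for (int i = start; i < stop; i++) {
            String token = tokens.get(i);
            if (canonic) {
                token = token.toLowerCase(Locale.ROOT); // assumption: canonical = lowercase
            }
            if (escapeXml) {
                token = escapeXml(token);
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Tom", "&", "Jerry", "run");
        System.out.println(rangeText(tokens, 0, 3, false, true)); // Tom &amp; Jerry
        System.out.println(rangeText(tokens, 2, 4, true, false)); // jerry run
    }
}
```

Because the stop index is exclusive, a call with start 0 and stop equal to the token count covers the whole list, and adjacent ranges can be chained without overlapping.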
public void MapNewNETypes(NamedEntityTypeEnumI[] pi_Mapping)
pi_Mapping
-
public NamedEntityTypeEnumI[] NETagSet()
public int getSentenceCount()
public java.util.Hashtable<java.lang.String,java.lang.Double> TermFrequencyTable()
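The return type maps each term to a Double, but this page does not say whether the values are raw counts or normalized frequencies. As a rough sketch of what such a table can look like, the example below assumes relative frequency (occurrences divided by the total number of words) and case-sensitive terms. TermFrequencySketch is a hypothetical class, not part of Balie.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

// Hypothetical stand-in: one possible way to build a term-to-frequency table.
public class TermFrequencySketch {

    // Assumption: the value for a term is (occurrences of term) / (total words).
    static Hashtable<String, Double> termFrequencies(List<String> words) {
        Hashtable<String, Double> table = new Hashtable<>();
        if (words.isEmpty()) {
            return table;
        }
        double share = 1.0 / words.size();
        for (String word : words) {
            // Add one share per occurrence; merge inserts the share on first sight.
            table.merge(word, share, Double::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "the", "mat");
        Hashtable<String, Double> table = termFrequencies(words);
        System.out.println(table.get("the")); // 0.5
        System.out.println(table.get("cat")); // 0.25
    }
}
```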
public java.util.Hashtable HashAccess()
public java.util.ArrayList<java.lang.String> WordList()
public void SetPOS(int pi_Index, int pi_POS)
pi_Index
- Index of the token to update
pi_POS
- Part-of-speech of this token (see TokenConsts for the enumeration)
See also: TokenConsts
public void SetEntityType(int pi_Index, NamedEntityType pi_Type)
pi_Index
- Index of the entity
pi_Type
- Type to set
public java.lang.StringBuffer ToXML()
public TokenListIterator Iterator()
See also: TokenListIterator