java.lang.Object
  com.wcohen.ss.tokens.NGramTokenizer
public class NGramTokenizer
Wraps another tokenizer and computes all character n-grams of each token produced by the inner tokenizer.
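The description above can be illustrated with a minimal, library-independent sketch of character n-gram extraction. The class `CharNGramSketch` and the method `charNGrams` are hypothetical names, not part of this package, and the real class may differ in detail (for example in ordering or in how token boundaries are handled).

```java
import java.util.ArrayList;
import java.util.List;

final class CharNGramSketch {
    /**
     * Returns every contiguous substring of token whose length lies in
     * [minSize, maxSize], shortest n-grams first. Assumes minSize >= 1.
     */
    static List<String> charNGrams(String token, int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        List<String> grams = new ArrayList<>();
        for (int n = minSize; n <= maxSize; n++) {
            // Slide a window of width n across the token.
            for (int start = 0; start + n <= token.length(); start++) {
                grams.add(token.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [to, ok, ke, en, tok, oke, ken]
        System.out.println(charNGrams("token", 2, 3));
    }
}
```

A token shorter than `minSize` yields no n-grams in this sketch.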
Field Summary | |
---|---|
static NGramTokenizer | DEFAULT_TOKENIZER |
Constructor Summary | |
---|---|
NGramTokenizer(int minNGramSize, int maxNGramSize, boolean keepOldTokens, Tokenizer innerTokenizer) | |
Method Summary | |
---|---|
Token | intern(java.lang.String s): Convert a given string into a token. |
static void | main(java.lang.String[] argv): Test routine. |
int | maxTokenIndex(): Return the highest index of any interned token. |
java.util.Iterator<Token> | tokenIterator(): Return an iterator over interned tokens. |
Token[] | tokenize(java.lang.String input): Return tokenized version of a string. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static NGramTokenizer DEFAULT_TOKENIZER
Constructor Detail |
---|
public NGramTokenizer(int minNGramSize, int maxNGramSize, boolean keepOldTokens, Tokenizer innerTokenizer)
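The constructor carries no description, so the following is only a sketch of how its four parameters might interact, inferred from their names. `WrappedNGramSketch` and `splitOnWhitespace` are hypothetical, a `Function<String, List<String>>` stands in for the `Tokenizer` interface, and emitting the original token ahead of its n-grams when `keepOldTokens` is true is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class WrappedNGramSketch {
    private final int minNGramSize;
    private final int maxNGramSize;
    private final boolean keepOldTokens;
    // Hypothetical stand-in for the inner Tokenizer.
    private final Function<String, List<String>> innerTokenizer;

    WrappedNGramSketch(int minNGramSize, int maxNGramSize, boolean keepOldTokens,
                       Function<String, List<String>> innerTokenizer) {
        this.minNGramSize = minNGramSize;
        this.maxNGramSize = maxNGramSize;
        this.keepOldTokens = keepOldTokens;
        this.innerTokenizer = innerTokenizer;
    }

    /** Hypothetical inner tokenizer: splits on runs of whitespace. */
    static List<String> splitOnWhitespace(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        for (String token : innerTokenizer.apply(input)) {
            if (keepOldTokens) {
                out.add(token); // assumed: the original token is emitted first
            }
            for (int n = minNGramSize; n <= maxNGramSize; n++) {
                for (int start = 0; start + n <= token.length(); start++) {
                    out.add(token.substring(start, start + n));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WrappedNGramSketch t =
            new WrappedNGramSketch(2, 2, true, WrappedNGramSketch::splitOnWhitespace);
        // prints [new, ne, ew, york, yo, or, rk]
        System.out.println(t.tokenize("new york"));
    }
}
```

With `keepOldTokens` set to false, the same call would return only the bigrams.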
Method Detail |
---|
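public Token[] tokenize(java.lang.String input)
Specified by: tokenize in interface Tokenizer
public Token intern(java.lang.String s)
Description copied from interface: Tokenizer
Convert a given string into a token.
Specified by: intern in interface Tokenizer
public java.util.Iterator<Token> tokenIterator()
Description copied from interface: Tokenizer
Return an iterator over interned tokens.
Specified by: tokenIterator in interface Tokenizer
public int maxTokenIndex()
Description copied from interface: Tokenizer
Return the highest index of any interned token.
Specified by: maxTokenIndex in interface Tokenizer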
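The relationship between `intern`, `tokenIterator`, and `maxTokenIndex` can be sketched with a small interning table. This is an illustration under assumptions, not the class's implementation: `InternSketch` and `Tok` are hypothetical, and both the 1-based indices and the reuse of one token per distinct string are guesses about the contract.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class InternSketch {
    /** Hypothetical token: a string value paired with a stable integer index. */
    static final class Tok {
        final int index;
        final String value;

        Tok(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    // Insertion-ordered so iteration follows the order of first interning.
    private final Map<String, Tok> table = new LinkedHashMap<>();
    private int lastIndex = 0; // assumed: indices start at 1

    /** Returns the existing token for s, or creates one with the next index. */
    Tok intern(String s) {
        Tok tok = table.get(s);
        if (tok == null) {
            tok = new Tok(++lastIndex, s);
            table.put(s, tok);
        }
        return tok;
    }

    Iterator<Tok> tokenIterator() {
        return table.values().iterator();
    }

    int maxTokenIndex() {
        return lastIndex;
    }

    public static void main(String[] args) {
        InternSketch t = new InternSketch();
        t.intern("ab");
        t.intern("bc");
        t.intern("ab"); // already interned, so no new index is allocated
        System.out.println(t.maxTokenIndex()); // prints 2
    }
}
```

Interning gives each distinct string a compact integer identity, which lets downstream code compare tokens by index instead of by string content.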
public static void main(java.lang.String[] argv)