java.lang.Object
  com.wcohen.ss.tokens.NGramTokenizer
public class NGramTokenizer
extends java.lang.Object
implements Tokenizer
Wraps another tokenizer and adds all character n-grams computed from each token produced by the inner tokenizer.
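The wrapping behavior described above can be illustrated with a minimal sketch. It rests on several assumptions not confirmed by this page: n-grams are contiguous substrings with no boundary padding, `keepOldTokens` means the inner token is emitted alongside its n-grams, and a `java.util.function.Function` stands in for the `Tokenizer` interface. `NGramWrapperSketch` is hypothetical, not SecondString's code, and it returns plain strings rather than interned `Token` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not SecondString's implementation: wraps an inner
// tokenizer (modelled as a Function) and expands each inner token into its
// contiguous character n-grams of length minNGramSize..maxNGramSize.
public class NGramWrapperSketch {
    private final int minNGramSize;
    private final int maxNGramSize;
    private final boolean keepOldTokens;
    private final Function<String, List<String>> innerTokenizer;

    public NGramWrapperSketch(int minNGramSize, int maxNGramSize, boolean keepOldTokens,
                              Function<String, List<String>> innerTokenizer) {
        // Argument check chosen for this sketch; the real constructor may not validate.
        if (minNGramSize < 1 || maxNGramSize < minNGramSize) {
            throw new IllegalArgumentException("need 1 <= minNGramSize <= maxNGramSize");
        }
        this.minNGramSize = minNGramSize;
        this.maxNGramSize = maxNGramSize;
        this.keepOldTokens = keepOldTokens;
        this.innerTokenizer = innerTokenizer;
    }

    public List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        for (String token : innerTokenizer.apply(input)) {
            // Assumed meaning of keepOldTokens: also emit the inner token itself.
            if (keepOldTokens) {
                out.add(token);
            }
            // Slide a window of each width n across the token; a token shorter
            // than n contributes nothing for that n.
            for (int n = minNGramSize; n <= maxNGramSize; n++) {
                for (int start = 0; start + n <= token.length(); start++) {
                    out.add(token.substring(start, start + n));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Whitespace splitting stands in for the inner tokenizer.
        Function<String, List<String>> whitespace = s -> Arrays.asList(s.trim().split("\\s+"));
        NGramWrapperSketch bigrams = new NGramWrapperSketch(2, 2, true, whitespace);
        System.out.println(bigrams.tokenize("abc de")); // prints [abc, ab, bc, de, de]
    }
}
```

With `keepOldTokens` set, a two-character token such as "de" appears twice in this sketch, once as itself and once as its only bigram; whether the real class removes such duplicates is not stated on this page.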
| Field Summary | |
|---|---|
| static NGramTokenizer | DEFAULT_TOKENIZER |
| Constructor Summary |
|---|
| NGramTokenizer(int minNGramSize, int maxNGramSize, boolean keepOldTokens, Tokenizer innerTokenizer) |
| Method Summary | | |
|---|---|---|
| Token | intern(java.lang.String s) | Convert a given string into a token. |
| static void | main(java.lang.String[] argv) | Test routine. |
| int | maxTokenIndex() | Return the highest index of any interned token. |
| java.util.Iterator<Token> | tokenIterator() | Return an iterator over interned tokens. |
| Token[] | tokenize(java.lang.String input) | Return a tokenized version of a string. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static NGramTokenizer DEFAULT_TOKENIZER
| Constructor Detail |
|---|
public NGramTokenizer(int minNGramSize,
int maxNGramSize,
boolean keepOldTokens,
Tokenizer innerTokenizer)
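| Method Detail |
|---|

public Token[] tokenize(java.lang.String input)
Description copied from interface: Tokenizer
Return a tokenized version of a string.
Specified by: tokenize in interface Tokenizer

public Token intern(java.lang.String s)
Description copied from interface: Tokenizer
Convert a given string into a token.
Specified by: intern in interface Tokenizer

public java.util.Iterator<Token> tokenIterator()
Description copied from interface: Tokenizer
Return an iterator over interned tokens.
Specified by: tokenIterator in interface Tokenizer

public int maxTokenIndex()
Description copied from interface: Tokenizer
Return the highest index of any interned token.
Specified by: maxTokenIndex in interface Tokenizer

public static void main(java.lang.String[] argv)
Test routine.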
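Taken together, `intern`, `tokenIterator`, and `maxTokenIndex` describe an interning table: each distinct string maps to one token carrying an integer index. The sketch below shows one way such a table can work, assuming indices are assigned in first-seen order starting at 1. `InternTableSketch` and `SimpleToken` are hypothetical stand-ins, not the library's `Token` type or its actual indexing scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical interning table: each distinct string gets exactly one token
// object with a stable integer index (1-based in this sketch, by assumption).
public class InternTableSketch {
    // Minimal stand-in for a token type: the string value plus its index.
    public static final class SimpleToken {
        private final String value;
        private final int index;

        SimpleToken(String value, int index) {
            this.value = value;
            this.index = index;
        }

        public String getValue() { return value; }
        public int getIndex() { return index; }

        @Override
        public String toString() { return value + "#" + index; }
    }

    private final Map<String, SimpleToken> byValue = new HashMap<>();
    private final List<SimpleToken> inOrder = new ArrayList<>();

    // Return the existing token for s, or create one with the next index.
    public SimpleToken intern(String s) {
        return byValue.computeIfAbsent(s, k -> {
            SimpleToken t = new SimpleToken(k, inOrder.size() + 1);
            inOrder.add(t);
            return t;
        });
    }

    // Highest index handed out so far (0 if nothing has been interned).
    public int maxTokenIndex() {
        return inOrder.size();
    }

    // Read-only iterator over interned tokens in creation order.
    public Iterator<SimpleToken> tokenIterator() {
        return Collections.unmodifiableList(inOrder).iterator();
    }

    public static void main(String[] args) {
        InternTableSketch table = new InternTableSketch();
        table.intern("ab");
        table.intern("bc");
        table.intern("ab"); // already interned: same object, no new index
        System.out.println(table.maxTokenIndex()); // prints 2
    }
}
```

Returning the same object for repeated strings lets callers compare tokens by identity or by index, which is a common reason for interning in string-similarity code; whether SecondString relies on that property is not stated here.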