com.wcohen.secondstring.tokens
Class SimpleTokenizer
java.lang.Object
|
+--com.wcohen.secondstring.tokens.SimpleTokenizer
- All Implemented Interfaces:
Tokenizer
- public class SimpleTokenizer extends java.lang.Object implements Tokenizer
Simple implementation of a Tokenizer. Tokens are sequences of
alphanumerics and, optionally, single punctuation characters, each of which forms a token on its own.
Constructor Summary
SimpleTokenizer(boolean ignorePunctuation,
boolean ignoreCase)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
DEFAULT_TOKENIZER
public static final SimpleTokenizer DEFAULT_TOKENIZER
SimpleTokenizer
public SimpleTokenizer(boolean ignorePunctuation,
boolean ignoreCase)
setIgnorePunctuation
public void setIgnorePunctuation(boolean flag)
setIgnoreCase
public void setIgnoreCase(boolean flag)
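DEFAULT_TOKENIZER is declared static final, but final fixes only the reference, not the object's state. Assuming the public setters change the instance they are called on (as their names suggest), calling setIgnoreCase or setIgnorePunctuation on DEFAULT_TOKENIZER would change the settings seen by every other user of that field. The sketch below demonstrates this Java behavior with a hypothetical SharedDefaultSketch class; its flag values are placeholders, since this page does not document the defaults.

```java
/**
 * Hypothetical stand-in with the same shape as SimpleTokenizer's configuration:
 * one shared static final instance plus public setters.
 * The flag values of DEFAULT are placeholders, not the library's defaults.
 */
class SharedDefaultSketch {
    static final SharedDefaultSketch DEFAULT = new SharedDefaultSketch(true, true);

    private boolean ignorePunctuation;
    private boolean ignoreCase;

    SharedDefaultSketch(boolean ignorePunctuation, boolean ignoreCase) {
        this.ignorePunctuation = ignorePunctuation;
        this.ignoreCase = ignoreCase;
    }

    void setIgnorePunctuation(boolean flag) { ignorePunctuation = flag; }
    void setIgnoreCase(boolean flag) { ignoreCase = flag; }
    boolean ignoresPunctuation() { return ignorePunctuation; }
    boolean ignoresCase() { return ignoreCase; }

    public static void main(String[] args) {
        SharedDefaultSketch caller1 = SharedDefaultSketch.DEFAULT;
        SharedDefaultSketch caller2 = SharedDefaultSketch.DEFAULT;
        caller1.setIgnoreCase(false);               // mutates the single shared object
        System.out.println(caller2.ignoresCase());  // prints false: caller2 sees the change

        // A separately constructed instance leaves the shared default untouched.
        SharedDefaultSketch mine = new SharedDefaultSketch(false, true);
        mine.setIgnorePunctuation(true);
        System.out.println(SharedDefaultSketch.DEFAULT.ignoresPunctuation()); // prints true
    }
}
```

Constructing a dedicated instance with the constructor listed above is one way to keep per-caller settings from interfering with each other.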
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
tokenize
public Token[] tokenize(java.lang.String input)
- Returns a tokenized version of a string. Tokens are sequences
of alphanumerics, or any single punctuation character.
- Specified by:
tokenize in interface Tokenizer
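The tokenization rule described above can be illustrated with a small self-contained sketch. It is not the library's code: SketchTokenizer is a hypothetical class that returns plain strings instead of Token objects, treats "alphanumeric" as Character.isLetterOrDigit, and assumes, as the parameter names suggest, that ignorePunctuation drops single punctuation characters and ignoreCase lowercases tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of the documented rule: a token is a maximal run of
 * letters/digits, or a single non-whitespace, non-alphanumeric character.
 * Flag semantics are assumed from the parameter names.
 */
class SketchTokenizer {
    private boolean ignorePunctuation;
    private boolean ignoreCase;

    SketchTokenizer(boolean ignorePunctuation, boolean ignoreCase) {
        this.ignorePunctuation = ignorePunctuation;
        this.ignoreCase = ignoreCase;
    }

    void setIgnorePunctuation(boolean flag) { ignorePunctuation = flag; }
    void setIgnoreCase(boolean flag) { ignoreCase = flag; }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char ch = input.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;                                     // whitespace only separates tokens
            } else if (Character.isLetterOrDigit(ch)) {
                int start = i;                           // consume a maximal alphanumeric run
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(normalize(input.substring(start, i)));
            } else {
                if (!ignorePunctuation) {
                    tokens.add(String.valueOf(ch));      // each punctuation char is its own token
                }
                i++;
            }
        }
        return tokens;
    }

    // Lowercase only when case is being ignored.
    private String normalize(String s) {
        return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
    }

    public static void main(String[] args) {
        System.out.println(new SketchTokenizer(false, true).tokenize("Hello, World!!"));
        // prints [hello, ,, world, !, !]
    }
}
```

With both flags set to true, the same input would yield only [hello, world] under these assumptions.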
intern
public Token intern(java.lang.String s)
- Description copied from interface: Tokenizer
- Converts a given string into a token.
- Specified by:
intern in interface Tokenizer
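The interface only says that intern converts a string into a token. One common way to do this is an interning table that returns the same token object every time the same string is seen; the sketch below assumes that behavior. SketchToken and SketchInterner are hypothetical names, and the sequential integer ids are an illustrative choice, not documented behavior of this class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical token: an integer id paired with the string it stands for. */
final class SketchToken {
    final int id;
    final String value;

    SketchToken(int id, String value) {
        this.id = id;
        this.value = value;
    }

    @Override
    public String toString() { return "#" + id + ":" + value; }
}

/** Hypothetical interning table: each distinct string maps to exactly one token. */
class SketchInterner {
    private final Map<String, SketchToken> table = new HashMap<>();
    private int nextId = 0;

    SketchToken intern(String s) {
        // computeIfAbsent creates a token only the first time s is seen;
        // later calls with an equal string return that same object.
        return table.computeIfAbsent(s, k -> new SketchToken(++nextId, k));
    }

    int size() { return table.size(); }

    public static void main(String[] args) {
        SketchInterner interner = new SketchInterner();
        SketchToken first = interner.intern("cohen");
        SketchToken again = interner.intern("cohen");
        SketchToken other = interner.intern("smith");
        System.out.println(first == again);      // prints true: same string, same token
        System.out.println(first + " " + other); // prints #1:cohen #2:smith
    }
}
```

Interning lets later code compare tokens by identity or by id instead of re-comparing strings.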
main
public static void main(java.lang.String[] argv)
- Test routine
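This page describes main only as a test routine. A harness of that kind might tokenize each command-line argument and print the resulting tokens; the hypothetical TokenizeDemo below does this. It approximates the tokenize rule with a regular expression (a run of Unicode letters or digits, or any single non-whitespace character) and applies neither the case nor the punctuation flag, so it is an illustration rather than the library's routine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical command-line test harness; not SimpleTokenizer.main itself. */
class TokenizeDemo {
    // Leftmost alternative wins: an alphanumeric run if possible,
    // otherwise a single non-whitespace character. Whitespace never matches.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Builds one line per argument followed by one line per token.
    static List<String> report(String[] argv) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < argv.length; i++) {
            lines.add("argument " + i + ": '" + argv[i] + "'");
            for (String t : tokens(argv[i])) {
                lines.add("  token: '" + t + "'");
            }
        }
        return lines;
    }

    public static void main(String[] argv) {
        report(argv).forEach(System.out::println);
    }
}
```

Running it as java TokenizeDemo "Hi, Bob" would print the argument line followed by the tokens Hi, the comma, and Bob.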