com.wcohen.secondstring.tokens
Class SimpleTokenizer

java.lang.Object
  |
  +--com.wcohen.secondstring.tokens.SimpleTokenizer
All Implemented Interfaces:
Tokenizer

public class SimpleTokenizer
extends java.lang.Object
implements Tokenizer

A simple implementation of Tokenizer. Tokens are sequences of alphanumeric characters; single punctuation characters may optionally be emitted as tokens as well.


Field Summary
static SimpleTokenizer DEFAULT_TOKENIZER
           
 
Constructor Summary
SimpleTokenizer(boolean ignorePunctuation, boolean ignoreCase)
           
 
Method Summary
 Token intern(java.lang.String s)
          Convert a given string into a token.
static void main(java.lang.String[] argv)
          Test routine.
 void setIgnoreCase(boolean flag)
          Set whether case is ignored.
 void setIgnorePunctuation(boolean flag)
          Set whether punctuation is ignored.
 Token[] tokenize(java.lang.String input)
          Return a tokenized version of a string.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_TOKENIZER

public static final SimpleTokenizer DEFAULT_TOKENIZER
Constructor Detail

SimpleTokenizer

public SimpleTokenizer(boolean ignorePunctuation,
                       boolean ignoreCase)
Method Detail

setIgnorePunctuation

public void setIgnorePunctuation(boolean flag)
Set whether punctuation is ignored.

setIgnoreCase

public void setIgnoreCase(boolean flag)
Set whether case is ignored.

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

tokenize

public Token[] tokenize(java.lang.String input)
Return a tokenized version of a string. Each token is either a sequence of alphanumeric characters or a single punctuation character.

Specified by:
tokenize in interface Tokenizer
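
The tokenization rule described above can be illustrated with a small standalone sketch. It is not the library's implementation: the class name TokenizeSketch is hypothetical, tokens are plain strings rather than Token objects, "punctuation" is assumed to mean any character that is neither whitespace nor a letter or digit, and ignoreCase is assumed to mean lower-casing each token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of com.wcohen.secondstring.
public class TokenizeSketch {

    // Splits input into runs of letters/digits and, unless ignorePunctuation
    // is set, single-character punctuation tokens. Whitespace is skipped.
    public static List<String> tokenize(String input,
                                        boolean ignorePunctuation,
                                        boolean ignoreCase) {
        List<String> tokens = new ArrayList<>();
        int cursor = 0;
        while (cursor < input.length()) {
            char ch = input.charAt(cursor);
            if (Character.isWhitespace(ch)) {
                cursor++;
            } else if (Character.isLetterOrDigit(ch)) {
                int start = cursor;
                while (cursor < input.length()
                        && Character.isLetterOrDigit(input.charAt(cursor))) {
                    cursor++;
                }
                String tok = input.substring(start, cursor);
                // Assumption: ignoring case means normalizing to lower case.
                tokens.add(ignoreCase ? tok.toLowerCase(Locale.ROOT) : tok);
            } else {
                // Assumption: anything else counts as punctuation.
                if (!ignorePunctuation) {
                    tokens.add(String.valueOf(ch));
                }
                cursor++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World!", false, true));
        System.out.println(tokenize("Hello, World!", true, false));
    }
}
```

With the real class, the equivalent call would be new SimpleTokenizer(ignorePunctuation, ignoreCase).tokenize(input), which returns a Token[] rather than a list of strings.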

intern

public Token intern(java.lang.String s)
Description copied from interface: Tokenizer
Convert a given string into a token.

Specified by:
intern in interface Tokenizer
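
The page does not say how intern builds its tokens. As a rough sketch of what interning usually means, the example below returns the same token object for equal strings. Everything in it is an assumption made for illustration: the InternSketch and Tok names, the integer index, and the lower-casing of keys when ignoreCase is set are not taken from the library.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of com.wcohen.secondstring.
public class InternSketch {

    // Hypothetical token type: a string value plus a small integer id.
    public static final class Tok {
        public final String value;
        public final int index;

        Tok(String value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    private final Map<String, Tok> table = new HashMap<>();
    private final boolean ignoreCase;

    public InternSketch(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
    }

    // Returns the one shared Tok for this string, creating it on first use.
    public Tok intern(String s) {
        // Assumption: ignoring case means normalizing the key to lower case.
        String key = ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        Tok tok = table.get(key);
        if (tok == null) {
            tok = new Tok(key, table.size() + 1);
            table.put(key, tok);
        }
        return tok;
    }

    public static void main(String[] args) {
        InternSketch pool = new InternSketch(true);
        // Same object for strings that differ only in case.
        System.out.println(pool.intern("Cohen") == pool.intern("COHEN"));
    }
}
```

Sharing one object per distinct string lets later comparisons use identity or the integer id instead of repeated string comparison.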

main

public static void main(java.lang.String[] argv)
Test routine.