com.wcohen.ss.api
Interface Tokenizer

All Known Implementing Classes:
NGramTokenizer, SimpleTokenizer

public interface Tokenizer

Split a string into tokens.


Method Summary
 Token intern(java.lang.String s)
          Convert a given string into a token.
 int maxTokenIndex()
          Return the highest index of any interned token
 java.util.Iterator<Token> tokenIterator()
          Return an iterator over interned tokens
 Token[] tokenize(java.lang.String input)
          Return tokenized version of a string
 

Method Detail

tokenize

Token[] tokenize(java.lang.String input)
Return tokenized version of a string
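
As a rough illustration of what a tokenize implementation might look like, the sketch below splits on runs of whitespace. The splitting rule and the names SketchToken and WhitespaceTokenizeSketch are assumptions made for this example only; they are not part of this API and do not describe how NGramTokenizer or SimpleTokenizer behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's Token type; it carries only a string value.
final class SketchToken {
    private final String value;

    SketchToken(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

class WhitespaceTokenizeSketch {
    // Assumption: tokens are maximal runs of non-whitespace characters.
    static SketchToken[] tokenize(String input) {
        List<SketchToken> tokens = new ArrayList<>();
        for (String piece : input.trim().split("\\s+")) {
            // split() yields one empty string for blank input, so skip empties.
            if (!piece.isEmpty()) {
                tokens.add(new SketchToken(piece));
            }
        }
        return tokens.toArray(new SketchToken[0]);
    }

    public static void main(String[] args) {
        for (SketchToken t : tokenize("approximate string matching")) {
            System.out.println(t.getValue());
        }
    }
}
```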


intern

Token intern(java.lang.String s)
Convert a given string into a token. An implementation of intern should have these properties: (1) If s1.equals(s2), then intern(s1)==intern(s2), so equal strings always map to the same token object. (2) If no string equal to s1 has been interned before, then intern(s1).getIndex() will be larger than every previously assigned index; i.e., token indexes are assigned in increasing order.
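
One way to satisfy both properties is to keep a map from string values to tokens plus a counter for the last index handed out. The sketch below is a minimal illustration under that assumption. IndexedToken and InternSketch are hypothetical names, and starting the indexes at 1 is an arbitrary choice that the contract above does not require.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the library's Token type.
final class IndexedToken {
    private final String value;
    private final int index;

    IndexedToken(String value, int index) {
        this.value = value;
        this.index = index;
    }

    String getValue() {
        return value;
    }

    int getIndex() {
        return index;
    }
}

class InternSketch {
    private final Map<String, IndexedToken> byValue = new HashMap<>();
    private int lastIndex = 0; // assumption: the first token receives index 1

    IndexedToken intern(String s) {
        IndexedToken token = byValue.get(s);
        if (token == null) {
            // Property (2): a new string gets an index above all earlier ones.
            token = new IndexedToken(s, ++lastIndex);
            byValue.put(s, token);
        }
        // Property (1): equal strings return the same stored object.
        return token;
    }

    public static void main(String[] args) {
        InternSketch interner = new InternSketch();
        IndexedToken first = interner.intern("cat");
        IndexedToken again = interner.intern(new String("cat"));
        IndexedToken other = interner.intern("dog");
        System.out.println(first == again);
        System.out.println(other.getIndex() > first.getIndex());
    }
}
```

Because equal strings share one token object, callers can compare tokens with == instead of comparing strings.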


tokenIterator

java.util.Iterator<Token> tokenIterator()
Return an iterator over interned tokens


maxTokenIndex

int maxTokenIndex()
Return the highest index of any interned token
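
A plausible use of maxTokenIndex together with tokenIterator is to size an array that is addressed by token index and then walk the interned tokens to read it back. The sketch below assumes a self-contained vocabulary class. VocabToken, VocabularySketch, and countByIndex are hypothetical names, and the index numbering starting at 1 is an assumption of this example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the library's Token type.
final class VocabToken {
    private final String value;
    private final int index;

    VocabToken(String value, int index) {
        this.value = value;
        this.index = index;
    }

    String getValue() {
        return value;
    }

    int getIndex() {
        return index;
    }
}

class VocabularySketch {
    // LinkedHashMap keeps tokens in the order they were first interned.
    private final Map<String, VocabToken> byValue = new LinkedHashMap<>();
    private int lastIndex = 0; // assumption: indexes start at 1

    VocabToken intern(String s) {
        VocabToken token = byValue.get(s);
        if (token == null) {
            token = new VocabToken(s, ++lastIndex);
            byValue.put(s, token);
        }
        return token;
    }

    Iterator<VocabToken> tokenIterator() {
        return byValue.values().iterator();
    }

    int maxTokenIndex() {
        return lastIndex;
    }

    // Counts word occurrences in an array addressed by token index.
    static int[] countByIndex(VocabularySketch vocab, String[] words) {
        VocabToken[] tokens = new VocabToken[words.length];
        for (int i = 0; i < words.length; i++) {
            tokens[i] = vocab.intern(words[i]);
        }
        // Size the array only after interning, so every index fits.
        int[] counts = new int[vocab.maxTokenIndex() + 1];
        for (VocabToken t : tokens) {
            counts[t.getIndex()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        VocabularySketch vocab = new VocabularySketch();
        String[] words = {"to", "be", "or", "not", "to", "be"};
        int[] counts = countByIndex(vocab, words);
        for (Iterator<VocabToken> it = vocab.tokenIterator(); it.hasNext();) {
            VocabToken t = it.next();
            System.out.println(t.getIndex() + " " + t.getValue() + " " + counts[t.getIndex()]);
        }
    }
}
```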