com.wcohen.ss
Class AbstractStatisticalTokenDistance

java.lang.Object
  extended by com.wcohen.ss.AbstractStringDistance
      extended by com.wcohen.ss.AbstractTokenizedStringDistance
          extended by com.wcohen.ss.AbstractStatisticalTokenDistance
All Implemented Interfaces:
StringDistance, StringDistanceLearner
Direct Known Subclasses:
Mixture, SoftTokenFelligiSunter, TagLink, TFIDF, TokenFelligiSunter

public abstract class AbstractStatisticalTokenDistance
extends AbstractTokenizedStringDistance

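Abstract token distance metric that uses token-frequency statistics (document frequencies, collection size, and total token count) accumulated by train(StringWrapperIterator).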


Field Summary
protected int collectionSize
protected java.util.Map<Token,java.lang.Integer> documentFrequency
protected int totalTokenCount
 
Fields inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
tokenizer
 
Constructor Summary
AbstractStatisticalTokenDistance()
AbstractStatisticalTokenDistance(Tokenizer tokenizer)
 
Method Summary
protected void checkTrainingHasHappened(StringWrapper s, StringWrapper t)
int getDocumentFrequency(Token tok)
void train(StringWrapperIterator i)
          Accumulate statistics on how often each token value occurs.
 
Methods inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
asBagOfTokens, prepare, setStringWrapperPool
 
Methods inherited from class com.wcohen.ss.AbstractStringDistance
addExample, doMain, explainScore, explainScore, getDistance, hasNextQuery, nextQuery, prepare, prepare, score, score, setDistanceInstancePool
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

documentFrequency

protected java.util.Map<Token,java.lang.Integer> documentFrequency

collectionSize

protected int collectionSize

totalTokenCount

protected int totalTokenCount

Constructor Detail

AbstractStatisticalTokenDistance

public AbstractStatisticalTokenDistance(Tokenizer tokenizer)

AbstractStatisticalTokenDistance

public AbstractStatisticalTokenDistance()

Method Detail

train

public void train(StringWrapperIterator i)
Accumulate statistics on how often each token value occurs.

Specified by:
train in class AbstractTokenizedStringDistance
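The accumulation that train performs can be pictured with a small standalone class. This is a minimal sketch, not the library's implementation. It assumes that documentFrequency counts each distinct token at most once per string, that totalTokenCount counts every token occurrence, and that collectionSize counts strings. The class name TokenStatsSketch is hypothetical; plain String tokens from an assumed lowercase whitespace tokenizer stand in for Token and Tokenizer, and an Iterator<String> stands in for StringWrapperIterator.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the training state of AbstractStatisticalTokenDistance.
// Strings replace Token; Iterator<String> replaces StringWrapperIterator.
class TokenStatsSketch {
    final Map<String, Integer> documentFrequency = new HashMap<>();
    int collectionSize = 0;
    int totalTokenCount = 0;

    // Assumed tokenizer: lowercase the string and split on runs of whitespace.
    static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Accumulate statistics on how often each token value occurs.
    void train(Iterator<String> strings) {
        while (strings.hasNext()) {
            Set<String> seenInThisString = new HashSet<>();
            for (String tok : tokenize(strings.next())) {
                totalTokenCount++;                    // every occurrence counts
                if (seenInThisString.add(tok)) {      // first occurrence in this string only
                    documentFrequency.merge(tok, 1, Integer::sum);
                }
            }
            collectionSize++;                         // one more string in the collection
        }
    }
}
```

Training this sketch on "a b a" and "b c" leaves collectionSize at 2, totalTokenCount at 5, and document frequencies a=1, b=2, c=1: the repeated a counts twice toward the total but only once toward its document frequency.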

checkTrainingHasHappened

protected void checkTrainingHasHappened(StringWrapper s,
                                        StringWrapper t)
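The name checkTrainingHasHappened suggests a guard that runs before a pair of strings is scored. One plausible behavior, which is an assumption and is not stated on this page, is to fall back to training on just the two strings being compared when no statistics exist yet, so that frequency-based scores are still defined. The sketch below uses hypothetical names (LazyStatsSketch) and plain strings in place of StringWrapper.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a "train lazily if needed" guard; not the library's code.
class LazyStatsSketch {
    final Map<String, Integer> documentFrequency = new HashMap<>();
    int collectionSize = 0;

    // Count each distinct whitespace-separated token once per string.
    void train(Iterator<String> strings) {
        while (strings.hasNext()) {
            String s = strings.next().trim();
            Set<String> distinct =
                new HashSet<>(Arrays.asList(s.isEmpty() ? new String[0] : s.split("\\s+")));
            for (String tok : distinct) {
                documentFrequency.merge(tok, 1, Integer::sum);
            }
            collectionSize++;
        }
    }

    // Assumed behavior: if nothing has been trained yet, treat the pair itself
    // as a two-string collection; otherwise leave existing statistics alone.
    void checkTrainingHasHappened(String s, String t) {
        if (collectionSize == 0) {
            train(Arrays.asList(s, t).iterator());
        }
    }
}
```

Because the guard fires only while collectionSize is 0, statistics from an explicit train call made earlier are never overwritten by the pair being scored.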

getDocumentFrequency

public int getDocumentFrequency(Token tok)
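getDocumentFrequency exposes the per-token counts gathered by train. Subclasses such as TFIDF presumably turn counts like these into token weights. The sketch below assumes that a token never seen in training has frequency 0, and uses the common inverse-document-frequency formula idf(t) = log(N / df(t)) with N = collectionSize; treating unseen tokens as df = 1 to avoid division by zero is a hypothetical choice, not taken from the library. DocumentFrequencySketch and idf are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup and IDF weighting over already-trained statistics.
class DocumentFrequencySketch {
    final Map<String, Integer> documentFrequency = new HashMap<>();
    final int collectionSize;

    DocumentFrequencySketch(int collectionSize, Map<String, Integer> df) {
        this.collectionSize = collectionSize;
        this.documentFrequency.putAll(df);
    }

    // Assumed contract: tokens never seen during training have frequency 0.
    int getDocumentFrequency(String tok) {
        return documentFrequency.getOrDefault(tok, 0);
    }

    // Standard idf = log(N / df); unseen tokens are treated as df = 1 (an assumption).
    double idf(String tok) {
        int df = Math.max(1, getDocumentFrequency(tok));
        return Math.log((double) collectionSize / df);
    }
}
```

With collectionSize 4, a token that appears in all four strings gets weight log(4/4) = 0, while a token that appears in only one string gets log 4 (about 1.386), reflecting the usual intuition that rarer tokens are more informative when comparing strings.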