com.wcohen.secondstring
Class AbstractStatisticalTokenDistance

java.lang.Object
  |
  +--com.wcohen.secondstring.AbstractStringDistance
        |
        +--com.wcohen.secondstring.AbstractStatisticalTokenDistance
All Implemented Interfaces:
StringDistance
Direct Known Subclasses:
Mixture, SoftTokenFelligiSunter, TFIDF, TokenFelligiSunter

public abstract class AbstractStatisticalTokenDistance
extends AbstractStringDistance

Abstract token distance metric that uses frequency statistics.


Field Summary
protected  int collectionSize
           
protected  java.util.Map documentFrequency
           
protected  Tokenizer tokenizer
           
protected  int totalTokenCount
           
 
Constructor Summary
AbstractStatisticalTokenDistance()
           
AbstractStatisticalTokenDistance(Tokenizer tokenizer)
           
 
Method Summary
 void accumulateStatistics(java.util.Iterator i)
          Accumulate statistics on how often each token value occurs.
 int getDocumentFrequency(Token tok)
           
 
Methods inherited from class com.wcohen.secondstring.AbstractStringDistance
doMain, explainScore, explainScore, prepare, score, score
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tokenizer

protected Tokenizer tokenizer

documentFrequency

protected java.util.Map documentFrequency

collectionSize

protected int collectionSize

totalTokenCount

protected int totalTokenCount
Constructor Detail

AbstractStatisticalTokenDistance

public AbstractStatisticalTokenDistance(Tokenizer tokenizer)

AbstractStatisticalTokenDistance

public AbstractStatisticalTokenDistance()
Method Detail

accumulateStatistics

public void accumulateStatistics(java.util.Iterator i)
Accumulate statistics on how often each token value occurs.

Specified by:
accumulateStatistics in interface StringDistance
Overrides:
accumulateStatistics in class AbstractStringDistance
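
Illustrative sketch (not the library source): one plausible way the statistics named in the Field Summary could be accumulated. The class DocumentFrequencySketch is hypothetical, plain String values split on whitespace stand in for Tokenizer and Token, and the meaning assigned to each counter is an assumption inferred from the field names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in; not part of com.wcohen.secondstring. */
class DocumentFrequencySketch {
    // Field names mirror the protected fields above; their semantics are assumed.
    protected final Map<String, Integer> documentFrequency = new HashMap<>();
    protected int collectionSize = 0;   // assumed: number of strings seen
    protected int totalTokenCount = 0;  // assumed: number of tokens seen, repeats included

    /** Consumes every string from the iterator and updates the counters. */
    public void accumulateStatistics(Iterator<String> i) {
        while (i.hasNext()) {
            String doc = i.next().trim();
            Set<String> seen = new HashSet<>();
            if (!doc.isEmpty()) {
                // Whitespace splitting is an assumption; the real class delegates to a Tokenizer.
                for (String tok : doc.split("\\s+")) {
                    totalTokenCount++;
                    // Count each distinct token at most once per string.
                    if (seen.add(tok)) {
                        documentFrequency.merge(tok, 1, Integer::sum);
                    }
                }
            }
            collectionSize++;
        }
    }

    /** Returns how many strings contained the token, or 0 if it was never seen. */
    public int getDocumentFrequency(String tok) {
        return documentFrequency.getOrDefault(tok, 0);
    }

    public static void main(String[] args) {
        DocumentFrequencySketch s = new DocumentFrequencySketch();
        s.accumulateStatistics(
            Arrays.asList("william cohen", "william w cohen", "cohen cohen").iterator());
        System.out.println("df(cohen) = " + s.getDocumentFrequency("cohen"));
        System.out.println("collectionSize = " + s.collectionSize);
        System.out.println("totalTokenCount = " + s.totalTokenCount);
    }
}
```

In this sketch a token repeated within one string raises totalTokenCount each time but raises its document frequency only once.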

getDocumentFrequency

public int getDocumentFrequency(Token tok)
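
Illustrative sketch (not the library source): a subclass could turn a document frequency into a token weight. Inverse document frequency, log(collectionSize / documentFrequency), is one common choice; whether TFIDF or the other subclasses use exactly this formula is not stated on this page. The class IdfSketch and its method idf are hypothetical.

```java
/** Hypothetical helper; not part of com.wcohen.secondstring. */
final class IdfSketch {
    private IdfSketch() {}

    /**
     * Inverse document frequency: log(N / df).
     * Returns 0.0 for unseen tokens or an empty collection (an assumed convention).
     */
    static double idf(int collectionSize, int documentFrequency) {
        if (collectionSize <= 0 || documentFrequency <= 0) {
            return 0.0;
        }
        return Math.log((double) collectionSize / documentFrequency);
    }

    public static void main(String[] args) {
        // A token found in every string carries no weight; rarer tokens weigh more.
        System.out.println(idf(3, 3));
        System.out.println(idf(3, 1));
    }
}
```

Under this weighting, a token that occurs in every string contributes nothing to a score, while a token seen in a single string contributes the most.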