com.wcohen.secondstring
Class AbstractStatisticalTokenDistance
java.lang.Object
  |
  +--com.wcohen.secondstring.AbstractStringDistance
        |
        +--com.wcohen.secondstring.AbstractStatisticalTokenDistance
All Implemented Interfaces: StringDistance
Direct Known Subclasses: Mixture, SoftTokenFelligiSunter, TFIDF, TokenFelligiSunter

public abstract class AbstractStatisticalTokenDistance
extends AbstractStringDistance
Abstract token distance metric that uses frequency statistics.
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
tokenizer
protected Tokenizer tokenizer
documentFrequency
protected java.util.Map documentFrequency
collectionSize
protected int collectionSize
totalTokenCount
protected int totalTokenCount
AbstractStatisticalTokenDistance
public AbstractStatisticalTokenDistance(Tokenizer tokenizer)
AbstractStatisticalTokenDistance
public AbstractStatisticalTokenDistance()
accumulateStatistics
public void accumulateStatistics(java.util.Iterator i)
Accumulates statistics on how often each token value occurs across the items returned by the iterator i.
Specified by: accumulateStatistics in interface StringDistance
Overrides: accumulateStatistics in class AbstractStringDistance
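As an illustration of the kind of bookkeeping the fields documentFrequency, collectionSize, and totalTokenCount suggest, here is a minimal, self-contained sketch. It is not the SecondString implementation: the class DocumentFrequencySketch is hypothetical, it works on plain String values rather than Token objects, and it assumes lowercased whitespace tokenization in place of a Tokenizer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of com.wcohen.secondstring.
class DocumentFrequencySketch {
    // Number of documents in which each token appears at least once.
    final Map<String, Integer> documentFrequency = new HashMap<>();
    // Number of documents seen so far.
    int collectionSize = 0;
    // Number of token occurrences seen so far, counting repeats.
    int totalTokenCount = 0;

    // Assumption: each item is a raw string, tokenized on whitespace after lowercasing.
    void accumulateStatistics(Iterator<String> documents) {
        while (documents.hasNext()) {
            String doc = documents.next().trim().toLowerCase();
            Set<String> seenInThisDocument = new HashSet<>();
            if (!doc.isEmpty()) {
                for (String token : doc.split("\\s+")) {
                    totalTokenCount++;
                    // Count a token once per document, however often it repeats there.
                    if (seenInThisDocument.add(token)) {
                        documentFrequency.merge(token, 1, Integer::sum);
                    }
                }
            }
            collectionSize++;
        }
    }

    // Returns 0 for tokens that were never seen.
    int getDocumentFrequency(String token) {
        return documentFrequency.getOrDefault(token, 0);
    }

    public static void main(String[] args) {
        DocumentFrequencySketch stats = new DocumentFrequencySketch();
        List<String> names = Arrays.asList("William Cohen", "William W Cohen", "Bill Cohen");
        stats.accumulateStatistics(names.iterator());
        System.out.println(stats.getDocumentFrequency("cohen"));   // prints 3
        System.out.println(stats.getDocumentFrequency("william")); // prints 2
        System.out.println(stats.collectionSize);                  // prints 3
        System.out.println(stats.totalTokenCount);                 // prints 7
    }
}
```

Statistics of this kind are typically gathered once over a whole collection before any pair of strings is compared, so that rare tokens can be weighted more heavily than common ones.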
getDocumentFrequency
public int getDocumentFrequency(Token tok)