com.wcohen.ss
Class AbstractStatisticalTokenDistance
java.lang.Object
com.wcohen.ss.AbstractStringDistance
com.wcohen.ss.AbstractTokenizedStringDistance
com.wcohen.ss.AbstractStatisticalTokenDistance
- All Implemented Interfaces:
- StringDistance, StringDistanceLearner
- Direct Known Subclasses:
- Mixture, SoftTokenFelligiSunter, TagLink, TFIDF, TokenFelligiSunter
public abstract class AbstractStatisticalTokenDistance
- extends AbstractTokenizedStringDistance
Abstract token distance metric that uses frequency statistics.
Methods inherited from class com.wcohen.ss.AbstractStringDistance
addExample, doMain, explainScore, explainScore, getDistance, hasNextQuery, nextQuery, prepare, prepare, score, score, setDistanceInstancePool
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
documentFrequency
protected java.util.Map<Token,java.lang.Integer> documentFrequency
collectionSize
protected int collectionSize
totalTokenCount
protected int totalTokenCount
AbstractStatisticalTokenDistance
public AbstractStatisticalTokenDistance(Tokenizer tokenizer)
AbstractStatisticalTokenDistance
public AbstractStatisticalTokenDistance()
train
public void train(StringWrapperIterator i)
- Accumulate statistics on how often each token value occurs.
- Specified by:
train
in class AbstractTokenizedStringDistance
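The kind of bookkeeping that train describes can be illustrated with a minimal, self-contained sketch. It does not use com.wcohen.ss at all: DocumentFrequencySketch is a hypothetical class, plain String values stand in for Token, whitespace splitting stands in for a Tokenizer, and a List<String> stands in for StringWrapperIterator. It also assumes that a token is counted at most once per training string (as the field name documentFrequency suggests); this page does not confirm how the actual implementation counts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of com.wcohen.ss.
class DocumentFrequencySketch {
    // Field names mirror the protected fields listed above; String replaces Token.
    final Map<String, Integer> documentFrequency = new HashMap<>();
    int collectionSize = 0;   // number of training strings seen
    int totalTokenCount = 0;  // number of token occurrences seen

    // Assumption: each training string counts as one "document".
    void train(List<String> documents) {
        for (String document : documents) {
            String trimmed = document.trim();
            if (!trimmed.isEmpty()) {
                Set<String> seen = new HashSet<>();
                for (String token : trimmed.split("\\s+")) {
                    totalTokenCount++;
                    // Set.add returns true only the first time a token
                    // appears in this document, so each document adds at
                    // most 1 to a token's document frequency.
                    if (seen.add(token)) {
                        documentFrequency.merge(token, 1, Integer::sum);
                    }
                }
            }
            collectionSize++;
        }
    }

    // Unseen tokens report a frequency of 0 in this sketch.
    int getDocumentFrequency(String token) {
        return documentFrequency.getOrDefault(token, 0);
    }
}
```

With the two training strings "a b a" and "b c", this sketch ends up with a collection size of 2, a total token count of 5, and document frequencies of 1 for "a", 2 for "b", and 1 for "c".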
checkTrainingHasHappened
protected void checkTrainingHasHappened(StringWrapper s, StringWrapper t)
getDocumentFrequency
public int getDocumentFrequency(Token tok)
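A subclass could turn a document frequency and the collection size into a token weight. The sketch below shows one common textbook formulation, the inverse document frequency log(N / df). It is an assumption for illustration, not the formula used by TFIDF or any other subclass listed on this page; IdfWeightSketch and idf are hypothetical names.

```java
// Hypothetical helper for illustration only; not part of com.wcohen.ss.
class IdfWeightSketch {
    // Returns log(collectionSize / documentFrequency), or 0.0 when either
    // count is non-positive (for example, an unseen token or no training).
    static double idf(int documentFrequency, int collectionSize) {
        if (documentFrequency <= 0 || collectionSize <= 0) {
            return 0.0;
        }
        // Cast to double first so the division is not truncated.
        return Math.log((double) collectionSize / documentFrequency);
    }
}
```

Under this formulation a token that occurs in every training string gets weight 0, and rarer tokens get larger weights.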