com.wcohen.ss
Class JensenShannonDistance

java.lang.Object
  extended by com.wcohen.ss.AbstractStringDistance
      extended by com.wcohen.ss.AbstractTokenizedStringDistance
          extended by com.wcohen.ss.JensenShannonDistance
All Implemented Interfaces:
StringDistance, StringDistanceLearner
Direct Known Subclasses:
DirichletJS, JelinekMercerJS, UnsmoothedJS

public abstract class JensenShannonDistance
extends AbstractTokenizedStringDistance

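Abstract base class for distance metrics based on the Jensen-Shannon distance between two smoothed unigram language models. Subclasses choose the smoothing method by implementing smoothedProbability.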
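For reference, the textbook Jensen-Shannon divergence between token distributions P and Q averages two Kullback-Leibler divergences taken against their mixture M. This page does not state which log base or normalization the class uses, or whether it reports the divergence itself or its square root (which some authors call the Jensen-Shannon distance). Read the following as the standard definition, not as a description of the implementation:

```latex
M(w) = \tfrac{1}{2}\bigl(P(w) + Q(w)\bigr), \qquad
\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad
\mathrm{KL}(P \,\|\, M) = \sum_{w} P(w)\,\log\frac{P(w)}{M(w)}
```

The divergence is zero exactly when P = Q and never exceeds log 2, so with base-2 logarithms it lies in [0, 1].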


Field Summary
 
Fields inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
tokenizer
 
Constructor Summary
JensenShannonDistance()
           
JensenShannonDistance(Tokenizer tokenizer)
           
 
Method Summary
protected  double backgroundProb(Token tok)
          Probability of the token in the background language model.
 java.lang.String explainScore(StringWrapper s, StringWrapper t)
          Returns a human-readable explanation of how the score for s and t was computed.
 StringWrapper prepare(java.lang.String s)
          Preprocess a string by finding its tokens and giving each token a weight equal to its smoothed probability of appearing in the document.
 double score(StringWrapper s, StringWrapper t)
          Jensen-Shannon distance between distributions.
protected abstract  double smoothedProbability(Token tok, double freq, double totalWeight)
          Smoothed probability of the token tok, which has frequency freq in a bag of tokens with the given totalWeight.
 void train(StringWrapperIterator i)
          Accumulate statistics on how often each token occurs.
 
Methods inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
asBagOfTokens, prepare, setStringWrapperPool
 
Methods inherited from class com.wcohen.ss.AbstractStringDistance
addExample, doMain, explainScore, getDistance, hasNextQuery, nextQuery, prepare, score, setDistanceInstancePool
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JensenShannonDistance

public JensenShannonDistance(Tokenizer tokenizer)

JensenShannonDistance

public JensenShannonDistance()
Method Detail

train

public final void train(StringWrapperIterator i)
Accumulate statistics on how often each token occurs.

Specified by:
train in class AbstractTokenizedStringDistance
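The statistics gathered by train presumably feed backgroundProb. A minimal sketch of that pairing, assuming a whitespace tokenizer and a maximum-likelihood background estimate count(w) / N over raw token occurrences. This page does not say whether the library counts occurrences or document frequencies; `BackgroundModelSketch` is a hypothetical name, and it takes a `List<String>` instead of a `StringWrapperIterator`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a background unigram model (not the SecondString API):
 * counts every token in a training corpus and estimates P_bg(w) = count(w) / N,
 * where N is the total number of token occurrences seen.
 */
class BackgroundModelSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private long totalTokens = 0;

    /** Illustrative tokenizer: lower-case, then split on whitespace. */
    static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Accumulate statistics on how often each token occurs. */
    void train(List<String> corpus) {
        for (String doc : corpus) {
            for (String tok : tokenize(doc)) {
                counts.merge(tok, 1, Integer::sum); // per-token occurrence count
                totalTokens++;                      // running total N
            }
        }
    }

    /** Maximum-likelihood background probability; 0 for tokens never seen. */
    double backgroundProb(String tok) {
        if (totalTokens == 0) return 0.0;
        return counts.getOrDefault(tok, 0) / (double) totalTokens;
    }

    public static void main(String[] args) {
        BackgroundModelSketch bg = new BackgroundModelSketch();
        bg.train(List.of("acme corp", "acme inc", "widget corp"));
        System.out.println(bg.backgroundProb("acme")); // 2 of 6 tokens
    }
}
```

A pure maximum-likelihood estimate gives unseen tokens probability zero. The smoothing hook is what keeps document models from inheriting that problem.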

prepare

public final StringWrapper prepare(java.lang.String s)
Preprocess a string by finding its tokens and giving each token a weight equal to its smoothed probability of appearing in the document.

Specified by:
prepare in interface StringDistance
Overrides:
prepare in class AbstractStringDistance
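A minimal sketch of this preprocessing step, assuming a whitespace tokenizer and a smoothing function passed in as a lambda. The names `PrepareSketch`, `TokenSmoother`, and `tokenize` are hypothetical, not SecondString API. Each token's raw count becomes freq, the document length becomes totalWeight, and the stored weight is whatever the smoothing hook returns:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of prepare(): tokenize a string and weight each
 * distinct token by a smoothed probability supplied by a pluggable hook.
 */
class PrepareSketch {

    /** Stand-in for the abstract smoothedProbability(tok, freq, totalWeight) hook. */
    interface TokenSmoother {
        double smoothedProbability(String tok, double freq, double totalWeight);
    }

    /** Illustrative tokenizer: lower-case, then split on whitespace. */
    static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Map each distinct token of s to its smoothed probability in s. */
    static Map<String, Double> prepare(String s, TokenSmoother smoother) {
        Map<String, Double> freq = new HashMap<>();
        double totalWeight = 0.0;
        for (String tok : tokenize(s)) {
            freq.merge(tok, 1.0, Double::sum); // raw count of tok in s
            totalWeight += 1.0;                // document length in tokens
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Double> e : freq.entrySet()) {
            weights.put(e.getKey(),
                    smoother.smoothedProbability(e.getKey(), e.getValue(), totalWeight));
        }
        return weights;
    }

    public static void main(String[] args) {
        // Unsmoothed estimate freq / totalWeight, used here only for illustration.
        TokenSmoother mle = (tok, f, total) -> f / total;
        System.out.println(prepare("Acme acme Corp", mle)); // acme -> 2/3, corp -> 1/3
    }
}
```

Only tokens that occur in s receive an entry here. With a smoother that reserves probability mass for unseen tokens, these stored weights sum to less than one.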

smoothedProbability

protected abstract double smoothedProbability(Token tok,
                                              double freq,
                                              double totalWeight)
Smoothed probability of the token tok, which has frequency freq in a bag of tokens with the given totalWeight.
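The names of the direct subclasses suggest three choices for this hook: no smoothing, Dirichlet-prior smoothing, and Jelinek-Mercer interpolation with the background model. The sketch below uses the standard textbook forms of those estimators. The parameter names `mu` and `lambda`, the injected background function, and all class names are assumptions, not the exact code of UnsmoothedJS, DirichletJS, or JelinekMercerJS:

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical sketches of the smoothedProbability hook (not the library's classes). */
class SmoothingSketches {

    /** Mirrors the abstract hook; the background model is injected as a function. */
    abstract static class Smoothing {
        protected final ToDoubleFunction<String> backgroundProb;

        Smoothing(ToDoubleFunction<String> backgroundProb) {
            this.backgroundProb = backgroundProb;
        }

        abstract double smoothedProbability(String tok, double freq, double totalWeight);
    }

    /** Unsmoothed maximum-likelihood estimate: freq / totalWeight. */
    static class NoSmoothing extends Smoothing {
        NoSmoothing(ToDoubleFunction<String> bg) { super(bg); }

        @Override
        double smoothedProbability(String tok, double freq, double totalWeight) {
            return freq / totalWeight;
        }
    }

    /** Dirichlet prior with pseudo-count mu: (freq + mu * Pbg(tok)) / (totalWeight + mu). */
    static class DirichletSmoothing extends Smoothing {
        private final double mu;

        DirichletSmoothing(ToDoubleFunction<String> bg, double mu) { super(bg); this.mu = mu; }

        @Override
        double smoothedProbability(String tok, double freq, double totalWeight) {
            return (freq + mu * backgroundProb.applyAsDouble(tok)) / (totalWeight + mu);
        }
    }

    /** Jelinek-Mercer interpolation: lambda * freq / totalWeight + (1 - lambda) * Pbg(tok). */
    static class JelinekMercerSmoothing extends Smoothing {
        private final double lambda;

        JelinekMercerSmoothing(ToDoubleFunction<String> bg, double lambda) { super(bg); this.lambda = lambda; }

        @Override
        double smoothedProbability(String tok, double freq, double totalWeight) {
            return lambda * (freq / totalWeight)
                    + (1 - lambda) * backgroundProb.applyAsDouble(tok);
        }
    }

    public static void main(String[] args) {
        ToDoubleFunction<String> bg = tok -> 0.1; // hypothetical flat background model
        // A token seen twice in a 4-token document, under each estimator.
        System.out.println(new NoSmoothing(bg).smoothedProbability("acme", 2, 4));
        System.out.println(new DirichletSmoothing(bg, 1.0).smoothedProbability("acme", 2, 4));
        System.out.println(new JelinekMercerSmoothing(bg, 0.5).smoothedProbability("acme", 2, 4));
    }
}
```

Both smoothed estimators give a token with freq = 0 a nonzero probability whenever its background probability is nonzero, which the unsmoothed estimate cannot do.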


backgroundProb

protected double backgroundProb(Token tok)
Probability of the token in the background language model.


score

public final double score(StringWrapper s,
                          StringWrapper t)
Jensen-Shannon distance between distributions.
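A minimal sketch of the Jensen-Shannon divergence between two sparse token distributions, such as the weight maps built during preparation. It assumes natural logarithms and returns the raw divergence; whether score returns this quantity, its square root, or some transformation of it is not documented here, and `JensenShannonSketch` is a hypothetical name:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: Jensen-Shannon divergence (natural log) between two
 * distributions stored as sparse token -> probability maps.
 */
class JensenShannonSketch {

    /** KL contribution p * log(p / m), using the convention 0 * log(0 / m) = 0. */
    private static double klTerm(double p, double m) {
        return p > 0 ? p * Math.log(p / m) : 0.0;
    }

    static double jsDivergence(Map<String, Double> p, Map<String, Double> q) {
        Set<String> vocab = new HashSet<>(p.keySet()); // union of both vocabularies
        vocab.addAll(q.keySet());
        double sum = 0.0;
        for (String w : vocab) {
            double pw = p.getOrDefault(w, 0.0);
            double qw = q.getOrDefault(w, 0.0);
            double mw = 0.5 * (pw + qw); // mixture distribution M
            sum += 0.5 * klTerm(pw, mw) + 0.5 * klTerm(qw, mw);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> p = new HashMap<>();
        p.put("acme", 0.5);
        p.put("corp", 0.5);
        Map<String, Double> q = new HashMap<>();
        q.put("acme", 0.5);
        q.put("inc", 0.5);
        System.out.println(jsDivergence(p, q)); // one shared token: (1/2) ln 2
        System.out.println(jsDivergence(p, p)); // prints 0.0 for identical distributions
    }
}
```

Tokens absent from one map still contribute through the mixture M, so the sum runs over the union of both vocabularies. Identical inputs give 0 and inputs with no shared tokens give ln 2.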

Specified by:
score in interface StringDistance
Specified by:
score in class AbstractStringDistance

explainScore

public final java.lang.String explainScore(StringWrapper s,
                                           StringWrapper t)
Returns a human-readable explanation of how the score for s and t was computed.

Specified by:
explainScore in interface StringDistance
Specified by:
explainScore in class AbstractStringDistance