com.wcohen.secondstring
Class JensenShannonDistance

java.lang.Object
  |
  +--com.wcohen.secondstring.AbstractStringDistance
        |
        +--com.wcohen.secondstring.JensenShannonDistance
All Implemented Interfaces:
StringDistance
Direct Known Subclasses:
DirichletJS, JelinekMercerJS, UnsmoothedJS

public abstract class JensenShannonDistance
extends AbstractStringDistance

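Abstract base class for string distance metrics based on the Jensen-Shannon distance between two smoothed unigram language models, one per string. Subclasses choose the smoothing scheme by implementing smoothedProbability.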


Constructor Summary
JensenShannonDistance()
           
JensenShannonDistance(Tokenizer tokenizer)
           
 
Method Summary
 void accumulateStatistics(java.util.Iterator i)
          Accumulate statistics on how often each token occurs.
protected  double backgroundProb(Token tok)
          Probability of the token in the background language model.
 java.lang.String explainScore(StringWrapper s, StringWrapper t)
          Explain how the score for s and t was computed (implemented here as a final method).
 StringWrapper prepare(java.lang.String s)
          Preprocess a string by tokenizing it and giving each token a weight equal to the smoothed probability of that token appearing in the document.
 double score(StringWrapper s, StringWrapper t)
          Jensen-Shannon distance between the token distributions of s and t.
protected abstract  double smoothedProbability(Token tok, double freq, double totalWeight)
          Smoothed probability of the token tok, which occurs with frequency freq in a bag of tokens with the given totalWeight.
 
Methods inherited from class com.wcohen.secondstring.AbstractStringDistance
doMain, explainScore, score
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JensenShannonDistance

public JensenShannonDistance(Tokenizer tokenizer)

JensenShannonDistance

public JensenShannonDistance()

Method Detail

accumulateStatistics

public final void accumulateStatistics(java.util.Iterator i)
Accumulate statistics on how often each token occurs.

Specified by:
accumulateStatistics in interface StringDistance
Overrides:
accumulateStatistics in class AbstractStringDistance

prepare

public final StringWrapper prepare(java.lang.String s)
Preprocess a string by tokenizing it and giving each token a weight equal to the smoothed probability of that token appearing in the document.

Specified by:
prepare in interface StringDistance
Overrides:
prepare in class AbstractStringDistance

smoothedProbability

protected abstract double smoothedProbability(Token tok,
                                              double freq,
                                              double totalWeight)
Smoothed probability of the token tok, which occurs with frequency freq in a bag of tokens with the given totalWeight.
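
The names of the known subclasses (UnsmoothedJS, JelinekMercerJS, DirichletJS) suggest three common smoothing schemes. The sketch below shows the textbook form of each as a standalone function. It is an illustration under stated assumptions, not the library's code: the class SmoothingSketch, the parameters lambda and mu, and passing the background probability as a plain double are all hypothetical, and the subclasses may parameterize their formulas differently.

```java
public class SmoothingSketch {

    /** Unsmoothed maximum-likelihood estimate: freq / totalWeight. */
    public static double unsmoothed(double freq, double totalWeight) {
        return freq / totalWeight;
    }

    /**
     * Jelinek-Mercer smoothing: linear interpolation between the document
     * estimate and the background model. Here lambda in [0, 1] is the weight
     * placed on the background model (an assumed convention).
     */
    public static double jelinekMercer(double freq, double totalWeight,
                                       double background, double lambda) {
        return (1.0 - lambda) * (freq / totalWeight) + lambda * background;
    }

    /**
     * Dirichlet-prior smoothing: adds mu pseudo-counts to the bag,
     * distributed according to the background model.
     */
    public static double dirichlet(double freq, double totalWeight,
                                   double background, double mu) {
        return (freq + mu * background) / (totalWeight + mu);
    }

    public static void main(String[] args) {
        // A token seen twice in a bag of total weight 4,
        // with background probability 0.01.
        System.out.println(unsmoothed(2, 4));
        System.out.println(jelinekMercer(2, 4, 0.01, 0.2));
        System.out.println(dirichlet(2, 4, 0.01, 1.0));
    }
}
```

Both smoothed variants assign a non-zero probability to a token with freq = 0 whenever its background probability is non-zero, which is the usual reason for smoothing a unigram model.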


backgroundProb

protected double backgroundProb(Token tok)
Probability of the token in the background language model.
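
One plausible reading of how accumulateStatistics and backgroundProb fit together is that the background model is the relative frequency of each token over all strings seen during accumulation. The sketch below illustrates that idea only. The class BackgroundModelSketch, its method signatures, and the whitespace tokenization are assumptions made for the example; the library works with its own Tokenizer and Token types, and its estimate may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class BackgroundModelSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private long totalTokens = 0;

    /** Count every whitespace-separated token in the training strings. */
    public void accumulate(Iterable<String> corpus) {
        for (String s : corpus) {
            for (String tok : s.trim().split("\\s+")) {
                if (tok.isEmpty()) {
                    continue; // an empty input string yields one empty token
                }
                counts.merge(tok, 1, Integer::sum);
                totalTokens++;
            }
        }
    }

    /** Relative frequency of the token; 0 if unseen or nothing was accumulated. */
    public double backgroundProb(String tok) {
        if (totalTokens == 0) {
            return 0.0;
        }
        return counts.getOrDefault(tok, 0) / (double) totalTokens;
    }
}
```

With this estimate, a token that never occurred during accumulation has background probability 0, so a real implementation may need a fallback for unseen tokens.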


score

public final double score(StringWrapper s,
                          StringWrapper t)
Jensen-Shannon distance between the token distributions of s and t.

Specified by:
score in interface StringDistance
Specified by:
score in class AbstractStringDistance
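
For orientation, the textbook Jensen-Shannon divergence of two distributions P and Q is 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = 0.5 * (P + Q). The sketch below computes it over plain maps from token to probability. It is not the library's implementation: the class JensenShannonSketch and the Map-based representation are hypothetical, and the value returned by score may differ in log base, normalization, or sign.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class JensenShannonSketch {

    /** One Kullback-Leibler term p * ln(p / m), with the convention 0 * ln(0) = 0. */
    private static double klTerm(double p, double m) {
        return p > 0.0 ? p * Math.log(p / m) : 0.0;
    }

    /**
     * Jensen-Shannon divergence (natural log) between two token distributions.
     * Tokens missing from a map are treated as having probability 0.
     */
    public static double jensenShannon(Map<String, Double> p, Map<String, Double> q) {
        Set<String> vocab = new HashSet<>(p.keySet());
        vocab.addAll(q.keySet());
        double sum = 0.0;
        for (String tok : vocab) {
            double pi = p.getOrDefault(tok, 0.0);
            double qi = q.getOrDefault(tok, 0.0);
            double mi = 0.5 * (pi + qi); // mixture distribution M
            sum += 0.5 * klTerm(pi, mi) + 0.5 * klTerm(qi, mi);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> p = Map.of("william", 0.5, "cohen", 0.5);
        Map<String, Double> q = Map.of("w", 0.5, "cohen", 0.5);
        System.out.println(jensenShannon(p, p)); // identical distributions: 0.0
        System.out.println(jensenShannon(p, q)); // 0.5 * ln 2, about 0.3466
    }
}
```

With natural logarithms the divergence lies between 0 (identical distributions) and ln 2 (distributions with no tokens in common), and it is symmetric in its two arguments.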

explainScore

public final java.lang.String explainScore(StringWrapper s,
                                           StringWrapper t)
Description copied from class: AbstractStringDistance
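This method needs to be implemented by subclasses of AbstractStringDistance. JensenShannonDistance supplies that implementation here as a final method, returning an explanation of how the score for s and t was computed.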

Specified by:
explainScore in interface StringDistance
Specified by:
explainScore in class AbstractStringDistance