com.wcohen.ss
Class SoftTFIDF
java.lang.Object
  com.wcohen.ss.AbstractStringDistance
    com.wcohen.ss.AbstractTokenizedStringDistance
      com.wcohen.ss.AbstractStatisticalTokenDistance
        com.wcohen.ss.TFIDF
          com.wcohen.ss.SoftTFIDF
- All Implemented Interfaces:
- StringDistance, StringDistanceLearner
- Direct Known Subclasses:
- JaroWinklerTFIDF
public class SoftTFIDF
- extends TFIDF
TFIDF-based distance metric, extended to use "soft" token matching.
Specifically, two tokens are treated as a partial match when an inner
string comparator (tokenDistance) gives them a score that clears the
token-match threshold (tokenMatchThreshold).
On the WHIRL datasets, thresholding JaroWinkler at 0.9 or 0.95
appears to work well.
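To illustrate the soft token-matching idea described above, here is a minimal Java sketch. It is not the SecondString implementation. The class SoftMatchSketch and its methods are hypothetical, a normalized Levenshtein similarity stands in for JaroWinkler, every distinct token gets a uniform weight instead of a TF-IDF weight learned from a corpus, and counting a score at or above the threshold as a match is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; names and weighting are assumptions, not the library's code. */
final class SoftMatchSketch {

    /** Stand-in inner comparator: 1 - (Levenshtein distance / length of the longer string). */
    static double similarity(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return 1.0 - (double) prev[b.length()] / Math.max(a.length(), b.length());
    }

    /**
     * Cosine-style score over distinct tokens with uniform weights.
     * A token of s contributes only if its best match in t scores at or
     * above the threshold; the contribution is scaled by that best score.
     */
    static double softScore(List<String> s, List<String> t, double threshold) {
        Set<String> sTokens = new LinkedHashSet<>(s);
        Set<String> tTokens = new LinkedHashSet<>(t);
        if (sTokens.isEmpty() || tTokens.isEmpty()) {
            return 0.0;
        }
        // Uniform unit-length weights; a real TF-IDF weight would come from corpus statistics.
        double weightS = 1.0 / Math.sqrt(sTokens.size());
        double weightT = 1.0 / Math.sqrt(tTokens.size());
        double total = 0.0;
        for (String w : sTokens) {
            double best = 0.0;
            for (String v : tTokens) {
                best = Math.max(best, similarity(w, v));
            }
            if (best >= threshold) {
                total += weightS * weightT * best;
            }
        }
        return total;
    }
}
```

In this sketch, comparing the token lists ("william", "cohen") and ("willaim", "cohen") with a threshold of 0.9 counts only the exact match on "cohen", while lowering the threshold to 0.7 also admits the misspelled pair as a partial match and raises the score. The right threshold depends on the inner comparator, so the 0.9 to 0.95 range quoted for JaroWinkler does not carry over to the stand-in similarity used here.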
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
SoftTFIDF
public SoftTFIDF(Tokenizer tokenizer,
StringDistance tokenDistance,
double tokenMatchThreshold)
SoftTFIDF
public SoftTFIDF(StringDistance tokenDistance,
double tokenMatchThreshold)
SoftTFIDF
public SoftTFIDF(StringDistance tokenDistance)
setTokenMatchThreshold
public void setTokenMatchThreshold(double d)
setTokenMatchThreshold
public void setTokenMatchThreshold(java.lang.Double d)
getTokenMatchThreshold
public double getTokenMatchThreshold()
score
public double score(StringWrapper s,
StringWrapper t)
- Description copied from class:
AbstractStringDistance
- This method needs to be implemented by subclasses.
- Specified by:
score
in interface StringDistance
- Overrides:
score
in class TFIDF
explainScore
public java.lang.String explainScore(StringWrapper s,
StringWrapper t)
- Explain how the distance was computed.
In the output, the tokens of s and t are listed, and the
tokens common to both are marked with an asterisk.
- Specified by:
explainScore
in interface StringDistance
- Overrides:
explainScore
in class TFIDF
toString
public java.lang.String toString()
- Overrides:
toString
in class TFIDF