com.wcohen.ss
Class SoftTFIDF
java.lang.Object
   com.wcohen.ss.AbstractStringDistance
       com.wcohen.ss.AbstractTokenizedStringDistance
           com.wcohen.ss.AbstractStatisticalTokenDistance
               com.wcohen.ss.TFIDF
                   com.wcohen.ss.SoftTFIDF
- All Implemented Interfaces: 
- StringDistance, StringDistanceLearner
- Direct Known Subclasses: 
- JaroWinklerTFIDF
- public class SoftTFIDF extends TFIDF
TFIDF-based distance metric, extended to use "soft" token matching.
 Specifically, two tokens count as a partial match when an inner string
 comparator scores them at or above the token match threshold.
 
On the WHIRL datasets, thresholding JaroWinkler at 0.9 or 0.95
 appears to work well.
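 
The soft-matching idea described above can be illustrated with a small, self-contained sketch. It is not the SecondString implementation: the class name SoftTfidfSketch, its tokenizer, its TF-IDF weighting formula, and the normalized edit-distance comparator (used here as a stand-in for JaroWinkler) are all assumptions made for illustration. Each token of s is paired with its most similar token of t, and the pair contributes to the score only when the inner similarity reaches the threshold.

```java
import java.util.*;

/** Illustrative soft TF-IDF scorer. Hypothetical; not the SecondString code. */
class SoftTfidfSketch {
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int numDocs = 0;
    private final double threshold;   // plays the role of tokenMatchThreshold

    SoftTfidfSketch(double threshold) { this.threshold = threshold; }

    /** Assumed tokenizer: lower-case, split on non-word characters. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String tok : s.toLowerCase().split("\\W+")) if (!tok.isEmpty()) out.add(tok);
        return out;
    }

    /** Accumulates document frequencies from a corpus of strings. */
    void train(List<String> corpus) {
        for (String doc : corpus) {
            numDocs++;
            for (String tok : new HashSet<>(tokenize(doc))) docFreq.merge(tok, 1, Integer::sum);
        }
    }

    /** Unit-length TF-IDF weights (assumed formula) for the tokens of one string. */
    Map<String, Double> weights(String s) {
        Map<String, Integer> tf = new HashMap<>();
        for (String tok : tokenize(s)) tf.merge(tok, 1, Integer::sum);
        Map<String, Double> w = new HashMap<>();
        double norm = 0;
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 1);   // unseen tokens count as rare
            double v = Math.log(e.getValue() + 1.0) * Math.log(1.0 + (numDocs + 1.0) / df);
            w.put(e.getKey(), v);
            norm += v * v;
        }
        norm = Math.sqrt(norm);
        for (Map.Entry<String, Double> e : w.entrySet()) e.setValue(e.getValue() / norm);
        return w;
    }

    /** Stand-in inner comparator: 1 - (Levenshtein distance / longer length). */
    static double tokenSimilarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / maxLen;
    }

    /** Sums weight(s-token) * weight(best t-token) * similarity over soft matches. */
    double score(String s, String t) {
        Map<String, Double> ws = weights(s), wt = weights(t);
        double sim = 0;
        for (Map.Entry<String, Double> es : ws.entrySet()) {
            String best = null;
            double bestSim = 0;
            for (String tokT : wt.keySet()) {
                double d = tokenSimilarity(es.getKey(), tokT);
                if (d >= threshold && d > bestSim) { best = tokT; bestSim = d; }
            }
            if (best != null) sim += es.getValue() * wt.get(best) * bestSim;
        }
        return sim;
    }

    public static void main(String[] args) {
        SoftTfidfSketch d = new SoftTfidfSketch(0.8);
        d.train(Arrays.asList("Microsoft Corp", "Apple Corp", "Apple Inc"));
        System.out.println(d.score("Microsft Corp", "Microsoft Corp"));
    }
}
```

With a threshold of 1.0 the sketch reduces to exact token matching, so lowering the threshold shows how misspelled tokens such as "Microsft" start to contribute to the score.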
 
| Methods inherited from class java.lang.Object | 
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait | 
 
SoftTFIDF
public SoftTFIDF(Tokenizer tokenizer,
                 StringDistance tokenDistance,
                 double tokenMatchThreshold)
SoftTFIDF
public SoftTFIDF(StringDistance tokenDistance,
                 double tokenMatchThreshold)
SoftTFIDF
public SoftTFIDF(StringDistance tokenDistance)
setTokenMatchThreshold
public void setTokenMatchThreshold(double d)
 
setTokenMatchThreshold
public void setTokenMatchThreshold(java.lang.Double d)
 
getTokenMatchThreshold
public double getTokenMatchThreshold()
 
score
public double score(StringWrapper s,
                    StringWrapper t)
- Description copied from class: AbstractStringDistance
- This method needs to be implemented by subclasses.
 
- Specified by:
- score in interface StringDistance
- Overrides:
- score in class TFIDF
 
 
explainScore
public java.lang.String explainScore(StringWrapper s,
                                     StringWrapper t)
- Explains how the distance was computed.
 The output lists the tokens of s and t and marks the
 tokens common to both with an asterisk.
 
- Specified by:
- explainScore in interface StringDistance
- Overrides:
- explainScore in class TFIDF
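 
The asterisk convention described for explainScore can be pictured with a short sketch. The class ExplainSketch, its tokenizer, and the "S:" / "T:" line layout are hypothetical and chosen only for illustration; the library's actual output format is not documented on this page, and this sketch marks exact token matches only.

```java
import java.util.*;

/** Illustrative token listing with common tokens starred. Hypothetical format. */
class ExplainSketch {
    /** Assumed tokenizer: lower-case, split on non-word characters. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String tok : s.toLowerCase().split("\\W+")) if (!tok.isEmpty()) out.add(tok);
        return out;
    }

    /** Lists the tokens of s and t, appending '*' to tokens that occur in both. */
    static String explain(String s, String t) {
        List<String> sTok = tokenize(s), tTok = tokenize(t);
        Set<String> common = new HashSet<>(sTok);
        common.retainAll(new HashSet<>(tTok));
        return "S: " + mark(sTok, common) + "\nT: " + mark(tTok, common);
    }

    private static String mark(List<String> toks, Set<String> common) {
        StringBuilder sb = new StringBuilder();
        for (String tok : toks) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(tok);
            if (common.contains(tok)) sb.append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(explain("Apple Corp", "Apple Inc"));
    }
}
```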
 
 
toString
public java.lang.String toString()
- Overrides:
- toString in class TFIDF
 