com.wcohen.secondstring
Interface StringDistance

All Known Implementing Classes:
AbstractStringDistance

public interface StringDistance

Compute the difference between pairs of strings.

For some types of distances, it is fine to simply create a StringDistance object and use it directly, e.g., new JaroWinkler().score("frederic", "fredrick").

Other string metrics benefit from caching information about a string, especially when many comparisons are made against the same string. The prepare() method returns a StringWrapper object, which can cache any appropriate information about the String it 'wraps'. The most frequent use of caching here is saving a tokenized version of a string (as a BagOfTokens, which is a subclass of StringWrapper).
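
The caching pattern described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: PrepareSketch and TokenWrapper are stand-ins invented for illustration, not SecondString classes, and the token-overlap (Jaccard) score is only an assumed example metric. The point is that tokenization happens once, in prepare(), and is reused by every later call to score().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in types for illustration; these are NOT the SecondString classes.
class PrepareSketch {

    // Plays the role of a StringWrapper: the original string plus cached tokens.
    static final class TokenWrapper {
        final String original;
        final Set<String> tokens;

        TokenWrapper(String original) {
            this.original = original;
            // Tokenize once, at wrap time, so later comparisons reuse the result.
            Set<String> cached = new HashSet<String>();
            String normalized = original.trim().toLowerCase();
            if (!normalized.isEmpty()) {
                cached.addAll(Arrays.asList(normalized.split("\\s+")));
            }
            this.tokens = cached;
        }
    }

    // Analogue of prepare(String): do the expensive preprocessing up front.
    static TokenWrapper prepare(String s) {
        return new TokenWrapper(s);
    }

    // Analogue of score(StringWrapper, StringWrapper): Jaccard overlap of the
    // cached token sets. Larger values indicate more similar strings.
    static double score(TokenWrapper s, TokenWrapper t) {
        Set<String> shared = new HashSet<String>(s.tokens);
        shared.retainAll(t.tokens);
        Set<String> union = new HashSet<String>(s.tokens);
        union.addAll(t.tokens);
        return union.isEmpty() ? 0.0 : (double) shared.size() / union.size();
    }

    // Analogue of score(String, String): convenience form that wraps on every call.
    static double score(String s, String t) {
        return score(prepare(s), prepare(t));
    }

    public static void main(String[] args) {
        // The query is tokenized once and compared against many candidates.
        TokenWrapper query = prepare("William W. Cohen");
        String[] candidates = {"william cohen", "W. Cohen", "Pradeep Ravikumar"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + score(query, prepare(candidate)));
        }
    }
}
```

When one string is compared against many others, preparing it once avoids re-tokenizing it for every comparison; the String overload is convenient for one-off calls.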

Metrics like TFIDF discount matches on frequent tokens. These work best if given a set of strings over which statistics can be accumulated. The accumulateStatistics() method supplies that set of strings to the metric.
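
As a rough sketch of why accumulated statistics matter, the following self-contained example counts how many strings each token appears in and then down-weights frequent tokens when scoring. It is an assumption-laden illustration, not SecondString's TFIDF: the class StatisticsSketch, the smoothed weight log((N + 1) / (df + 1)), and the weighted-overlap score are all hypothetical, and the iterator here yields plain Strings rather than StringWrapper objects.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; NOT the SecondString TFIDF implementation.
class StatisticsSketch {
    // Number of accumulated strings in which each token occurs.
    private final Map<String, Integer> documentFrequency = new HashMap<String, Integer>();
    // Total number of strings seen by accumulateStatistics.
    private int corpusSize = 0;

    static Set<String> tokenize(String s) {
        Set<String> tokens = new HashSet<String>();
        for (String token : s.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Analogue of accumulateStatistics(Iterator): one pass over the corpus.
    void accumulateStatistics(Iterator<String> strings) {
        while (strings.hasNext()) {
            corpusSize++;
            for (String token : tokenize(strings.next())) {
                documentFrequency.merge(token, 1, Integer::sum);
            }
        }
    }

    // Assumed smoothed weight: 0 for a token found in every accumulated string,
    // larger for rarer tokens.
    double weight(String token) {
        int df = documentFrequency.getOrDefault(token, 0);
        return Math.log((corpusSize + 1.0) / (df + 1.0));
    }

    // Weighted overlap: weight of shared tokens divided by weight of all tokens.
    // Returns 0.0 when the total weight is zero (e.g. no statistics accumulated).
    double score(String s, String t) {
        Set<String> sTokens = tokenize(s);
        Set<String> tTokens = tokenize(t);
        Set<String> union = new HashSet<String>(sTokens);
        union.addAll(tTokens);
        double shared = 0.0;
        double total = 0.0;
        for (String token : union) {
            double w = weight(token);
            total += w;
            if (sTokens.contains(token) && tTokens.contains(token)) {
                shared += w;
            }
        }
        return total == 0.0 ? 0.0 : shared / total;
    }

    public static void main(String[] args) {
        StatisticsSketch distance = new StatisticsSketch();
        distance.accumulateStatistics(
                Arrays.asList("Acme Inc", "Globex Inc", "Initech Inc").iterator());
        // "inc" occurs in every accumulated string, so matching on it earns no credit.
        System.out.println(distance.score("Acme Inc", "Globex Inc"));
        System.out.println(distance.score("Acme Inc", "Acme Corp"));
    }
}
```

In this sketch, a shared token that appears in every accumulated string contributes nothing to the score, while a shared rare token contributes the most, which is the discounting behaviour the paragraph above describes.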


Method Summary
 void accumulateStatistics(java.util.Iterator i)
          Accumulate statistics over a set of StringWrappers, which will be produced by an iterator.
 java.lang.String explainScore(java.lang.String s, java.lang.String t)
          Explain how the distance was computed.
 java.lang.String explainScore(StringWrapper s, StringWrapper t)
          Explain how the distance was computed.
 StringWrapper prepare(java.lang.String s)
          Preprocess a string for distance computation.
 double score(java.lang.String s, java.lang.String t)
          Find the distance between s and t.
 double score(StringWrapper s, StringWrapper t)
          Find the distance between s and t.
 

Method Detail

score

public double score(StringWrapper s,
                    StringWrapper t)
Find the distance between s and t. Larger values indicate more similar strings.


score

public double score(java.lang.String s,
                    java.lang.String t)
Find the distance between s and t.


prepare

public StringWrapper prepare(java.lang.String s)
Preprocess a string for distance computation.


explainScore

public java.lang.String explainScore(StringWrapper s,
                                     StringWrapper t)
Explain how the distance was computed.


explainScore

public java.lang.String explainScore(java.lang.String s,
                                     java.lang.String t)
Explain how the distance was computed.


accumulateStatistics

public void accumulateStatistics(java.util.Iterator i)
Accumulate statistics over a set of StringWrappers, which will be produced by an iterator. This is for distance metrics like TFIDF that use statistics on unlabeled strings to adjust a distance metric.