

Class Summary  

AbstractStatisticalTokenDistance  Abstract token distance metric that uses frequency statistics. 
AbstractStringDistance  Abstract class which implements StringDistanceLearner as well as StringDistance. 
AbstractTokenizedStringDistance  Abstract distance metric for tokenized strings. 
AdaptiveStringDistanceLearner  Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. 
AffineGap  Affine-gap string distance, following Durbin et al. 
ApproxMemoMatrix  Variant of MemoMatrix that only stores values near the diagonal, for better efficiency. 
ApproxNeedlemanWunsch  Needleman-Wunsch string distance, following Durbin et al. 
AveragedStringDistanceLearner  Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. 
BasicDistanceInstanceIterator  A simple DistanceInstanceIterator implementation. 
BasicStringWrapper  An extensible (non-final) class that implements some of the functionality of a string. 
BasicStringWrapperIterator  A simple StringWrapperIterator implementation. 
CharMatchScore  Abstract distance between characters. 
CombinedStringDistanceLearner  Abstract StringDistanceLearner class which combines results of a number of inner distance metrics, learned by a number of inner distance learners. 
DirichletJS  Jensen-Shannon distance of two unigram language models, smoothed using a Dirichlet prior. 
DistanceLearnerFactory  Creates distance metric learners from string descriptions. 
Jaccard  Jaccard distance implementation. 
Jaro  Jaro distance metric. 
JaroWinkler  Jaro distance metric, as extended by Winkler. 
JaroWinklerTFIDF  Soft TFIDF-based distance metric, extended to use "soft" token-matching with the Jaro-Winkler distance metric. 
JelinekMercerJS  Jensen-Shannon distance of two unigram language models, smoothed using a Jelinek-Mercer mixture model. 
JensenShannonDistance  Distance metrics based on the Jensen-Shannon distance of two smoothed unigram language models. 
Level2  Generic version of Monge & Elkan's "level 2" recursive field matching. 
Level2Jaro  "Level 2" recursive field matching algorithm, based on Jaro distance. 
Level2JaroWinkler  "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. 
Level2Levenstein  "Level 2" recursive field matching algorithm using Levenshtein distance. 
Level2MongeElkan  Monge & Elkan's "level 2" recursive field matching algorithm. 
Levenstein  Levenshtein string distance. 
MemoMatrix  A matrix of doubles, defined recursively by the compute(i,j) method, that will not be recomputed more than necessary. 
Mixture  Mixture-based distance metric. 
MongeElkan  The match method proposed by Monge and Elkan. 
MultiStringAvgDistance  StringDistance defined over Strings that are broken into fields, with distance defined as the average distance between any field. 
MultiStringDistance  Abstract StringDistance class defined over Strings that are broken into fields. 
MultiStringWrapper  A StringWrapper that stores a version of the string that has been either (a) split into a number of distinct fields, or (b) duplicated k times, so that k different StringDistances can preprocess it, or (c) both of the above. 
NeedlemanWunsch  Needleman-Wunsch string distance, following Durbin et al. 
PrintfFormat  PrintfFormat allows the formatting of an array of objects embedded within a string. 
ScaledLevenstein  Levenshtein string distance. 
SmithWaterman  Smith-Waterman string distance, following Durbin et al. 
SoftTFIDF  TFIDF-based distance metric, extended to use "soft" token-matching. 
SoftTokenFelligiSunter  Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. 
TagLink  
TagLink.Candidates  
TFIDF  TFIDF-based distance metric. 
TokenFelligiSunter  Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. 
UnsmoothedJS  Jensen-Shannon distance of two unsmoothed unigram language models. 
WinklerRescorer  Winkler's reweighting scheme for distance metrics. 
WizardUI  Top-level GUI interface. 
This package contains a number of approximate string comparators, plus code for performing controlled experiments with them.
A StringDistance is the basic class for computing distances. The
score() function of this class outputs a distance measure between its
two arguments. The other methods are there for efficiency, so that
preprocessing steps (like tokenization) can be amortized over multiple
comparisons with the same string.
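
The amortization idea can be illustrated with a small self-contained sketch. It does not use this package's classes: TokenOverlapSketch, prepare(), scoreAll() and the tokenizations counter are hypothetical names chosen for the example, and the score is a Jaccard-style similarity over whitespace tokens, picked only because it has an obvious preprocessing step.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a distance with a separate preprocessing step. */
class TokenOverlapSketch {
    /** Counts tokenization passes, only to make the amortization visible. */
    static int tokenizations = 0;

    /** Preprocess once: lower-case, split on whitespace, keep distinct tokens. */
    static Set<String> prepare(String s) {
        tokenizations++;
        String trimmed = s.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Jaccard similarity of two prepared token sets (1.0 means identical sets). */
    static double score(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    /** Convenience form that preprocesses both arguments on every call. */
    static double score(String s, String t) {
        return score(prepare(s), prepare(t));
    }

    /** Scores one query against many candidates, preparing the query only once. */
    static double[] scoreAll(String query, List<String> candidates) {
        Set<String> preparedQuery = prepare(query);
        double[] out = new double[candidates.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = score(preparedQuery, prepare(candidates.get(i)));
        }
        return out;
    }
}
```

With n candidates, scoreAll() tokenizes n + 1 strings, whereas calling the two-String form in a loop would tokenize 2n.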
Preprocessing results are saved by creating a StringWrapper object,
which contains the preprocessed string plus whatever else needs to be
cached. To do this, extend the default implementation of StringWrapper.
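
As a rough illustration of the wrapper pattern, the sketch below extends a minimal base wrapper so that it also caches character-bigram counts, computed once at construction. All class and method names here (SimpleWrapperSketch, BigramWrapperSketch, BigramDiceSketch) are hypothetical and are not this package's API; the Dice coefficient is used only as an example of a score that reads the cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base wrapper: holds only the raw string. */
class SimpleWrapperSketch {
    private final String raw;

    SimpleWrapperSketch(String raw) {
        this.raw = raw;
    }

    String unwrap() {
        return raw;
    }
}

/** Extension that also caches character-bigram counts of the lower-cased string. */
class BigramWrapperSketch extends SimpleWrapperSketch {
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private int total = 0;

    BigramWrapperSketch(String raw) {
        super(raw);
        String s = raw.toLowerCase();
        for (int i = 0; i + 2 <= s.length(); i++) {
            bigramCounts.merge(s.substring(i, i + 2), 1, Integer::sum);
            total++;
        }
    }

    int count(String bigram) {
        return bigramCounts.getOrDefault(bigram, 0);
    }

    int totalBigrams() {
        return total;
    }

    Iterable<String> bigrams() {
        return bigramCounts.keySet();
    }
}

/** Dice coefficient over the cached counts; the strings are not re-scanned. */
class BigramDiceSketch {
    static double score(BigramWrapperSketch s, BigramWrapperSketch t) {
        int denom = s.totalBigrams() + t.totalBigrams();
        if (denom == 0) {
            // Both strings are shorter than two characters: fall back to equality.
            return s.unwrap().equalsIgnoreCase(t.unwrap()) ? 1.0 : 0.0;
        }
        int shared = 0;
        for (String g : s.bigrams()) {
            shared += Math.min(s.count(g), t.count(g));
        }
        return 2.0 * shared / denom;
    }
}
```

A wrapper built once for a string can then be passed to score() any number of times without repeating the bigram extraction.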
Almost everything in this package implements StringDistance. The only
(public) exceptions are StringWrapper; PrintfFormat, pilfered from Sun
to make the explanations easier; CharMatchScore, which is a
character-based distance metric; and MemoMatrix, a utility for defining
edit-distance-based methods.
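
To show how a memoized matrix with a recursive compute(i,j) method can define an edit-distance-based method, here is a minimal self-contained sketch. MemoMatrixSketch and EditDistanceSketch are hypothetical classes written for this example, not the package's MemoMatrix; the computeCalls counter exists only to show that each cell is computed at most once, and unit costs are an assumption.

```java
/** Hypothetical memoizing matrix: each cell is computed at most once via compute(i, j). */
abstract class MemoMatrixSketch {
    private final double[][] value;
    private final boolean[][] computed;
    /** Instrumentation for the example only. */
    int computeCalls = 0;

    MemoMatrixSketch(int rows, int cols) {
        value = new double[rows][cols];
        computed = new boolean[rows][cols];
    }

    /** Recursive definition of cell (i, j); may call get() on other cells. */
    abstract double compute(int i, int j);

    /** Returns the cached value, computing it on first access. */
    double get(int i, int j) {
        if (!computed[i][j]) {
            computeCalls++;
            value[i][j] = compute(i, j);
            computed[i][j] = true;
        }
        return value[i][j];
    }
}

/** Unit-cost edit distance expressed as a recursive cell definition. */
class EditDistanceSketch extends MemoMatrixSketch {
    private final String s;
    private final String t;

    EditDistanceSketch(String s, String t) {
        super(s.length() + 1, t.length() + 1);
        this.s = s;
        this.t = t;
    }

    @Override
    double compute(int i, int j) {
        if (i == 0) {
            return j; // insert the first j characters of t
        }
        if (j == 0) {
            return i; // delete the first i characters of s
        }
        int mismatch = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
        double substitute = get(i - 1, j - 1) + mismatch;
        double delete = get(i - 1, j) + 1;
        double insert = get(i, j - 1) + 1;
        return Math.min(substitute, Math.min(delete, insert));
    }

    static double distance(String s, String t) {
        return new EditDistanceSketch(s, t).get(s.length(), t.length());
    }
}
```

Because this sketch recurses from the bottom-right cell, very long strings could exhaust the call stack; it is meant for short strings only.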

