Class Summary | |
---|---|
AbstractStatisticalTokenDistance | Abstract token distance metric that uses frequency statistics. |
AbstractStringDistance | Abstract class which implements StringDistanceLearner as well as StringDistance. |
AbstractTokenizedStringDistance | Abstract distance metric for tokenized strings. |
AdaptiveStringDistanceLearner | Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. |
AffineGap | Affine-gap string distance, following Durbin et al. |
ApproxMemoMatrix | Variant of MemoMatrix that only stores values near the diagonal, for better efficiency. |
ApproxNeedlemanWunsch | Needleman-Wunsch string distance, following Durbin et al. |
AveragedStringDistanceLearner | Abstract StringDistanceLearner class which averages results of a number of inner distance metrics, learned by a number of inner distance learners. |
BasicDistanceInstanceIterator | A simple DistanceInstanceIterator implementation. |
BasicStringWrapper | An extendible (non-final) class that implements some of the functionality of a string. |
BasicStringWrapperIterator | A simple StringWrapperIterator implementation. |
CharMatchScore | Abstract distance between characters. |
CombinedStringDistanceLearner | Abstract StringDistanceLearner class which combines results of a number of inner distance metrics, learned by a number of inner distance learners. |
DirichletJS | Jensen-Shannon distance of two unigram language models, smoothed using Dirichlet prior. |
DistanceLearnerFactory | Creates distance metric learners from string descriptions. |
Jaccard | Jaccard distance implementation. |
Jaro | Jaro distance metric. |
JaroWinkler | Jaro distance metric, as extended by Winkler. |
JaroWinklerTFIDF | Soft TFIDF-based distance metric, extended to use "soft" token-matching with the JaroWinkler distance metric. |
JelinekMercerJS | Jensen-Shannon distance of two unigram language models, smoothed using Jelinek-Mercer mixture model. |
JensenShannonDistance | Distance metrics based on Jensen-Shannon distance of two smoothed unigram language models. |
Level2 | Generic version of Monge & Elkan's "level 2" recursive field matching. |
Level2Jaro | "Level 2" recursive field matching algorithm, based on Jaro distance. |
Level2JaroWinkler | "Level 2" recursive field matching algorithm, based on Jaro-Winkler distance. |
Level2Levenstein | "Level 2" recursive field matching algorithm using Levenshtein distance. |
Level2MongeElkan | Monge & Elkan's "level 2" recursive field matching algorithm. |
Levenstein | Levenshtein string distance. |
MemoMatrix | A matrix of doubles, defined recursively by the compute(i,j) method, that will not be recomputed more than necessary. |
Mixture | Mixture-based distance metric. |
MongeElkan | The match method proposed by Monge and Elkan. |
MultiStringAvgDistance | StringDistance defined over Strings that are broken into fields, with distance defined as the average distance between any field. |
MultiStringDistance | Abstract class StringDistance defined over Strings that are broken into fields. |
MultiStringWrapper | A StringWrapper that stores a version of the string that has been either (a) split into a number of distinct fields, or (b) duplicated k times, so that k different StringDistances can preprocess it, or (c) both of the above. |
NeedlemanWunsch | Needleman-Wunsch string distance, following Durbin et al. |
PrintfFormat | PrintfFormat allows the formatting of an array of objects embedded within a string. |
ScaledLevenstein | Levenshtein string distance. |
SmithWaterman | Smith-Waterman string distance, following Durbin et al. |
SoftTFIDF | TFIDF-based distance metric, extended to use "soft" token-matching. |
SoftTokenFelligiSunter | Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
TagLink | |
TagLink.Candidates | |
TFIDF | TFIDF-based distance metric. |
TokenFelligiSunter | Highly simplified model of Fellegi-Sunter's method 1, applied to tokens. |
UnsmoothedJS | Jensen-Shannon distance of two unsmoothed unigram language models. |
WinklerRescorer | Winkler's reweighting scheme for distance metrics. |
WizardUI | Top-level GUI interface. |
This package contains approximate string comparators, plus code for running controlled experiments with them.
A StringDistance is the basic interface for computing distances. Its score() method returns a distance measure between its two arguments. The other methods exist for efficiency, so that preprocessing steps (such as tokenization) can be amortized over multiple comparisons with the same string.
Preprocessing results are saved by creating a StringWrapper object, which contains the preprocessed string plus whatever else needs to be cached. To do this, extend the default implementation of StringWrapper.
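The prepare-once, compare-many pattern described above can be illustrated with a minimal, self-contained sketch. It does not use this package's classes: TokenWrapper and JaccardSketch are hypothetical stand-ins, and the method names prepare() and score() are assumed here only to mirror the description. The score in this sketch is a token-overlap (Jaccard) value where higher means more similar, which is also an assumption of the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a wrapper: keeps the raw string plus cached tokens.
class TokenWrapper {
    private final String raw;
    private final Set<String> tokens = new HashSet<>();

    TokenWrapper(String raw) {
        this.raw = raw;
        // Tokenization happens once, here, instead of on every comparison.
        for (String token : raw.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
    }

    String unwrap() { return raw; }

    Set<String> tokens() { return tokens; }
}

// Hypothetical metric that scores two prepared strings by token overlap.
class JaccardSketch {
    // Preprocess a string so the result can be reused across comparisons.
    TokenWrapper prepare(String s) {
        return new TokenWrapper(s);
    }

    // Compare two already-prepared strings; no tokenization is repeated here.
    double score(TokenWrapper s, TokenWrapper t) {
        Set<String> shared = new HashSet<>(s.tokens());
        shared.retainAll(t.tokens());
        int unionSize = s.tokens().size() + t.tokens().size() - shared.size();
        // Two strings with no tokens at all are scored 0.0 in this sketch.
        return unionSize == 0 ? 0.0 : (double) shared.size() / unionSize;
    }

    // Convenience form that prepares both arguments on the fly.
    double score(String s, String t) {
        return score(prepare(s), prepare(t));
    }

    public static void main(String[] args) {
        JaccardSketch metric = new JaccardSketch();
        TokenWrapper query = metric.prepare("William W. Cohen");
        String[] candidates = {"William Cohen", "W. Cohen", "Pradeep Ravikumar"};
        for (String candidate : candidates) {
            // The query is tokenized once and reused for every candidate.
            System.out.println(candidate + " -> " + metric.score(query, metric.prepare(candidate)));
        }
    }
}
```

Preparing the query once pays off when one string is compared against many candidates, which is the case the extra methods are meant to serve.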
Almost everything in this package implements StringDistance. The only (public) exceptions are StringWrapper; PrintfFormat, pilfered from Sun to make the explanations easier; CharMatchScore, which is a character-based distance metric; and MemoMatrix, a utility for defining edit-distance-based methods.
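To show how a recursively defined, memoized matrix can support an edit-distance method, here is a rough sketch. LazyMatrix and EditDistanceSketch are hypothetical names and do not reproduce MemoMatrix's actual API; only the idea of a compute(i,j) method whose results are not recomputed is taken from the class summary. The recurrence used is the standard unit-cost Levenshtein one.

```java
// Hypothetical stand-in for a memoized matrix: each cell is computed at most once.
abstract class LazyMatrix {
    private final double[][] value;
    private final boolean[][] computed;

    LazyMatrix(int rows, int cols) {
        value = new double[rows][cols];
        computed = new boolean[rows][cols];
    }

    // Subclasses define a cell recursively in terms of other cells via get().
    abstract double compute(int i, int j);

    // Returns the cached value, computing it on first access only.
    double get(int i, int j) {
        if (!computed[i][j]) {
            value[i][j] = compute(i, j);
            computed[i][j] = true;
        }
        return value[i][j];
    }
}

class EditDistanceSketch {
    // Unit-cost Levenshtein distance between s and t, defined through LazyMatrix.
    static double distance(String s, String t) {
        LazyMatrix m = new LazyMatrix(s.length() + 1, t.length() + 1) {
            @Override
            double compute(int i, int j) {
                if (i == 0) return j; // build the first j characters of t by insertion
                if (j == 0) return i; // remove the first i characters of s by deletion
                double mismatch = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                double substitute = get(i - 1, j - 1) + mismatch;
                double delete = get(i - 1, j) + 1;
                double insert = get(i, j - 1) + 1;
                return Math.min(substitute, Math.min(delete, insert));
            }
        };
        return m.get(s.length(), t.length());
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```

Because cells are filled by recursion, very long strings could exhaust the call stack in this sketch; an iterative fill order would avoid that.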