com.wcohen.ss.api
Interface StringDistanceLearner

All Known Implementing Classes:
AbstractStatisticalTokenDistance, AbstractStringDistance, AbstractTokenizedStringDistance, AdaptiveStringDistanceLearner, AffineGap, ApproxNeedlemanWunsch, AveragedStringDistanceLearner, CombinedStringDistanceLearner, DirichletJS, Jaccard, Jaro, JaroWinkler, JaroWinklerTFIDF, JelinekMercerJS, JensenShannonDistance, Level2, Level2Jaro, Level2JaroWinkler, Level2Levenstein, Level2MongeElkan, Levenstein, Mixture, MongeElkan, NeedlemanWunsch, ScaledLevenstein, SmithWaterman, SoftTFIDF, SoftTokenFelligiSunter, TagLink, TagLinkToken, TFIDF, TokenFelligiSunter, UnsmoothedJS, WinklerRescorer

public interface StringDistanceLearner

Learn a StringDistance.


Method Summary
 void addExample(DistanceInstance answeredQuery)
          Accept the answer to the last query.
 StringDistance getDistance()
          Return the learned distance.
 boolean hasNextQuery()
          Returns true if the learner has more queries for which it wants an answer.
 DistanceInstance nextQuery()
          Returns a DistanceInstance for which the learner would like a label.
 DistanceInstanceIterator prepare(DistanceInstanceIterator i)
          Preprocess a DistanceInstanceIterator for supervised training.
 StringWrapperIterator prepare(StringWrapperIterator i)
          Preprocess a StringWrapperIterator for unsupervised training.
 void setDistanceInstancePool(DistanceInstanceIterator i)
          Accept a set of unlabeled DistanceInstance objects to use in making distance instance queries.
 void setStringWrapperPool(StringWrapperIterator i)
          Unsupervised learning method that observes strings for which distance will be computed.
 

Method Detail

prepare

StringWrapperIterator prepare(StringWrapperIterator i)
Preprocess a StringWrapperIterator for unsupervised training.


prepare

DistanceInstanceIterator prepare(DistanceInstanceIterator i)
Preprocess a DistanceInstanceIterator for supervised training.


setStringWrapperPool

void setStringWrapperPool(StringWrapperIterator i)
Unsupervised learning method that observes strings for which distance will be computed. This examines a number of unlabeled StringWrapper objects and uses that information to tune the distance function being learned. An example use of this method would be a TFIDF-based distance function, which accumulates token-frequency statistics over a corpus.
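
The kind of statistics a TFIDF-style learner might gather from the string pool can be sketched as follows. This is a minimal illustration, not the library's implementation: the class TokenStatsSketch, its observe() method, the whitespace tokenizer, and the plain log(N/df) weighting are all assumptions made for the example, and plain String values stand in for StringWrapper.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: accumulates per-token document frequencies
// over a pool of unlabeled strings, as a TFIDF-based distance might.
public class TokenStatsSketch {
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int collectionSize = 0;

    // Plays the role of setStringWrapperPool: observe every string once.
    void observe(Iterator<String> strings) {
        while (strings.hasNext()) {
            String s = strings.next();
            // Assumed tokenizer: lower-case, split on whitespace.
            Set<String> distinct =
                new HashSet<>(Arrays.asList(s.toLowerCase().trim().split("\\s+")));
            for (String token : distinct) {
                if (!token.isEmpty()) {
                    documentFrequency.merge(token, 1, Integer::sum);
                }
            }
            collectionSize++;
        }
    }

    // Number of observed strings that contain the token.
    int documentFrequency(String token) {
        return documentFrequency.getOrDefault(token, 0);
    }

    // Assumed weighting: log(N / df), or 0 for tokens never observed.
    double idf(String token) {
        int df = documentFrequency(token);
        return df == 0 ? 0.0 : Math.log((double) collectionSize / df);
    }

    public static void main(String[] args) {
        TokenStatsSketch stats = new TokenStatsSketch();
        stats.observe(Arrays.asList("acme corp", "acme inc", "widget corp").iterator());
        System.out.println("df(acme) = " + stats.documentFrequency("acme"));
        System.out.println("idf(widget) = " + stats.idf("widget"));
    }
}
```

Rare tokens such as "widget" receive a higher weight than common ones such as "corp", which is why a learner of this kind benefits from seeing the strings before any distances are computed.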


setDistanceInstancePool

void setDistanceInstancePool(DistanceInstanceIterator i)
Accept a set of unlabeled DistanceInstance objects to use in making distance instance queries. Queries are made with the methods hasNextQuery(), nextQuery(), and addExample().
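
The query cycle built from hasNextQuery(), nextQuery(), and addExample() might be driven roughly as shown below. This is a sketch under stated assumptions: Pair and PoolLearner are simplified, hypothetical stand-ins for DistanceInstance and a StringDistanceLearner implementation, a java.util.Iterator replaces DistanceInstanceIterator, and the case-insensitive "oracle" that supplies labels is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class QueryLoopSketch {
    // Hypothetical stand-in for DistanceInstance: a string pair plus a label.
    static final class Pair {
        final String a;
        final String b;
        final Boolean correct; // null means the pair is still unlabeled

        Pair(String a, String b, Boolean correct) {
            this.a = a;
            this.b = b;
            this.correct = correct;
        }
    }

    // Hypothetical learner that exposes only the query-related methods.
    static final class PoolLearner {
        private final Deque<Pair> pool = new ArrayDeque<>();
        private final List<Pair> labeled = new ArrayList<>();

        void setDistanceInstancePool(Iterator<Pair> i) {
            while (i.hasNext()) {
                pool.add(i.next());
            }
        }

        boolean hasNextQuery() {
            return !pool.isEmpty();
        }

        Pair nextQuery() {
            return pool.poll();
        }

        void addExample(Pair answeredQuery) {
            labeled.add(answeredQuery);
        }

        int labeledCount() {
            return labeled.size();
        }
    }

    // Toy oracle (an assumption): a pair is correct if it matches ignoring case.
    static Pair answer(Pair query) {
        return new Pair(query.a, query.b, query.a.equalsIgnoreCase(query.b));
    }

    // Hands the pool to the learner, then answers every query it poses.
    static int runQueryLoop(List<Pair> unlabeled) {
        PoolLearner learner = new PoolLearner();
        learner.setDistanceInstancePool(unlabeled.iterator());
        while (learner.hasNextQuery()) {
            Pair query = learner.nextQuery();
            learner.addExample(answer(query));
        }
        return learner.labeledCount();
    }

    public static void main(String[] args) {
        List<Pair> pool = Arrays.asList(
            new Pair("IBM", "ibm", null),
            new Pair("IBM", "Intel", null));
        System.out.println("labeled examples: " + runQueryLoop(pool));
    }
}
```

In this sketch the learner simply asks about every pooled pair in order; a real learner is free to choose which instances it queries and when to stop.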


hasNextQuery

boolean hasNextQuery()
Returns true if the learner has more queries for which it wants an answer.


nextQuery

DistanceInstance nextQuery()
Returns a DistanceInstance for which the learner would like a label.


addExample

void addExample(DistanceInstance answeredQuery)
Accept the answer to the last query. An 'answer' is a DistanceInstance with a known score or correctness.


getDistance

StringDistance getDistance()
Return the learned distance.