com.wcohen.ss
Class TagLink

java.lang.Object
  extended by com.wcohen.ss.AbstractStringDistance
      extended by com.wcohen.ss.AbstractTokenizedStringDistance
          extended by com.wcohen.ss.AbstractStatisticalTokenDistance
              extended by com.wcohen.ss.TagLink
All Implemented Interfaces:
StringDistance, StringDistanceLearner

public class TagLink
extends AbstractStatisticalTokenDistance


Nested Class Summary
static class TagLink.Candidates
           
protected  class TagLink.UnitVector
          Marker class extending BagOfTokens
 
Field Summary
 
Fields inherited from class com.wcohen.ss.AbstractStatisticalTokenDistance
collectionSize, documentFrequency, totalTokenCount
 
Fields inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
tokenizer
 
Constructor Summary
TagLink()
          TagLink default constructor.
TagLink(AbstractStringDistance tokenDistance)
          TagLink constructor that takes a character-based string metric.
TagLink(java.lang.String[] dataSetArray)
          TagLink constructor that takes a dataset string array from which to compute the IDF weights.
TagLink(java.lang.String[] dataSetArray, AbstractStringDistance tokenDistance)
          TagLink constructor that takes a dataset string array from which to compute the IDF weights, and a tokenDistance metric.
TagLink(Tokenizer tokenizer, AbstractStringDistance tokenDistance)
          TagLink constructor that takes a tokenizer and a tokenDistance metric.
 
Method Summary
protected  TagLink.UnitVector asUnitVector(StringWrapper w)
           
 java.lang.String explainScore(StringWrapper s, StringWrapper t)
          explainScore gives a brief explanation of how the score was computed.
 StringWrapper prepare(java.lang.String s)
          Preprocess a string by finding tokens and giving them TFIDF weights.
 double score(StringWrapper s, StringWrapper t)
          score computes the similarity between a pair of strings s and t.
 java.lang.String toString()
          toString returns the name and parameters of this string metric.
 
Methods inherited from class com.wcohen.ss.AbstractStatisticalTokenDistance
checkTrainingHasHappened, getDocumentFrequency, train
 
Methods inherited from class com.wcohen.ss.AbstractTokenizedStringDistance
asBagOfTokens, prepare, setStringWrapperPool
 
Methods inherited from class com.wcohen.ss.AbstractStringDistance
addExample, doMain, explainScore, getDistance, hasNextQuery, nextQuery, prepare, score, setDistanceInstancePool
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TagLink

public TagLink()
TagLink default constructor. All IDF weights are equal, and the transposition constant is 0.3.


TagLink

public TagLink(AbstractStringDistance tokenDistance)
TagLink constructor that takes a character-based string metric.

Parameters:
tokenDistance - AbstractStringDistance, the character-based string metric used to compare tokens
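TagLink delegates token-to-token comparison to the character-based metric passed as tokenDistance (TagLinkToken by default). The sketch below is not TagLinkToken: it swaps in a normalized Levenshtein similarity, 1 - editDistance / max(length), purely to show what a per-token character-based metric computes. The class name TokenSimilaritySketch is hypothetical.

```java
// Hypothetical stand-in for a character-based token metric such as TagLinkToken.
// This is NOT TagLinkToken; it is normalized Levenshtein similarity.
final class TokenSimilaritySketch {

    // Dynamic-programming edit distance (insert, delete, substitute each cost 1),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Swap rows: prev now holds the row just computed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; two empty tokens count as identical.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }
}
```

For example, similarity("smith", "smyth") is 0.8: one substitution over five characters.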

TagLink

public TagLink(Tokenizer tokenizer,
               AbstractStringDistance tokenDistance)
TagLink constructor that takes a tokenizer and a tokenDistance metric.

Parameters:
tokenizer - Tokenizer used to split strings into tokens
tokenDistance - AbstractStringDistance used to compare tokens

TagLink

public TagLink(java.lang.String[] dataSetArray)
TagLink constructor that takes a dataset string array from which to compute the IDF weights. The default character-based string metric is TagLinkToken.

Parameters:
dataSetArray - String[]
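Computing IDF weights from a dataset string array can be illustrated with a small, self-contained sketch. It assumes the common form idf(t) = log(N / df(t)), where N is the number of records in dataSetArray and df(t) is the number of records containing token t, and it uses a simple lowercase, split-on-punctuation tokenizer. TagLink's own tokenizer and exact IDF formula may differ; IdfSketch and idfWeights are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: IDF weights from a dataset array, assuming idf = log(N / df).
final class IdfSketch {

    // Assumed tokenizer: lowercase, then split on runs of non-alphanumeric characters.
    static String[] tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(tok -> !tok.isEmpty())
                .toArray(String[]::new);
    }

    static Map<String, Double> idfWeights(String[] dataSetArray) {
        // Document frequency: number of records that contain each token.
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (String record : dataSetArray) {
            // Count each token at most once per record.
            Set<String> seen = new HashSet<>(Arrays.asList(tokenize(record)));
            for (String tok : seen) {
                documentFrequency.merge(tok, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        int n = dataSetArray.length;
        for (Map.Entry<String, Integer> e : documentFrequency.entrySet()) {
            idf.put(e.getKey(), Math.log((double) n / e.getValue()));
        }
        return idf;
    }
}
```

With {"Acme Corp", "Acme Inc.", "Globex Corp"}, the rare token inc gets log(3) ≈ 1.10 while acme gets log(1.5) ≈ 0.41. Under this formula a token that appears in every record gets weight 0 and so contributes nothing to a score.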

TagLink

public TagLink(java.lang.String[] dataSetArray,
               AbstractStringDistance tokenDistance)
TagLink constructor that takes a dataset string array from which to compute the IDF weights, and a tokenDistance metric.

Parameters:
dataSetArray - String[]
tokenDistance - AbstractStringDistance used to compare tokens

Method Detail

score

public double score(StringWrapper s,
                    StringWrapper t)
score computes the similarity between a pair of strings s and t.

Specified by:
score in interface StringDistance
Specified by:
score in class AbstractStringDistance
Parameters:
s - StringWrapper
t - StringWrapper
Returns:
the similarity of s and t, as a double
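To make the idea of a token-based similarity concrete, here is a simplified sketch under stated assumptions, not TagLink's exact formula: token pairs above a threshold minSim are ranked by token similarity, matched greedily one-to-one, and each chosen pair contributes similarity × weight in s × weight in t. The greedy matching, the threshold, and the product-of-weights combination are all assumptions; ScoreSketch is a hypothetical name, and tokenSim stands for whatever character-based token metric is plugged in.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Simplified TagLink-style score (assumption, not the library's formula).
final class ScoreSketch {

    // A candidate pairing of one token from s with one token from t.
    private static final class Pair {
        final String a;
        final String b;
        final double sim;

        Pair(String a, String b, double sim) {
            this.a = a;
            this.b = b;
            this.sim = sim;
        }
    }

    static double score(Map<String, Double> s, Map<String, Double> t,
                        ToDoubleBiFunction<String, String> tokenSim, double minSim) {
        // Every token of s against every token of t, kept if similar enough.
        List<Pair> candidates = new ArrayList<>();
        for (String a : s.keySet()) {
            for (String b : t.keySet()) {
                double sim = tokenSim.applyAsDouble(a, b);
                if (sim >= minSim) {
                    candidates.add(new Pair(a, b, sim));
                }
            }
        }
        // Most similar pairs first.
        candidates.sort((p, q) -> Double.compare(q.sim, p.sim));
        Set<String> usedS = new HashSet<>();
        Set<String> usedT = new HashSet<>();
        double total = 0.0;
        for (Pair p : candidates) {
            // Greedy one-to-one matching: skip tokens that are already paired.
            if (!usedS.contains(p.a) && !usedT.contains(p.b)) {
                usedS.add(p.a);
                usedT.add(p.b);
                total += p.sim * s.get(p.a) * t.get(p.b);
            }
        }
        return total;
    }
}
```

With unit-length weight vectors and an exact-match tokenSim, identical inputs score 1.0 (the sum of squared weights). A character-based tokenSim lets near-miss tokens such as smith and smyth still contribute partial credit.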

asUnitVector

protected TagLink.UnitVector asUnitVector(StringWrapper w)

prepare

public StringWrapper prepare(java.lang.String s)
Preprocess a string by finding tokens and giving them TFIDF weights.

Specified by:
prepare in interface StringDistance
Overrides:
prepare in class AbstractStringDistance
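The preprocessing step can be sketched as follows. The sketch assumes weight = term frequency × IDF and then scales the weights to unit Euclidean length. The scaling is suggested by the name of the nested TagLink.UnitVector class, but the exact normalization is an assumption. Tokens missing from the IDF table receive defaultIdf, so passing an empty table with defaultIdf = 1.0 mimics the equally weighted default constructor. PrepareSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: find tokens, give them TFIDF weights, scale the vector to length 1.
// Assumptions: weight = tf * idf, unseen tokens use defaultIdf, L2 normalization.
final class PrepareSketch {

    static Map<String, Double> prepare(String s, Map<String, Double> idf, double defaultIdf) {
        Map<String, Double> weights = new LinkedHashMap<>();
        // Term frequency: count occurrences of each token (simple assumed tokenizer).
        for (String tok : s.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                weights.merge(tok, 1.0, Double::sum);
            }
        }
        // Multiply each count by the token's IDF weight.
        weights.replaceAll((tok, tf) -> tf * idf.getOrDefault(tok, defaultIdf));
        // Divide by the Euclidean norm so the weight vector has length 1.
        double norm = Math.sqrt(weights.values().stream().mapToDouble(w -> w * w).sum());
        if (norm > 0) {
            weights.replaceAll((tok, w) -> w / norm);
        }
        return weights;
    }
}
```

For example, prepare("a a b", emptyMap, 1.0) gives a ≈ 0.894 and b ≈ 0.447, that is 2/√5 and 1/√5.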

explainScore

public java.lang.String explainScore(StringWrapper s,
                                     StringWrapper t)
explainScore gives a brief explanation of how the score was computed.

Specified by:
explainScore in interface StringDistance
Specified by:
explainScore in class AbstractStringDistance
Parameters:
s - StringWrapper
t - StringWrapper
Returns:
a String explaining how the score of s and t was computed

toString

public java.lang.String toString()
toString returns the name and parameters of this string metric.

Overrides:
toString in class java.lang.Object
Returns:
String