java.lang.Object com.wcohen.ss.lookup.SoftTFIDFDictionary
public class SoftTFIDFDictionary extends java.lang.Object implements FastLookup
Looks up nearly matching strings in a dictionary, using SoftTFIDF distance. To use the dictionary, first load string/value pairs with 'put', then 'freeze' the dictionary. Once the dictionary is frozen, you can look up values with lookup and read the results with getResult(i), getValue(i), and getScore(i).
For example:
SoftTFIDFDictionary dict = new SoftTFIDFDictionary();
dict.put("william cohen", "wcohen@cs.cmu.edu");
dict.put("vitor del rocha carvalho", "vitor@cs.cmu.edu");
...
dict.freeze();
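int n = dict.lookup(0.5, "victor carvalho"); // lookup takes a minimum score first; 0.5 is an illustrative value
for (int i = 0; i < n; i++) {
    // assumed loop body: print each match with its value and score
    System.out.println(dict.getResult(i) + " -> " + dict.getValue(i) + " (score " + dict.getScore(i) + ")");
}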
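The put, freeze, lookup lifecycle described above can be sketched with a small stand-in class. This is a minimal illustration, not the library's implementation: `FrozenDictionarySketch` is a hypothetical name, the scoring is plain token-set Jaccard overlap rather than SoftTFIDF, and throwing `IllegalStateException` for calls made in the wrong state is an assumption about how such a contract could be enforced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the put / freeze / lookup lifecycle. */
class FrozenDictionarySketch {
    private final List<String> keys = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();
    private boolean frozen = false;

    // Results of the most recent lookup, best score first.
    private final List<Integer> hits = new ArrayList<>();
    private final List<Double> hitScores = new ArrayList<>();

    void put(String string, Object value) {
        // Assumed behaviour: reject insertions once frozen.
        if (frozen) throw new IllegalStateException("dictionary is frozen");
        keys.add(string);
        values.add(value);
    }

    void freeze() { frozen = true; }

    /** Stores every entry scoring at least minScore and returns how many were found. */
    int lookup(double minScore, String toFind) {
        // Assumed behaviour: lookups are only allowed after freeze().
        if (!frozen) throw new IllegalStateException("call freeze() first");
        hits.clear();
        hitScores.clear();
        for (int i = 0; i < keys.size(); i++) {
            double s = jaccard(toFind, keys.get(i));
            if (s >= minScore) {
                // Insert so that scores stay in descending order.
                int pos = 0;
                while (pos < hitScores.size() && hitScores.get(pos) >= s) pos++;
                hits.add(pos, i);
                hitScores.add(pos, s);
            }
        }
        return hits.size();
    }

    String getResult(int i) { return keys.get(hits.get(i)); }
    Object getValue(int i) { return values.get(hits.get(i)); }
    double getScore(int i) { return hitScores.get(i); }

    /** Token-set Jaccard overlap; a simple substitute for SoftTFIDF. */
    static double jaccard(String a, String b) {
        Set<String> ta = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
        Set<String> tb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }
}
```

Because the substitute score only counts exactly equal tokens, a query such as "victor carvalho" gets no credit for "vitor". A soft token-matching measure is meant to close that gap, which is presumably what the minTokenSimilarity constructor argument controls.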
Field Summary | |
---|---|
protected double | lookupTime |
Constructor Summary | |
---|---|
SoftTFIDFDictionary() | |
SoftTFIDFDictionary(Tokenizer tokenizer) | |
SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity) | |
SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity, int windowSize, int maxInvertedIndexSize) | Create a new SoftTFIDFDictionary. |
Method Summary | | |
---|---|---|
void | freeze() | Make it impossible to add new values, but possible to perform lookups. |
double | getLookupTime() | Get the time used in performing the lookup. |
int | getMaxInvertedIndexSize() | |
java.lang.String | getResult(int i) | Get the i'th string found by the last lookup. |
double | getScore(int i) | Get the score of the i'th string found by the last lookup. |
java.lang.Object | getValue(int i) | Get the value of the i'th string found by the last lookup. |
int | getWindowSize(int w) | |
void | loadAliases(java.io.File file) | Load a file of identifiers, each of which has multiple aliases. |
int | lookup(double minScore, java.lang.String toFind) | Look up items SoftTFIDF-similar to the 'toFind' argument, and return the number of items found. |
static void | main(java.lang.String[] argv) | Simple main for testing and experimentation. |
void | put(java.lang.String string, java.lang.Object value) | Insert a string into the dictionary, and associate it with the given value. |
void | refreeze() | |
static SoftTFIDFDictionary | restore(java.io.File file) | |
void | saveAs(java.io.File file) | |
void | setMaxInvertedIndexSize(int m) | Set the maximum size of an inverted index that will be followed. |
void | setWindowSize(int w) | Set the 'windowSize' used for finding similar tokens. |
int | slowLookup(double minScore, java.lang.String toFind) | Exactly like lookup, but works by exhaustively checking every stored string. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected double lookupTime
Constructor Detail |
---|
public SoftTFIDFDictionary()
public SoftTFIDFDictionary(Tokenizer tokenizer)
public SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity)
public SoftTFIDFDictionary(Tokenizer tokenizer, double minTokenSimilarity, int windowSize, int maxInvertedIndexSize)
Method Detail |
---|
public void saveAs(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
Throws: java.io.IOException, java.io.FileNotFoundException
public static SoftTFIDFDictionary restore(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
Throws: java.io.IOException, java.io.FileNotFoundException
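The page does not say how saveAs and restore store the dictionary. As a rough sketch, assuming plain Java object serialization (an assumption, not something the source confirms), a save/restore pair with the same shape might look like the following. `SaveRestoreSketch` and its `get` helper are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical dictionary that persists itself with Java serialization. */
class SaveRestoreSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Object> entries = new HashMap<>();

    void put(String string, Object value) { entries.put(string, value); }

    Object get(String string) { return entries.get(string); }

    /** Writes the whole object graph to the given file. */
    void saveAs(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    /** Reads back an instance previously written by saveAs. */
    static SaveRestoreSketch restore(File file) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (SaveRestoreSketch) in.readObject();
        } catch (ClassNotFoundException e) {
            // Wrapped so callers only deal with IOException, as in the signatures above.
            throw new IOException("saved class not found on the classpath", e);
        }
    }
}
```

With this approach every stored value must itself be serializable, or saveAs fails with a NotSerializableException (a subclass of IOException).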
public void setWindowSize(int w)
public int getWindowSize(int w)
public void setMaxInvertedIndexSize(int m)
public int getMaxInvertedIndexSize()
public void loadAliases(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
Throws: java.io.IOException, java.io.FileNotFoundException
public void put(java.lang.String string, java.lang.Object value)
public void refreeze()
public void freeze()
public int slowLookup(double minScore, java.lang.String toFind)
public int lookup(double minScore, java.lang.String toFind)
Specified by: lookup in interface FastLookup
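The summary describes slowLookup as exhaustively checking every stored string, and setMaxInvertedIndexSize as capping the inverted indexes that will be followed. One way to picture the difference is the sketch below. It is illustrative only: `InvertedIndexSketch` and its methods are hypothetical, and treating the cap as "skip any token whose posting list is longer than the limit" is an assumed reading of that description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical token-to-entry inverted index used to pick lookup candidates. */
class InvertedIndexSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Stores an entry and records its id under each distinct token. */
    void add(String entry) {
        int id = entries.size();
        entries.add(entry);
        for (String tok : new TreeSet<>(Arrays.asList(entry.toLowerCase().split("\\s+")))) {
            postings.computeIfAbsent(tok, k -> new ArrayList<>()).add(id);
        }
    }

    /** Ids of entries sharing a query token whose posting list has at most maxPostingSize ids. */
    TreeSet<Integer> candidates(String query, int maxPostingSize) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (String tok : query.toLowerCase().split("\\s+")) {
            List<Integer> list = postings.getOrDefault(tok, Collections.emptyList());
            // Very common tokens are skipped: following them would touch too many entries.
            if (list.size() <= maxPostingSize) ids.addAll(list);
        }
        return ids;
    }

    /** Exhaustive alternative: every stored entry is a candidate. */
    TreeSet<Integer> allCandidates() {
        TreeSet<Integer> ids = new TreeSet<>();
        for (int i = 0; i < entries.size(); i++) ids.add(i);
        return ids;
    }

    String entry(int id) { return entries.get(id); }
}
```

Under this reading, a lower cap trades recall for speed: entries reachable only through a very common token are never scored, whereas the exhaustive path scores everything.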
public java.lang.String getResult(int i)
Specified by: getResult in interface FastLookup
public java.lang.Object getValue(int i)
Specified by: getValue in interface FastLookup
public double getScore(int i)
Specified by: getScore in interface FastLookup
public double getLookupTime()
public static void main(java.lang.String[] argv) throws java.io.IOException, java.io.FileNotFoundException, java.lang.NumberFormatException, java.lang.ClassNotFoundException
Throws: java.io.IOException, java.io.FileNotFoundException, java.lang.NumberFormatException, java.lang.ClassNotFoundException