java.lang.Object
  extended by com.wcohen.ss.lookup.SoftDictionary
public class SoftDictionary
Looks up nearly matching strings in a dictionary, using a string distance.
A typical use:
SoftDictionary softDict = new SoftDictionary(new SimpleTokenizer(true,true));
String alias[] = new String[]{"william cohen", "wwcohen", "einat minkov", "eminkov", .... };
for (int i=0; i
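The typical-use pattern (store each alias, then find the entry closest to a query) can be illustrated without the library. The following is a minimal, self-contained sketch, not SecondString's implementation. The class name `NearestMatchSketch`, the linear scan, and the use of Levenshtein edit distance are all assumptions for illustration. SoftDictionary itself is configured with a `StringDistanceLearner` and a `Tokenizer`. In this sketch a smaller distance means a closer match.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a soft dictionary: a linear scan that scores
 * every stored string with Levenshtein edit distance (an assumption).
 */
public class NearestMatchSketch {
    private final List<String> keys = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    /** Store a string together with the value that a lookup should return for it. */
    public void put(String string, Object value) {
        keys.add(string);
        values.add(value);
    }

    /** Index of the closest stored string, or -1 if the dictionary is empty. */
    private int bestIndex(String toFind) {
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < keys.size(); i++) {
            int d = levenshtein(keys.get(i), toFind);
            if (d < bestDist) { // ties keep the earliest entry
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** Value stored with the closest string, or null if the dictionary is empty. */
    public Object lookup(String toFind) {
        int i = bestIndex(toFind);
        return i < 0 ? null : values.get(i);
    }

    /** Edit distance to the closest string, or +infinity if the dictionary is empty. */
    public double lookupDistance(String toFind) {
        int i = bestIndex(toFind);
        return i < 0 ? Double.POSITIVE_INFINITY : levenshtein(keys.get(i), toFind);
    }

    /** Dynamic-programming edit distance; insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        NearestMatchSketch dict = new NearestMatchSketch();
        String[] alias = {"william cohen", "wwcohen", "einat minkov", "eminkov"};
        for (String a : alias) {
            dict.put(a, a); // each alias maps to itself
        }
        System.out.println(dict.lookup("wiliam cohen"));         // william cohen
        System.out.println(dict.lookupDistance("wiliam cohen")); // 1.0
    }
}
```

A linear scan keeps the sketch short. A real soft dictionary would normally narrow the candidates before scoring every entry.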
Constructor Summary
| Constructor |
|---|
| SoftDictionary() |
| SoftDictionary(StringDistanceLearner distanceLearner) |
| SoftDictionary(StringDistanceLearner distanceLearner, Tokenizer tokenizer) |
| SoftDictionary(Tokenizer tokenizer) |
Method Summary
| Return type | Method | Description |
|---|---|---|
| StringDistanceTeacher | getTeacher() | Return a teacher that can 'train' a distance metric from the information in the dictionary. |
| void | load(java.io.File file) | Insert all lines in a file as items mapping to themselves. |
| void | load(java.io.File file, boolean ids) | Insert all lines in a file as items mapping to themselves. |
| void | loadAliases(java.io.File file) | Load a file of identifiers, each of which has multiple aliases. |
| java.lang.Object | lookup(java.lang.String toFind) | Look up a string in the dictionary. |
| java.lang.Object | lookup(java.lang.String id, java.lang.String toFind) | Look up a string in the dictionary. |
| java.lang.Object | lookup(java.lang.String id, StringWrapper toFind) | Look up a prepared string in the dictionary. |
| java.lang.Object | lookup(StringWrapper toFind) | Look up a prepared string in the dictionary. |
| double | lookupDistance(java.lang.String toFind) | Return the distance to the best match. |
| double | lookupDistance(java.lang.String id, java.lang.String toFind) | Return the distance to the best match. |
| double | lookupDistance(java.lang.String id, StringWrapper toFind) | Return the distance to the best match. |
| double | lookupDistance(StringWrapper toFind) | Return the distance to the best match. |
| static void | main(java.lang.String[] argv) | Simple main for testing. |
| StringWrapper | prepare(java.lang.String s) | Prepare a string for quicker lookup. |
| void | put(java.lang.String string, java.lang.Object value) | Insert a string into the dictionary. |
| void | put(java.lang.String id, java.lang.String string, java.lang.Object value) | Insert a string into the dictionary. |
| void | put(java.lang.String id, StringWrapper toInsert, java.lang.Object value) | Insert a prepared string into the dictionary. |
| int | size() | Return the number of entries in the dictionary. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail
public SoftDictionary()
public SoftDictionary(StringDistanceLearner distanceLearner)
public SoftDictionary(Tokenizer tokenizer)
public SoftDictionary(StringDistanceLearner distanceLearner, Tokenizer tokenizer)
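The constructors let the caller choose the tokenizer, the distance learner, or both. The following sketch illustrates why the tokenizer choice matters. It is hypothetical and stdlib-only: `TokenDictSketch`, the `Function`-based tokenizer parameter, the case-folding tokenizer, and the Jaccard token distance are assumptions for illustration. They are not SecondString's `Tokenizer` or `StringDistanceLearner` types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical token-based soft dictionary whose tokenizer is chosen at construction time. */
public class TokenDictSketch {
    private final Function<String, List<String>> tokenizer;
    private final List<Set<String>> keyTokens = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    public TokenDictSketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    public void put(String string, Object value) {
        keyTokens.add(new HashSet<>(tokenizer.apply(string))); // tokenize once, at insertion time
        values.add(value);
    }

    /** Jaccard distance: 1 - |intersection| / |union|, taken as 0 when both sets are empty. */
    static double jaccardDistance(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return 1.0 - (double) inter.size() / union.size();
    }

    /** Value of the stored string whose token set is closest to the query's, or null if empty. */
    public Object lookup(String toFind) {
        Set<String> query = new HashSet<>(tokenizer.apply(toFind));
        Object best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < keyTokens.size(); i++) {
            double d = jaccardDistance(keyTokens.get(i), query);
            if (d < bestDist) {
                bestDist = d;
                best = values.get(i);
            }
        }
        return best;
    }

    /** Example tokenizer: split on runs of non-alphanumerics and lower-case each token. */
    static List<String> caseFoldingTokens(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        TokenDictSketch dict = new TokenDictSketch(TokenDictSketch::caseFoldingTokens);
        dict.put("william cohen", "WC");
        dict.put("einat minkov", "EM");
        System.out.println(dict.lookup("Cohen, William W.")); // WC
    }
}
```

Because the tokenizer folds case and drops punctuation, "Cohen, William W." shares two of three tokens with "william cohen". A case-sensitive tokenizer would share none.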
Method Detail
public int size()
Return the number of entries in the dictionary.
public StringWrapper prepare(java.lang.String s)
Prepare a string for quicker lookup.
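"Prepare once, look up many times" pays off when the same query is compared against many entries. The following sketch illustrates only that caching idea. It assumes the expensive step is tokenization. The nested `Prepared` class stands in loosely for a prepared `StringWrapper`, and all names and the call counter are hypothetical, not part of SoftDictionary.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of caching a string's precomputed representation. */
public class PrepareSketch {
    static int tokenizeCalls = 0; // counts how often the (assumed expensive) tokenization runs

    /** A string plus its precomputed token set. */
    static final class Prepared {
        final String original;
        final Set<String> tokens;

        Prepared(String original, Set<String> tokens) {
            this.original = original;
            this.tokens = tokens;
        }
    }

    /** Tokenize exactly once; callers keep and reuse the result. */
    static Prepared prepare(String s) {
        tokenizeCalls++;
        return new Prepared(s, new HashSet<>(Arrays.asList(s.toLowerCase().split("\\s+"))));
    }

    /** Number of shared tokens, computed from the cached sets without re-tokenizing. */
    static int overlap(Prepared a, Prepared b) {
        Set<String> inter = new HashSet<>(a.tokens);
        inter.retainAll(b.tokens);
        return inter.size();
    }

    public static void main(String[] args) {
        Prepared[] entries = {prepare("william cohen"), prepare("einat minkov")};
        Prepared query = prepare("William W Cohen");
        for (int round = 0; round < 3; round++) {
            for (Prepared e : entries) {
                System.out.println(e.original + " -> " + overlap(query, e));
            }
        }
        System.out.println("tokenize calls: " + tokenizeCalls); // 3: two entries plus one query
    }
}
```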
public void load(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
Insert all lines in a file as items mapping to themselves.
Throws: java.io.IOException, java.io.FileNotFoundException
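"Insert all lines in a file as items mapping to themselves" means each line serves as both the lookup string and the returned value. The following stdlib sketch shows only that file-to-entries step. The UTF-8 encoding, the choice to keep duplicate lines, and the name `LoadLinesSketch` are assumptions. It does not reproduce how SoftDictionary stores entries internally.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: every line of a file becomes an entry that maps to itself. */
public class LoadLinesSketch {
    /** Reads the file as UTF-8 (an assumption) and returns one (line, line) entry per line. */
    static List<Map.Entry<String, Object>> load(File file) throws IOException {
        List<Map.Entry<String, Object>> entries = new ArrayList<>();
        for (String line : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8)) {
            // the line is both the key to match against and the value returned
            entries.add(new AbstractMap.SimpleEntry<String, Object>(line, line));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("names", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), Arrays.asList("william cohen", "einat minkov"), StandardCharsets.UTF_8);
        System.out.println(load(tmp).size()); // 2
    }
}
```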
public void load(java.io.File file, boolean ids) throws java.io.IOException, java.io.FileNotFoundException
Insert all lines in a file as items mapping to themselves. This variant is mostly for testing the id feature.
Throws: java.io.IOException, java.io.FileNotFoundException
public void loadAliases(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException
Load a file of identifiers, each of which has multiple aliases.
Throws: java.io.IOException, java.io.FileNotFoundException
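The layout of the alias file is not specified in this page. The following sketch assumes a tab-separated layout in which each line holds an identifier followed by its aliases. It turns each line into (alias, identifier) pairs, so that a soft lookup on any alias can return the shared identifier. The file format and the name `AliasLoaderSketch` are hypothetical. The parser reads from a `BufferedReader`, so a `File` could be passed in via `java.nio.file.Files.newBufferedReader(file.toPath())`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical alias loader; the tab-separated layout is an assumption, not a documented format. */
public class AliasLoaderSketch {
    /** Each line: identifier, then one or more aliases, all tab-separated. Returns (alias, identifier) pairs. */
    static List<Map.Entry<String, String>> parseAliases(BufferedReader in) throws IOException {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t");
            if (fields.length < 2) continue; // skip blank lines and identifiers without aliases
            String id = fields[0];
            for (int i = 1; i < fields.length; i++) {
                pairs.add(new AbstractMap.SimpleEntry<String, String>(fields[i], id));
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        String text = "wcohen\twilliam cohen\twwcohen\nminkov\teinat minkov\teminkov\n";
        for (Map.Entry<String, String> p : parseAliases(new BufferedReader(new StringReader(text)))) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```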
public void put(java.lang.String id, java.lang.String string, java.lang.Object value)
Insert a string into the dictionary. The id is a special tag used to handle 'leave-one-out' lookups: if you look up a string with a non-null id, you get the closest matches that do not have the same id.
public void put(java.lang.String string, java.lang.Object value)
Insert a string into the dictionary.
public void put(java.lang.String id, StringWrapper toInsert, java.lang.Object value)
Insert a prepared string into the dictionary. The id is a special tag used to handle 'leave-one-out' lookups: if you look up a string with a non-null id, you get the closest matches that do not have the same id.
public java.lang.Object lookup(java.lang.String id, java.lang.String toFind)
If id is non-null, then consider only strings with different ids (or null ids).
public java.lang.Object lookup(java.lang.String id, StringWrapper toFind)
If id is non-null, then consider only strings with different ids (or null ids).
public double lookupDistance(java.lang.String id, java.lang.String toFind)
If id is non-null, then consider only strings with different ids (or null ids).
public double lookupDistance(java.lang.String id, StringWrapper toFind)
If id is non-null, then consider only strings with different ids (or null ids).
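The id rules above can be summarized as follows: entries carrying the query's id are skipped, while entries with a null id are always candidates. Those rules make it possible to look up a stored record without trivially matching the record itself. The following stdlib sketch mirrors only those rules. The name `LeaveOneOutSketch` and the Levenshtein distance are assumptions, not SoftDictionary's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of id-based 'leave-one-out' lookup. */
public class LeaveOneOutSketch {
    private final List<String> ids = new ArrayList<>();
    private final List<String> strings = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    public void put(String id, String string, Object value) {
        ids.add(id);
        strings.add(string);
        values.add(value);
    }

    /** With a non-null id, same-id entries are skipped; null-id entries are always candidates. */
    public Object lookup(String id, String toFind) {
        Object best = null;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < strings.size(); i++) {
            if (id != null && id.equals(ids.get(i))) continue; // leave this entry out
            int d = levenshtein(strings.get(i), toFind);
            if (d < bestDist) {
                bestDist = d;
                best = values.get(i);
            }
        }
        return best;
    }

    /** Dynamic-programming edit distance; insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        LeaveOneOutSketch dict = new LeaveOneOutSketch();
        dict.put("r1", "william cohen", "r1");
        dict.put("r2", "wwcohen", "r2");
        dict.put("r3", "einat minkov", "r3");
        System.out.println(dict.lookup(null, "william cohen")); // r1: exact match allowed
        System.out.println(dict.lookup("r1", "william cohen")); // r2: r1 is left out
    }
}
```

This is the usual way to evaluate a matcher on its own data: each record is looked up with its own id, so the answer must come from some other record.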
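public java.lang.Object lookup(java.lang.String toFind)
Look up a string in the dictionary.
public java.lang.Object lookup(StringWrapper toFind)
Look up a prepared string in the dictionary.
public double lookupDistance(java.lang.String toFind)
Return the distance to the best match.
public double lookupDistance(StringWrapper toFind)
Return the distance to the best match.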
public StringDistanceTeacher getTeacher()
Return a teacher that can 'train' a distance metric from the information in the dictionary.
public static void main(java.lang.String[] argv) throws java.io.IOException, java.io.FileNotFoundException
Simple main for testing.
Throws: java.io.IOException, java.io.FileNotFoundException