See the video on how it works
Fuzzy Matching - How VISION turned an iterative coding process
into simple field entry.
Data Entry
From keyboard, to mouse, to implants, and back to keyboard.
Point and Click
Since 1984, personal computers have allowed us to point and click.
Filling out a form on your computer is easier than on paper, because
for many fields, such as Gender or Month, there are only a few
possible values to choose from.
Good interface design allows users to choose a valid item without
typing. For instance, you'll typically see radio buttons to choose
between two items, such as Male or Female. And unless there were a lot
of room on the form, a drop-down menu would be best for choosing one
of twelve months.
But a menu becomes unwieldy if there are more than 47 items. [Note:
some experts argue that 46 is already too many.] The next step up
would be a scrolling list.
Finally, to select one of the thousands of files on your hard drive,
your file system lets you drill down through a series of folders or
directories.
The huge advantage of these no-typing interfaces is that they *show*
all the choices, or at least all the choices at each step. Instead of
trying to imagine what dessert you'd like, the waitress is holding a
dessert tray right under your nose.
Cyborg
Can we do better? Of course!
Here at Prelude Dynamics, all the employees have undergone cybernetic
surgery. This obviates the need for a traditional user interface and
allows us to see funny YouTube videos without a display.
But many of our customers, put off by neural disruption and other
encephalo-munication side-effects, have asked for a
backward-compatible solution instead.
String Matching
So let's return to the idea of using a keyboard to enter text. Imagine
there are several thousand items to choose from, and the user *knows*
which item she wants. An "autocomplete" function can minimize the
required typing by guessing where the user is heading as she types.
Once there are enough characters, the computer displays her choice in
a short list of matching items.
If a user types "abc", then a typical autocomplete goes to an
alphabetic list of dictionary items, and looks up the sub-list of
items which start with "abc". It's quick and easy. But a misspelling
is fatal, and if hundreds of items happen to begin with the same 23
characters, she'll have to type at least 23 characters before the
autocomplete helps. Hmm.
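
The prefix lookup described above can be sketched in a few lines. This is only an illustration, not VISION's code: the function name `prefix_matches` and the sample items are assumptions made up for the example.

```python
from bisect import bisect_left

def prefix_matches(sorted_items, prefix, limit=10):
    """Return up to `limit` items from an alphabetically sorted list
    that start with `prefix` (hypothetical helper for illustration)."""
    # Binary search finds the first item that is >= the prefix.
    start = bisect_left(sorted_items, prefix)
    matches = []
    for item in sorted_items[start:start + limit]:
        if not item.startswith(prefix):
            break  # sorted order: once one item fails, the rest fail too
        matches.append(item)
    return matches

# Made-up sample dictionary; a real one would hold thousands of items.
ITEMS = sorted(["headache", "heartburn", "heart failure",
                "hearing loss", "nausea"])

print(prefix_matches(ITEMS, "hear"))   # every item beginning with "hear"
print(prefix_matches(ITEMS, "haert"))  # a misspelling finds nothing
```

The lookup is fast because the list is sorted, but the second call shows the weakness: one transposed letter and the sub-list is empty.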
Fuzzy Matching
Approximate string matching--often called "fuzzy" matching--eliminates
the dichotomy of "match" or "no match". Instead, *every* item is
compared to the user's typing and each item is scored with its
"distance" from being an exact match. (A typical edit distance is
something like "the minimum number of characters needing to be
inserted, deleted, or swapped to make an exact match".) Fuzzy matching
forgives typing errors, and looks for matches anywhere in the item,
not just at the start. In fact, by intentionally typing the most
distinctive part of the phrase you want, instead of the first part, you
can usually get your match with fewer characters.
If fuzzy matching is obviously better, why isn't it used everywhere?
Because it is very hard to calculate a complex edit-distance on every
item, and still get back to the user before the next keystroke!
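
To make the cost concrete, here is a minimal sketch of one possible edit distance. It assumes a common variation of the Levenshtein dynamic program in which the typed text may match anywhere inside the item; the names `substring_distance` and `best_match` and the sample items are hypothetical, and nothing here is claimed to be the distance VISION uses.

```python
def substring_distance(typed, item):
    """Fewest single-character insertions, deletions, or substitutions
    needed to turn `typed` into some substring of `item`."""
    # Row 0 is all zeros: a match may start anywhere in the item for free.
    prev = [0] * (len(item) + 1)
    for i, tc in enumerate(typed, start=1):
        curr = [i]  # matching i typed characters against nothing costs i
        for j, ic in enumerate(item, start=1):
            cost = 0 if tc == ic else 1
            curr.append(min(prev[j] + 1,          # drop a typed character
                            curr[j - 1] + 1,      # skip an item character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # The match may also end anywhere in the item.
    return min(prev)

def best_match(typed, items):
    """Score every item and return the closest one (first wins on ties)."""
    return min(items, key=lambda item: substring_distance(typed, item))

# Made-up sample dictionary.
ITEMS = ["headache", "heart failure", "heartburn", "hearing loss", "nausea"]
print(best_match("hartburn", ITEMS))
```

Each comparison fills a table of roughly len(typed) x len(item) cells, and it has to be repeated for every item on every keystroke, which is where the time goes.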
Behind the Curtain
By 1907, many other magicians were imitating Harry Houdini's famous
handcuff escapes. He needed something new, and started experimenting
with escaping from a strait jacket. With practice, and his remarkable
strength and agility, he was able to wiggle and squirm and gradually
inch the jacket up until finally...
the jacket would slip back down! It was enraging.
There seemed to be no solution, so Harry Houdini changed the problem.
He boasted that his amazing strait-jacket escape would be performed
while hanging upside down! Now the jacket wouldn't slip back, and the
audience was even more astounded.
Changing the Problem
We can't calculate the exact "edit distance" fast enough. There seems
to be no solution, so we changed the problem.
Since we aren't doing exact matching, why do we need an *exact* edit
distance? Instead, we've borrowed an idea from the scientists working
on image recognition. How do they search a million mug shots to see if
your grandmother's face appears there? In advance, they approximate
the features of each mug shot with a digital signature. Then, when the
police snap a shot of your grandmother, the software computes *her*
digital signature, too. It's the small, approximate, digital
signatures which are compared at lightning speed.
For autocompletion, a 64-bit digital signature can answer 64 yes-no
questions about the original text item. Does the word contain at least
3 vowels? Does it have a moustache? No wait, that's for your
grandmother's mug shot.
When the user presses a key, the latest search word is rushed back to
the VISION server, where its signature is compared to the signature of
*every* item in the dictionary. Fortunately, bit comparisons are very
fast. (And if the signatures are small enough, they can all fit
entirely within the CPU cache, making the search even faster.) If the
signatures are close, we assume the original words are close, too.
Amazingly, it all seems to work. Give it a try!
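
For the curious, the signature idea can be sketched as follows. The 64 questions VISION actually asks are not described here, so this example makes up its own: bit k is set when the text contains a pair of adjacent characters that hashes to bucket k. The function names, the hashing rule, and the sample items are all assumptions for illustration.

```python
def signature(text):
    """Pack 64 yes/no answers about `text` into one integer.
    Assumed question for bit k: does any adjacent character pair
    hash to bucket k? (An illustrative choice, not VISION's.)"""
    text = text.lower()
    sig = 0
    for a, b in zip(text, text[1:]):
        bucket = (ord(a) * 31 + ord(b)) % 64  # always in 0..63
        sig |= 1 << bucket
    return sig

def signature_distance(sig_a, sig_b):
    """Count the questions the two signatures answer differently."""
    return bin(sig_a ^ sig_b).count("1")

def closest(typed, signatures):
    """Compare the typed word's signature with every stored signature."""
    typed_sig = signature(typed)
    return min(signatures,
               key=lambda item: signature_distance(typed_sig, signatures[item]))

# Signatures are computed once, in advance, for a made-up dictionary.
ITEMS = ["headache", "heart failure", "heartburn", "hearing loss", "nausea"]
SIGNATURES = {item: signature(item) for item in ITEMS}

print(closest("heartbrun", SIGNATURES))  # misspelled on purpose
```

The per-item work is now one XOR and one bit count on a single 64-bit value, instead of a table of character comparisons.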
VISION screenshots - Fuzzy in Action

AE Coding:
Notice the misspelled entry and how it still finds the desired value.

AE Coding:
We see our value along with its coding.

Con Med Coding:
Con Meds with fuzzy matching at work.