Date: Wed, 31 Jul 2002 10:25:35 -0400
Reply-To: "James, Steve" <spj1@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "James, Steve" <spj1@CDC.GOV>
Subject: Searching Character Field for Nonexact Matches?
Dear SAS-L,
A colleague has a web application that lets the user query a text field
of about 2000 characters. Right now only an exact-match search is supported,
so if the user types "vaccine", only those entries that contain that exact
term are returned (I believe it's done using the INDEX function). It
would be nice to allow an inexact match, so that if a person typed
the misspelled term "vacine", the search would still return the same
records as before.
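
As a rough illustration of the behavior described above (this is not SAS code, and not how the application currently works), a word-by-word approximate comparison could look like the following Python sketch. The function name fuzzy_contains, the sample records, and the 0.8 similarity cutoff are all assumptions made up for the example; the comparison uses the standard library's difflib module.

```python
import difflib
import re


def fuzzy_contains(text, term, cutoff=0.8):
    """Return True if any single word in text approximately matches term.

    Hypothetical helper: the cutoff of 0.8 is an arbitrary starting point
    and would need tuning against real data.
    """
    # Split the long text field into lowercase words, dropping punctuation.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # get_close_matches returns the words whose similarity ratio to the
    # search term is at least `cutoff`; one hit is enough here.
    return bool(difflib.get_close_matches(term.lower(), words, n=1, cutoff=cutoff))


# Made-up sample records standing in for the 2000-character text field.
records = [
    "Patient received the influenza vaccine in October.",
    "No adverse events were reported.",
]

# The misspelled query still selects the record that mentions "vaccine".
hits = [r for r in records if fuzzy_contains(r, "vacine")]
```

Comparing the search term against each word, rather than against the whole field, is what lets a misspelling match just one part of a long entry.
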
I know there's a "Sounds Like" operator (=*), but I don't think it
will work for this application, since I'm trying to match just a part of the
character field rather than the entire value.
I've wondered whether Text Miner might be the ultimate solution, since it
might allow even more complex matching decisions. Whether that's true,
and how to hook it up to a web application, are other questions
I have.
Does anyone have any suggestions they might share?
Steve James
IT Specialist
National Immunization Program
Statistical Analysis Branch
Centers for Disease Control and Prevention
(404) 639-6041 (phone)
(404) 639-1728 (fax)
sjames@cdc.gov