Date: Fri, 1 May 1998 09:52:10 -0400
Reply-To: "Zuckier, Gerald" <Zuckier@CHIME.ORG>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Zuckier, Gerald" <Zuckier@CHIME.ORG>
Subject: Re: ADDRESS merge purge
Content-Type: text/plain
As it happens, in the other window here I am right now cranking through
Automatch. It works pretty well and sort of suits a SAS mentality, as
it's kind of tweakish rather than something a dodo could just turn the
crank on (although even then one could probably get better than zero
results).
It's a bit of a throwback, a DOS program, but all that really affects is
the interface; the memory handling capabilities are sophisticated and it
can handle some big files. As a bonus, being a DOS program, it's
absolutely bulletproof, as long as you get the syntax and order of
things correct. I've got it running in a DOS window using 32 MB of my
RAM right now, and that speeds things up a lot. But it's all run off the
command line with some arcane switches; I worked out a batch file that
runs everything, and I edit it to fit each new project. So, as
they say, dance: 9, looks: 2. It matches on character, numeric, date, and
other variables, and does various kinds of 'fuzzy' matches on dirty data:
dates within some number of days after and a different number of days
before, numerics likewise, character fields that differ in up to a
specified number of characters, and an 'uncertain' match which is not
really well documented but works well for the really junky data we have
(hospital claim insurance policy IDs; how do these people ever get
paid?).
It's probabilistic; it calculates the odds that two records match based
on each variable's distribution and what you tell it (e.g. 2 records
with names Kracznitzj and Kracznitzj get a higher score than 2 records
with names Smith and Smith), totals up the scores for all the variables
you tell it to use, and then you specify a threshold/cutoff for
match/no match. We calculate that cutoff mathematically; the scores are
all based on log to the base 2 of the odds ratio, so once again I wrote
a trivial spreadsheet that works out the cutoff from the number of
records in the file and our matching goal (90% correct matching), and I
just update it for each project.
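For anyone who wants to see the arithmetic, here is a minimal Python
sketch of the textbook form of this kind of scoring. It is an assumption
on my part that this matches what Automatch and the spreadsheet do in
detail; the function names, the m and u probabilities, and the example
numbers are all made up for illustration. Here m is the chance a field
agrees when two records really are the same entity, and u is the chance
it agrees by accident (roughly the value's relative frequency).

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Score added when a field agrees: log2 of the odds ratio m/u."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Score added (normally negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))


def cutoff_weight(n_a: int, n_b: int, expected_matches: int, target: float) -> float:
    """Total score needed for a pair to be a true match with probability `target`.

    Assumes posterior odds = prior odds * odds ratio, where the prior odds
    are the expected true matches against all other possible record pairs.
    """
    prior_odds = expected_matches / (n_a * n_b - expected_matches)
    target_odds = target / (1 - target)
    return math.log2(target_odds) - math.log2(prior_odds)


# Hypothetical frequencies: a rare surname held by 1 record in 100,000,
# a common one held by 1 record in 100.
rare = agreement_weight(0.95, 1 / 100_000)
common = agreement_weight(0.95, 1 / 100)
cutoff = cutoff_weight(1024, 1024, 1024, 0.90)
```

The rare surname earns a much larger weight than the common one, and
the cutoff rises as the files get bigger, because there are more chance
pairings to rule out.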
I don't know what it costs. Email info@matchware.com.
On another tack, Charles Patridge put together a pretty impressive SAS
mail purge application. It groups all possible matched names and
addresses based on Chuck's extensive and deep thoughts about what kinds
of errors you see (Chuck=Charles=C., etc.); then you go over those
results by hand and mark which ones to purge. Good if you're not doing a
million records. It's free, at
http://pages.prodigy.com/SASCONSIG/tip00000.htm
(that's CASE SeNSItiVe).
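The group-then-review idea can be sketched in a few lines of Python.
This is not Chuck's SAS code, which I am not reproducing here; the
nickname table, the function names, and the rule of keying on the first
initial of the canonical name are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical nickname table; a real one would be far longer.
NICKNAMES = {"CHUCK": "CHARLES", "CHAS": "CHARLES", "BOB": "ROBERT"}


def review_key(first: str, last: str, address: str) -> tuple:
    """Build a loose key so likely duplicates land in the same group."""
    first = first.strip().rstrip(".").upper()
    canonical = NICKNAMES.get(first, first)
    # Drop punctuation and collapse whitespace in the address.
    address_key = " ".join(address.upper().replace(".", "").replace(",", "").split())
    # Keep only the initial so "C." groups with "Charles" and "Chuck".
    return (canonical[:1], last.strip().upper(), address_key)


def candidate_groups(records):
    """Return groups of two or more records sharing a key, for manual review."""
    groups = defaultdict(list)
    for rec in records:
        groups[review_key(*rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]


records = [
    ("Chuck", "Smith", "12 Main St."),
    ("Charles", "Smith", "12 main st"),
    ("C.", "Smith", "12 Main St"),
    ("Robert", "Jones", "1 Elm Rd"),
]
groups = candidate_groups(records)
```

The key is deliberately loose: it over-groups, and a person then decides
which records in each group to purge.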
> ----------
> From: Chuck Rodabaugh[SMTP:crodabaugh@LOYALTYLOOP.COM]
> Sent: Thursday, April 30, 1998 4:08 PM
> To: SAS-L@UGA.CC.UGA.EDU
> Subject: ADDRESS merge purge
>
> I am looking for some software to integrate with SAS that will allow
> me to perform relatively simple merge/purge on names and addresses. I
> could also use Oracle-but need address merge/purge engine.
>
> Any ideas-I am willing to pay $$!
>