LISTSERV at the University of Georgia
Date:         Fri, 1 May 1998 09:52:10 -0400
Reply-To:     "Zuckier, Gerald" <Zuckier@CHIME.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         "Zuckier, Gerald" <Zuckier@CHIME.ORG>
Subject:      Re: ADDRESS merge purge
Comments: To: Chuck Rodabaugh <crodabaugh@LOYALTYLOOP.COM>
Content-Type: text/plain

As it happens, in the other window here I am cranking through Automatch right now. It works pretty well and sort of suits a SAS mentality, as it's kind of tweakish rather than something a dodo could just turn the crank on (although one could probably get better than zero results that way).

It's a bit of a throwback, a DOS program, but all that really affects is the interface; the memory handling is sophisticated and it can handle some big files. As a bonus, being a DOS program, it's absolutely bulletproof, as long as you get the syntax and order of things correct. I've got it running in a DOS window using 32 MB of my RAM right now, and that speeds things up a lot. But it's all run off the command line with some arcane switches; I just figured out a good batch file that runs everything, and I edit it to fit each new project. So, as they say, dance: 9, looks: 2.

It matches on character, numeric, date, and other variables, and does various kinds of 'fuzzy' matches on dirty data: dates within plus some number of days and minus a different number, numerics the same deal, character variables with a specified number of differing characters, and an 'uncertain' match which is not really well documented but works well for the really junky data we have (hospital claim insurance policy IDs. How do these people ever get paid?).

It's probabilistic: it calculates the odds that two records match based on each variable's distribution and what you tell it (e.g. two records with names Kracznitzj and Kracznitzj get a higher score than two records with names Smith and Smith), totals up the scores for all the variables you tell it to use, and then you can specify a threshold/cutoff for match/no match. We calculate the cutoff mathematically; it's all based on log to the base 2 of the odds ratio, so once again I wrote a trivial spreadsheet that works from the number of records in the file and our matching goal (90% correct matching), and I just update it for each project.

I don't know what it costs. Email info@matchware.com.
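For anyone who wants to see the probabilistic scoring idea in miniature, here is a rough Python sketch. It is not Automatch's code or its actual formulas; it assumes the usual record-linkage convention where an agreement on a field scores log2(m/u), with m the chance the field agrees on a true match (a number you supply) and u the chance it agrees by accident (estimated here from how common the value is in the file). The function names, the sample surnames, and the cutoff rule (prior odds of roughly 1 in the number of records) are all assumptions made for illustration, and the cutoff is not necessarily what the spreadsheet does.

```python
import math
from collections import Counter


def value_frequencies(values):
    """Relative frequency of each distinct value in one field of the file."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def match_score(rec_a, rec_b, fields, m_probs, freqs):
    """Sum of log2 odds-ratio weights over the chosen fields.

    m_probs[f] is the assumed probability that field f agrees on a true match.
    freqs[f] maps each value of field f to its relative frequency in the file.
    """
    score = 0.0
    for f in fields:
        m = m_probs[f]
        if rec_a[f] == rec_b[f]:
            # Chance agreement on a rare value is unlikely, so it earns more.
            u = freqs[f][rec_a[f]]
            score += math.log2(m / u)
        else:
            # Overall chance that two unrelated records agree on this field.
            u = sum(p * p for p in freqs[f].values())
            score += math.log2((1 - m) / (1 - u))
    return score


def cutoff_for_goal(n_records, goal=0.9):
    """Assumed cutoff: the score at which the posterior odds reach the goal,
    taking the prior odds that a random pair matches as about 1/n_records."""
    return math.log2(goal / (1 - goal)) + math.log2(n_records)


# Hypothetical file: Smith is common, Kracznitzj occurs once.
surnames = ["Smith"] * 8 + ["Jones"] * 7 + ["Kracznitzj"]
freqs = {"surname": value_frequencies(surnames)}
m_probs = {"surname": 0.95}

rare = match_score({"surname": "Kracznitzj"}, {"surname": "Kracznitzj"},
                   ["surname"], m_probs, freqs)
common = match_score({"surname": "Smith"}, {"surname": "Smith"},
                     ["surname"], m_probs, freqs)
differ = match_score({"surname": "Smith"}, {"surname": "Jones"},
                     ["surname"], m_probs, freqs)
```

With this toy data the Kracznitzj pair scores higher than the Smith pair, and a disagreement scores below zero. In practice you would add up several fields and compare the total against the cutoff.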
On another tack, Charles Patridge put together a pretty impressive SAS mail purge application. It groups all possible matched names and addresses based on Chuck's extensive and deep thoughts about what kinds of errors you see (chuck=Charles=C., etc.); then you go over those results by hand and mark which ones to purge. Good if you're not doing a million records. It's free, at http://pages.prodigy.com/SASCONSIG/tip00000.htm (the URL is case-sensitive).
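To give a flavor of that group-then-review approach, here is a small Python sketch. It is not Charles Patridge's SAS code; the nickname table, the field names, and the sample records are all made up for illustration, and the grouping key (first initial of the canonical first name, last name, normalized address) is just one simple assumption about how candidates might be lumped together.

```python
import re
from collections import defaultdict

# Hypothetical nickname table; a real one would be much longer.
NICKNAMES = {"chuck": "charles", "chas": "charles",
             "bill": "william", "bob": "robert"}


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def candidate_key(first, last, address):
    """Key that lumps likely duplicates together for manual review."""
    first = normalize(first)
    canonical = NICKNAMES.get(first, first)
    # Keying on the initial lets "C." fall in with "Charles" and "Chuck".
    # It will not catch a bare initial of a nickname (e.g. "B." for William).
    return (canonical[:1], normalize(last), normalize(address))


def candidate_groups(records):
    """Return only the groups that hold more than one record."""
    groups = defaultdict(list)
    for rec in records:
        key = candidate_key(rec["first"], rec["last"], rec["address"])
        groups[key].append(rec)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample records.
records = [
    {"first": "Chuck", "last": "Smith", "address": "12 Main St."},
    {"first": "Charles", "last": "Smith", "address": "12 main st"},
    {"first": "C.", "last": "Smith", "address": "12 Main St"},
    {"first": "Ann", "last": "Jones", "address": "5 Oak Ave"},
]

for group in candidate_groups(records):
    print([f"{r['first']} {r['last']}" for r in group])
```

The grouping is deliberately generous, since a person goes over each group by hand afterwards and marks which records to purge.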

> ----------
> From: Chuck Rodabaugh[SMTP:crodabaugh@LOYALTYLOOP.COM]
> Sent: Thursday, April 30, 1998 4:08 PM
> To: SAS-L@UGA.CC.UGA.EDU
> Subject: ADDRESS merge purge
>
> I am looking for some software to integrate with SAS that will allow me to
> perform relatively simple merge/purge on names and addresses. I could also
> use Oracle-but need address merge/purge engine.
>
> Any ideas-I am willing to pay $$!

