Date: Mon, 10 Apr 2000 09:54:01 -0400
Reply-To: Steve Giblin <Steve_Giblin@VAPOWER.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Steve Giblin <Steve_Giblin@VAPOWER.COM>
Subject: Re: Match merge problem
Content-type: text/plain; charset=us-ascii
Another approch you may wish to employ is SQL.
This will allow you to grab multiples of partial matches,
then you can devise a hierarchy for which matches are "best".
Note: You will have to be clear on your variable names
You might also want to have a unique sequence number
in each of the lists (ex SEQA, SEQB) to help in the dedupe
process.
Something like
PROC SQL;
CREATE TABLE MATCH AS
SELECT A.*, B.*
FROM LISTA A, LISTB B
WHERE A.PHONEA = B.PHONEB
OR A.LNAMEA = B.LNAMEB
OR A.ADDR1A = B.ADDR1B
OR ...
; QUIT;
* If you have the name components parsed in one file but not the other ;
* you can employ something like
OR INDEX(FULLNAMA,TRIM(LNAMEB))>0
* Same thing for address components
OR INDEX(ADDR1A,TRIM(HOUSENMB))=1
OR INDEX(ADDR1A,TRIM(STREETB))>0
This SQL method will allow you to grab all of the possible matches and then
you will
have to read through multiples and settle upon what is "best".
You will need to makes sure that you do not apply too loose of an OR
condition,
otherwise you will get too many. For example, if you simply allow for a
lookup of
"Smith", on a last name match, you will get many unwanted matches. But if
you
do "Smith" AND parts of the address, you can manage your result set.
Hope this adds another option for you.
Steve