LISTSERV at the University of Georgia
Date:   Mon, 19 Jun 2006 12:34:02 -0400
Reply-To:   "Rickards, Clinton (GE Consumer Finance)" <clinton.rickards@GE.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Rickards, Clinton (GE Consumer Finance)" <clinton.rickards@GE.COM>
Subject:   Re: de-duping without a unique identifier
Comments:   To: Jonathan Woodring <jwoodring@BDTRUST.ORG>
In-Reply-To:   A<D0EC0BFE19A0BF4D8BED3A902D5A6B510BC020@bdtex.bdtrust.local>
Content-Type:   text/plain; charset="iso-8859-1"

Jonathan,

Taking you literally, I think something like the following will do the trick:

proc sort data=master;
   by last first;
run;

proc sort data=monthly (keep=last first) out=monthly_nodups nodupkey;
   by last first;
run;

data new_master;
   merge master (in=a) monthly_nodups (in=b);
   by last first;
   /* choose one of these if conditions (remove the leading ** to activate it): */
   **if not (a and b);  /* drops names found in both files and adds new monthly names to master */
   **if a and not b;    /* keeps only master names that are not in monthly */
run;

but I suspect you really want to be more selective and also handle misspellings, addresses, etc. If so, the questions become: how close is close enough to say that two records are identical? And what is your tolerance for error (false matches and false mismatches)?
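
If you do need looser matching, here is a minimal sketch of one way to go about it, not a recommendation of the right rule for your data. It normalizes the names (uppercase, letters only) and then treats two records as the same person when both names are within a small edit distance, using the COMPLEV function. The toy input rows, the data set names master_k and monthly_k, the key variables last_k and first_k, and the macro variable max_dist with its value of 1 are all assumptions made for illustration.

```sas
/* Hypothetical toy inputs so the sketch runs on its own */
data master;
   infile datalines dlm=',' truncover;
   length last $20 first $20;
   input last $ first $;
   datalines;
Smith,John
De Luca,Mary
Jones,Ann
;
run;

data monthly;
   infile datalines dlm=',' truncover;
   length last $20 first $20;
   input last $ first $;
   datalines;
SMITH,Jon
DELUCA,Mary
;
run;

/* Normalize: uppercase, then keep alphabetic characters only */
data master_k;
   set master;
   length last_k first_k $20;
   last_k  = compress(upcase(last),  , 'ka');
   first_k = compress(upcase(first), , 'ka');
run;

data monthly_k;
   set monthly;
   length last_k first_k $20;
   last_k  = compress(upcase(last),  , 'ka');
   first_k = compress(upcase(first), , 'ka');
run;

/* Assumed tolerance: at most one edit per name. Tune this against
   records you already know to be duplicates or non-duplicates. */
%let max_dist = 1;

/* Keep master records that have no near match in the monthly file */
proc sql;
   create table new_master as
   select m.last, m.first
   from master_k as m
   where not exists
      (select *
       from monthly_k as u
       where complev(strip(m.last_k),  strip(u.last_k))  <= &max_dist
         and complev(strip(m.first_k), strip(u.first_k)) <= &max_dist);
quit;
```

With the toy rows above, this is intended to leave only the Jones record in new_master. Raising max_dist catches more misspellings but produces more false matches, especially on short names, and lowering it does the reverse, which is exactly the tolerance question above. Note that the subquery compares every master record against every monthly record, so on large files you would probably want to restrict the comparison first, for example to records in the same zip.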

HTH,

Clint

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Jonathan Woodring
Sent: Monday, June 19, 2006 11:48 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: de-duping without a unique identifier

Hi SAS-L,

I want to remove records (names) in a master file if they are contained in a monthly update file. Here's the rub: we do not have a unique identifier to easily do this in both files. Instead, we have first name, middle initial, last name, address1, address2, city, state, zip, zip4. I want to 'de-dupe' the master list of names, if the first name and last name are direct matches. Any words of wisdom from the experts before this amateur starts playing around? Thanks!

Jonathan

