Date: Tue, 30 Sep 2003 18:41:59 GMT
Reply-To: Paul Mundell <pmundell@ATTGLOBAL.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Mundell <pmundell@ATTGLOBAL.NET>
Organization: SBC http://yahoo.sbc.com
Subject: Re: Finding duplicate records
Thanks to all of you who responded to my query. One of the most enjoyable
things about getting answers from the list is the chance to try out several
different ways to do the same thing. I appreciate your help, especially
knowing you probably have to weather those idiotic 'microsoft service patch'
emails every time you post. Thanks again, Paul
"Paul Mundell" <firstname.lastname@example.org> wrote in message
> I have a data set of about 20,000 records (i.e., rows). There should be one
> record per individual and individuals are identified by a 7-digit number.
> There are also about 30 other variables associated with each record. Can
> someone tell me how I might find any duplicates in the data and print them
> out? I realized that I could use proc sort and nodupkey, but that would
> simply delete duplicate id numbers. I need to see the duplicates since there
> will be instances of other animals that have been given identical id numbers
> due to data entry errors. A partial example would look like this:
> id       name   sire     dam
> ug/g ....
> 1999607  jim    1998001  1998234  12.3  0.0786
> 1999608  jack   1998001  1998234  10.3  0.100
> 1999608  jill   1998001  1998234  11.7
> 1999610  klyde  1998612  1998137  12.8  0.
> In this example, I would like to get the two instances of 1999608 selected
> out, so that I can change one of them to 1999609. Thanks, Paul
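For anyone reading this thread later, the selection asked for above can be sketched in a few lines. This is only an illustration of the logic, written in Python rather than SAS, and it is not one of the solutions posted to the list. The four rows are taken from the example in the question; the dictionary layout and the helper name find_duplicates are assumptions made for the sketch, and the measurement columns are left out. The idea is to count how often each id occurs and then keep every row whose id occurs more than once, so that all copies are shown rather than all but one deleted.

```python
from collections import Counter

# Rows from the example in the question (id, name, sire, dam).
# The measurement columns are omitted; this layout is an assumption.
records = [
    {"id": "1999607", "name": "jim", "sire": "1998001", "dam": "1998234"},
    {"id": "1999608", "name": "jack", "sire": "1998001", "dam": "1998234"},
    {"id": "1999608", "name": "jill", "sire": "1998001", "dam": "1998234"},
    {"id": "1999610", "name": "klyde", "sire": "1998612", "dam": "1998137"},
]


def find_duplicates(rows, key="id"):
    """Return every row whose key value occurs more than once, in input order."""
    # First pass: count occurrences of each key value.
    counts = Counter(row[key] for row in rows)
    # Second pass: keep all rows that share a key with at least one other row.
    return [row for row in rows if counts[row[key]] > 1]


duplicates = find_duplicates(records)
for row in duplicates:
    print(row["id"], row["name"], row["sire"], row["dam"])
```

Run on the example rows, this selects both records carrying id 1999608 (jack and jill), which is the pair that needs to be inspected before one of them is renumbered.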