Date: Thu, 18 May 2000 09:15:43 -0700
Reply-To: "Terjeson, Mark" <TERJEMW@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Terjeson, Mark" <TERJEMW@DSHS.WA.GOV>
Subject: Re: Duplicate record count...
Content-Type: text/plain
Hi Sally,
Here is one method of doing so:
data test;
input clientid sex educ obsnum;
datalines;
21 1 12 1
21 1 12 2
21 1 12 3
21 1 12 4
21 1 12 5
21 2 12 6
22 1 12 7
22 1 12 8
22 1 12 9
22 1 12 10
22 1 12 11
22 2 12 12
22 2 12 13
22 2 12 14
22 2 12 15
22 2 12 16
22 2 13 17
;
run;
proc sort data=test out=test2;
by clientid sex educ;
run;
* count dups of clientid sex and educ ;
data test3(keep=clientid sex educ count);
retain count 0;
set test2;
by clientid sex educ; * makes first. & last. flags ;
* initialize counter ;
if first.educ then count = 0;
if first.educ and last.educ then
do;
* i guess we are ignoring ;
* these without duplicates ;
end;
else
do;
* count those having ;
* more than one record ;
count = count + 1;
* write out at end of group ;
if last.educ then output;
end;
run;
Hope this is helpful,
Mark Terjeson
Washington State Department of Social and Health Services
Division of Research and Data Analysis (RDA)
(360) 902-0741
(360) 902-0705 fax
mailto:terjemw@dshs.wa.gov
> -----Original Message-----
> From: Sally Muller [SMTP:sally@EMAIL.UNC.EDU]
> Sent: Thursday, May 18, 2000 8:40 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Duplicate record count...
>
> Dear SAS-Lers,
>
> I just searched the UGA SAS-L archives and came up with a lot of cool
> stuff about finding duplicate records. I especially appreciated two
> messages from Dianne Rhodes and Howard Schreier. I'm following
> Howard's advice in fact and concoting an example:
>
> I have a data set from which I need to create a new data set containing
> for each clientid the COUNT of duplicate records. Please note that a
> clientid may have more than one "set" of duplicate records. So my *final*
> data set needs to look like this:
>
> clientid, sex, educ, count of dups:
>
> 21 1 12 5
> 22 1 12 5
> 22 2 12 5
>
>
> when my data looks like this:
>
> data test;
> input clientid sex educ obsnum;
> datalines;
> 21 1 12 1
> 21 1 12 2
> 21 1 12 3
> 21 1 12 4
> 21 1 12 5
> 21 2 12 6
> 22 1 12 7
> 22 1 12 8
> 22 1 12 9
> 22 1 12 10
> 22 1 12 11
> 22 2 12 12
> 22 2 12 13
> 22 2 12 14
> 22 2 12 15
> 22 2 12 16
> 22 2 13 17
> ;
>
> you'll notice that two observations (obs 6 and obs 17) are ignored since
> they have no duplicates.
>
> As a first step I sorted the data set as follows:
>
> proc sort data=test;
> by clientid sex educ;
>
> but this may not be necessary.
>
> Any suggestions most appreciated!
>
> Best,
> Sally Muller
> P.S. I will summarize all suggestions for the list.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~ Sally Muller Jordan Institute for Families ~
> ~ Voice: 843-7798 School of Social Work - UNC ~
> ~ Email: sally@email.unc.edu Fax: 919.967.7015 ~
> ~ ~
> ~ Visit the Work First web pages: ~
> ~ http://ssw.unc.edu/workfirst ~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~