LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (May 2000, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 18 May 2000 09:15:43 -0700
Reply-To:     "Terjeson, Mark" <TERJEMW@DSHS.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Terjeson, Mark" <TERJEMW@DSHS.WA.GOV>
Subject:      Re: Duplicate record count...
Comments: To: Sally Muller <sally@EMAIL.UNC.EDU>
Content-Type: text/plain

Hi Sally, Here is one method of doing so:

data test; input clientid sex educ obsnum; datalines; 21 1 12 1 21 1 12 2 21 1 12 3 21 1 12 4 21 1 12 5 21 2 12 6 22 1 12 7 22 1 12 8 22 1 12 9 22 1 12 10 22 1 12 11 22 2 12 12 22 2 12 13 22 2 12 14 22 2 12 15 22 2 12 16 22 2 13 17 ; run;

proc sort data=test out=test2; by clientid sex educ; run;

* count dups of clientid sex and educ ; data test3(keep=clientid sex educ count); retain count 0;

set test2; by clientid sex educ; * makes first. & last. flags ;

* initialize counter ; if first.educ then count = 0;

if first.educ and last.educ then do; * i guess we are ignoring ; * these without duplicates ; end; else do; * count those having ; * more than one record ; count = count + 1;

* write out at end of group ; if last.educ then output; end;

run;

Hope this is helpful, Mark Terjeson Washington State Department of Social and Health Services Division of Research and Data Analysis (RDA) (360) 902-0741 (360) 902-0705 fax mailto:terjemw@dshs.wa.gov

> -----Original Message----- > From: Sally Muller [SMTP:sally@EMAIL.UNC.EDU] > Sent: Thursday, May 18, 2000 8:40 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Duplicate record count... > > Dear SAS-Lers, > > I just searched the UGA SAS-L archives and came up with a lot of cool > stuff about finding duplicate records. I especially appreciated two > messages from Dianne Rhodes and Howard Schreier. I'm following > Howard's advice in fact and concoting an example: > > I have a data set from which I need to create a new data set containing > for each clientid the COUNT of duplicate records. Please note that a > clientid may have more than one "set" of duplicate records. So my *final* > data set needs to look like this: > > clientid, sex, educ, count of dups: > > 21 1 12 5 > 22 1 12 5 > 22 2 12 5 > > > when my data looks like this: > > data test; > input clientid sex educ obsnum; > datalines; > 21 1 12 1 > 21 1 12 2 > 21 1 12 3 > 21 1 12 4 > 21 1 12 5 > 21 2 12 6 > 22 1 12 7 > 22 1 12 8 > 22 1 12 9 > 22 1 12 10 > 22 1 12 11 > 22 2 12 12 > 22 2 12 13 > 22 2 12 14 > 22 2 12 15 > 22 2 12 16 > 22 2 13 17 > ; > > you'll notice that two observations (obs 6 and obs 17) are ignored since > they have no duplicates. > > As a first step I sorted the data set as follows: > > proc sort data=test; > by clientid sex educ; > > but this may not be necessary. > > Any suggestions most appreciated! > > Best, > Sally Muller > P.S. I will summarize all suggestions for the list. > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ~ Sally Muller Jordan Institute for Families ~ > ~ Voice: 843-7798 School of Social Work - UNC ~ > ~ Email: sally@email.unc.edu Fax: 919.967.7015 ~ > ~ ~ > ~ Visit the Work First web pages: ~ > ~ http://ssw.unc.edu/workfirst ~ > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Back to: Top of message | Previous page | Main SAS-L page