LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2007, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 31 Dec 2007 21:34:57 +0000
Reply-To:     iw1junk@COMCAST.NET
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <iw1junk@COMCAST.NET>
Subject:      Re: Automatic variables, First dot & Last dot
Comments: cc: "Zimmerman, Bernie" <Bernie_Zimmerman@MEDCO.COM>

Summary: Locating groups of records with duplicate keys. #iw-value=1

I generated data with

data wxyz_nroll ; input Surname $ First $ Birth_Date :date9. SSN x ; cards ; a b 1jan2006 1 1 a b 1jan2006 1 2 a b 1jan2006 1 1 c d 2jan2007 3 1 ;

Your code did what I expected it to - count duplicate key values and write one summary record of the number of records in each group with duplicate keys. Since it is a summary, each record just gives the last value in the group of each non-key variable.

Your code can be simplified to

data work.sweep1 ; set work.Wxyz_sweep1(in= _sweep1) ; by Surname First Birth_Date SSN ;

if FIRST.SSN then dupes = 0 ;

dupes+ 1 ;

if LAST.SSN and dupes GT 1 then output work.sweep1 ;

run ;

since the remaining code is extraneous, but does no harm.

If I wanted to see the groups I would use

data work.sweep2 ; set work.Wxyz_sweep1(in= _sweep1) ; by Surname First Birth_Date SSN ; if first.ssn then dupenum = 0 ; if not(first.ssn and last.ssn) ; dupenum + 1 ; run ;

Probably you expected the result of the SQL code

proc sql ; select * , count(*) as dupes from work.Wxyz_nroll group by Surname, First, Birth_Date, SSN having dupes > 1 ; quit ;

Then I tried

proc sort data= work.Wxyz_nroll out= work.Wxyz_sweep1 dupout=sweep3 nodupkey ; by Surname First Birth_Date SSN ; run ;

This last one gives the distinct observations in a group. It is not quite the same as the other programs, but might be good enough, since you haven't distinguished duplicate records from duplicate keys in your question.

I was somewhat surprised in the behavior of DUPOUT with the NODUPKEY option considering that the NODUP option gave 0 observations for the DUPOUT file.

Ian Whitlock ================

Date: Mon, 31 Dec 2007 14:48:39 -0500 Reply-To: "Zimmerman, Bernie" <Bernie_Zimmerman@MEDCO.COM> Sender: "SAS(r) Discussion" From: "Zimmerman, Bernie" <Bernie_Zimmerman@MEDCO.COM> Subject: Automatic variables, First dot & Last dot Content-Type: text/plain; charset="us-ascii"

Listers:

You always come through for me, so if anyone is still out there reading, here goes ..

I have a file of approx. 50 K records with approx 40 variables, e.g. First, Surname, SSN, Birth_Date, etc. Furthermore, I believe the file contains duplicate records. If I sort the file on: Surname, First, Birth_Date and SSN as follows:

proc sort data= work.Wxyz_nroll out= work.Wxyz_sweep1 /* nodupkey */ ; by Surname First Birth_Date SSN ; run ;

I would like to send all duplicate combinations to a file for printing and review. Every time I try using the First. And Last. Automatic variables I seem to get things wrong. The data step below does not give me the results I expected. Does anyone see what I need to change to get things right?

data work.sweep1 ; set work.Wxyz_sweep1(in= _sweep1) ; by Surname First Birth_Date SSN ;

if FIRST.Surname AND FIRST.First AND FIRST.Birth_Date AND FIRST.SSN then do ; dupes = 00 ;

end ; dupes+ 1 ; if LAST.Surname AND LAST.First AND LAST.Birth_Date AND LAST.SSN then do ; if dupes GT 1 then do ; output work.sweep1 ; end ; end ; run ;

title3 "First sweep -- by variables are .. " ; title4 " Surname First Birth_Date & SSN " ; proc print data= work.sweep1 ; var dupes Eff_Date DIS_Dte Surname First MI Birth_Date SSN Carrier Group Member Person HICn TRC TRtext ; run ;

TIA and Happy New Year !

Bernie Zimmerman Medco Eligibility Operations, Triage (201) 269- 6201


Back to: Top of message | Previous page | Main SAS-L page