Date: Mon, 31 Dec 2007 21:34:57 +0000
Reply-To: iw1junk@COMCAST.NET
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <iw1junk@COMCAST.NET>
Subject: Re: Automatic variables, First dot & Last dot
Summary: Locating groups of records with duplicate keys.
#iw-value=1
I generated data with
data wxyz_nroll ;
input Surname $ First $ Birth_Date :date9. SSN x ;
cards ;
a b 1jan2006 1 1
a b 1jan2006 1 2
a b 1jan2006 1 1
c d 2jan2007 3 1
;
Your code did what I expected it to - count duplicate key values
and write one summary record of the number of records in each group
with duplicate keys. Since it is a summary, each record just gives
the last value in the group of each non-key variable.
Your code can be simplified to
data work.sweep1 ;
set work.Wxyz_sweep1(in= _sweep1) ;
by Surname First Birth_Date SSN ;
if FIRST.SSN then dupes = 0 ;
dupes+ 1 ;
if LAST.SSN and dupes GT 1 then output work.sweep1 ;
run ;
since the remaining code is extraneous, but does no harm.
If I wanted to see the groups I would use
data work.sweep2 ;
set work.Wxyz_sweep1(in= _sweep1) ;
by Surname First Birth_Date SSN ;
if first.ssn then dupenum = 0 ;
if not(first.ssn and last.ssn) ;
dupenum + 1 ;
run ;
Probably you expected the result of the SQL code
proc sql ;
select * , count(*) as dupes
from work.Wxyz_nroll
group by Surname, First, Birth_Date, SSN
having dupes > 1
;
quit ;
Then I tried
proc sort data= work.Wxyz_nroll
out= work.Wxyz_sweep1 dupout=sweep3 nodupkey ;
by Surname First Birth_Date SSN ;
run ;
This last one gives the distinct observations in a group. It is not
quite the same as the other programs, but might be good enough, since
you haven't distinguished duplicate records from duplicate keys in
your question.
I was somewhat surprised in the behavior of DUPOUT with the NODUPKEY
option considering that the NODUP option gave 0 observations for the
DUPOUT file.
Ian Whitlock
================
Date: Mon, 31 Dec 2007 14:48:39 -0500
Reply-To: "Zimmerman, Bernie" <Bernie_Zimmerman@MEDCO.COM>
Sender: "SAS(r) Discussion"
From: "Zimmerman, Bernie" <Bernie_Zimmerman@MEDCO.COM>
Subject: Automatic variables, First dot & Last dot
Content-Type: text/plain; charset="us-ascii"
Listers:
You always come through for me, so if anyone is still out there
reading,
here goes ..
I have a file of approx. 50 K records with approx 40 variables, e.g.
First, Surname, SSN, Birth_Date, etc. Furthermore, I believe the file
contains duplicate records. If I sort the file on: Surname, First,
Birth_Date and SSN as follows:
proc sort data= work.Wxyz_nroll
out= work.Wxyz_sweep1 /* nodupkey */ ;
by Surname First Birth_Date SSN ; run ;
I would like to send all duplicate combinations to a file for printing
and review. Every time I try using the First. And Last. Automatic
variables I seem to get things wrong. The data step below does not
give
me the results I expected. Does anyone see what I need to change to
get
things right?
data work.sweep1 ;
set work.Wxyz_sweep1(in= _sweep1) ;
by Surname First Birth_Date SSN ;
if FIRST.Surname AND FIRST.First
AND FIRST.Birth_Date AND FIRST.SSN then do ;
dupes = 00 ;
end ;
dupes+ 1 ;
if LAST.Surname AND LAST.First
AND LAST.Birth_Date AND LAST.SSN then do ;
if dupes GT 1 then do ;
output work.sweep1 ;
end ;
end ;
run ;
title3 "First sweep -- by variables are .. " ;
title4 " Surname First Birth_Date & SSN " ;
proc print data= work.sweep1 ;
var dupes
Eff_Date DIS_Dte Surname First MI Birth_Date SSN
Carrier Group Member Person HICn TRC TRtext ;
run ;
TIA and Happy New Year !
Bernie Zimmerman
Medco
Eligibility Operations, Triage
(201) 269- 6201