Date: Tue, 20 Sep 2011 20:58:13 +0000
Reply-To: Mike Rhoads <RHOADSM1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Rhoads <RHOADSM1@WESTAT.COM>
Subject: Re: Remove duplicate rows
Content-Type: text/plain; charset="us-ascii"
I have not played around with this much, and I agree with Toby and Michael that you need to sort by all variables. However, I didn't recall that you had to sort twice.
The 9.2 documentation for NODUPRECS states in part that, when this option is specified, "PROC SORT compares all variable values for each observation to the ones for the previous observation that was written to the output data set." To me, the part about making the comparison as the records are being written to the output data set suggests that the first sort is not necessary. (And if it is, I certainly hope SAS will improve the documentation at some point.)
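For what it's worth, here is a minimal sketch of the single-sort approach I have in mind. It is untested, and the data set names (mess_nodups, mess_dups) and the sample rows are made up for illustration; only "mess" comes from Richard's example. It assumes that BY _ALL_ puts identical records next to each other, so that the comparison against the previously written observation catches every duplicate in one pass.

```sas
/* Hypothetical sample data: the first and third rows are identical
   but not adjacent in the input. */
data mess;
  input id x $;
  datalines;
1 a
2 b
1 a
2 c
;
run;

/* One sort only: BY _ALL_ orders on every variable, so identical rows
   become adjacent. NODUPRECS then drops any row whose values all match
   the previous row written to the output. DUPOUT= collects the dropped
   rows so they can be inspected. */
proc sort data=mess out=mess_nodups noduprecs dupout=mess_dups;
  by _all_;
run;
```

If the documentation means what I think it does, mess_nodups should end up with three rows and mess_dups with one. Comparing those counts against the two-sort version would be an easy way to settle whether the first sort is really needed.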
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael Raithel
Sent: Tuesday, September 20, 2011 4:29 PM
Subject: Re: Remove duplicate rows
Toby posted the following to Richard's interesting question:
> I could be mistaken here but somewhere I remembered when you use
> noduprec you have to sort it first by all the variables and then sort
> it again with the noduprec as the duplicate records have to be
> sequential in the data set.
Toby, Bingo; I was thinking the exact same thing! I was going to suggest (using Richard's example):
proc sort data=mess; by _all_; run;
proc sort data=mess noduprec dupout=mess_duplicates_removed; by _all_; run;
So, now we have a nomination and a second. Perhaps the motion passes. (Man, I've been living in the Washington, DC area for way too long!)
Toby, best of luck in all your SAS endeavors!
(aka Michael A. Raithel)