LISTSERV at the University of Georgia
Date:   Tue, 20 Sep 2011 21:52:46 +0000
Reply-To:   toby dunn <tobydunn@HOTMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   toby dunn <tobydunn@HOTMAIL.COM>
Subject:   Re: Remove duplicate rows
Comments:   To: rhoadsm1@westat.com
In-Reply-To:   <D47ACC3DC1D6CA4D85DE61DC39DA319F0630AE6B@EX10MAIL1.westat.com>
Content-Type:   text/plain; charset="Windows-1252"

Mike,

I wasn't dreaming; I found this in the Version 8 docs:

"Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations may remain in the output data set. You can remove all duplicates with this option by sorting on all variables."

The doc you reference also says this:

"Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations might remain in the output data set. You can remove all duplicates with this option by sorting on all variables."

What I am not understanding is the last sentence. Does it mean a separate sort before the sort with the NODUPRECS option? Or does it mean that you have to sort on all variables for NODUPRECS to work the way we think it would, and that if you sort on fewer than all the variables it is not guaranteed to work the way we think it should?

Toby Dunn

If you get thrown from a horse, you have to get up and get back on, unless you landed on a cactus; then you have to roll around and scream in pain. “Any idiot can face a crisis—it’s day to day living that wears you out” ~ Anton Chekhov

> Date: Tue, 20 Sep 2011 20:58:13 +0000
> From: RHOADSM1@WESTAT.COM
> Subject: Re: Remove duplicate rows
> To: SAS-L@LISTSERV.UGA.EDU
>
> Hmmmm.
>
> I have not played around with this much, and I agree with Toby and Michael that you need to sort by all variables. However, I didn't recall that you had to sort twice.
>
> The 9.2 documentation for NODUPRECS states in part that, when this option is specified, "PROC SORT compares all variable values for each observation to the ones for the previous observation that was written to the output data set." To me, the part about making the comparison as the records are being written to the output data set suggests that the first sort is not necessary. (And if it is, I certainly hope SAS will improve the documentation at some point.)
>
> Ref: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146878.htm#a003070995
>
>
> Mike Rhoads
> RhoadsM1@Westat.com
>
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Michael Raithel
> Sent: Tuesday, September 20, 2011 4:29 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Remove duplicate rows
>
> Dear SAS-L-ers,
>
> Toby posted the following to Richard's interesting question:
>
> > Richard....
> >
> > I could be mistaken here but somewhere I remembered when you use
> > noduprec you have to sort it first by all the variables and then sort
> > it again with the noduprec as the duplicate records have to be
> > sequential in the data set.
> >
> Toby, Bingo; I was thinking the exact same thing! I was going to suggest (using Richard's example):
>
> proc sort data=mess;
> by _all_;
> run;
>
> proc sort data=mess noduprec dupout=mess_duplicates_removed;
> by _all_;
> run;
>
> So, now we have a nomination and a second. Perhaps the motion passes. (Man, I've been living in the Washington, DC area for way too long)!
>
> Toby, best of luck in all your SAS endeavors!
>
> Take Care!
>
> ----MMMMIIIIKKKKEEEE
> (aka Michael A. Raithel)
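
One way to test the question raised in this thread is a small side-by-side run. This is only a sketch, and it assumes the quoted documentation describes NODUPRECS accurately. The data set name mess comes from Michael's example; the variables id, x, and y and the output names by_id and by_all are made up for illustration.

```sas
/* Hypothetical test data: rows 1 and 3 are exact duplicates, */
/* but they are not adjacent.                                  */
data mess;
  input id x y;
  datalines;
1 1 1
1 2 2
1 1 1
;
run;

/* BY on a subset of the variables. All three rows share id=1, so     */
/* sorting on id alone need not bring the two duplicates together.    */
/* Going by the documentation, the nonconsecutive duplicate may stay. */
proc sort data=mess out=by_id noduprecs;
  by id;
run;

/* BY on every variable in a single step. Identical rows sort next to */
/* each other, so the consecutive-observation check can catch them.   */
proc sort data=mess out=by_all noduprecs;
  by _all_;
run;

/* Compare the observation counts of the two results. */
proc sql;
  select memname, nobs
    from dictionary.tables
    where libname = 'WORK' and memname in ('BY_ID', 'BY_ALL');
quit;
```

If the single BY _ALL_ step already drops the duplicate in by_all, that would support Mike Rhoads's reading that a separate first sort is not needed; if by_id still holds all three rows, that would match the warning about nonconsecutive duplicates.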

