LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2011, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 3 Mar 2011 23:14:10 -0500
Reply-To:     Arthur Tabachneck <art297@ROGERS.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@ROGERS.COM>
Subject:      Re: Proc Sort nodupkey for all columns
Comments: To: Mark Keintz <mkeintz@WHARTON.UPENN.EDU>
Content-Type: text/plain; charset=ISO-8859-1

Mark,

I can't agree with you more that one MUST understand exactly what is going on with all of the possible options and their interactions.

Like you, I personally almost always choose nodupkey. But, if nodup will do what one needs, and can save a couple of hours of processing time, AND ONE UNDERSTANDS THE EFFECT OF ALL OF THE OPTIONS THEY DID AND DIDN'T SELECT, then I would agree to that choice.

Art ------- On Fri, 4 Mar 2011 01:33:45 +0000, Keintz, H. Mark <mkeintz@WHARTON.UPENN.EDU> wrote:

>Art: > >I can think of no instance in which I would use NODUPREC instead of NODUPKEY. NODUPREC does not adhere to what most people would think it means. I'm sure you know this, but I'm not sure everyone does. > >Consider this data set: > >data have (drop=i); > do key=3 to 2 by -1; > do other=1,2,3,2,1; > do i=1 to other; > output; > end; > end; > end; >run; > >It has 6 unique records, some repeated, not always consecutively. (Let OTHER represent some 300 variables). > >Here's what I think some people might think you are suggesting: > >proc sort data=have out=want noduprec; > by key; >run; > > >WANT has 10 observations, not 6. > >Why? Because PROC SORT defaults to EQUALS, it preserves original order of all records with the same value of KEY. But since some identical records are not consecutive within a key, they are not detected by NODUPREC. > > >This is why I think data _null_ suggested _ALL_ but not noduprec: > >proc sort data=have out=want2 nodupkey; > by key _all_; >run; > >.. which yields 6 records. > >Yes, when every variable is a sort var (because of _ALL_ in the BY statement), either noduprec or nodupkey will work, but IMO nodupkey is truer to its na�ve interpretation than NODUPREC. And unless you know all your identical records within a key will end up consecutive after sort, I can think of no situation in which someone would benefit from using NODUPREC instead of NODUPKEY/_ALL_. > > >Regards, >Mark > > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur Tabachneck >Sent: Thursday, March 03, 2011 6:14 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: Re: Proc Sort nodupkey for all columns > >Mark, > >I agree and it should take less time to sort. > >Art >------- >On Thu, 3 Mar 2011 14:36:27 -0800, Mark Miller <mdhmiller@GMAIL.COM> wrote: > >>What about using NODUPREC ? >> >>... mark miller >> >>On 3/3/2011 1:31 PM, Irin later wrote: >>> Thanks a lot! >>> >>> --- On Thu, 3/3/11, Data _null_;<iebupdte@gmail.com> wrote: >>> >>> >>> From: Data _null_;<iebupdte@gmail.com> >>> Subject: Re: Proc Sort nodupkey for all columns >>> To: "Irin later"<irinfigvam@yahoo.com> >>> Cc: SAS-L@listserv.uga.edu >>> Date: Thursday, March 3, 2011, 4:24 PM >>> >>> >>> _ALL_ >>> >>> On Thu, Mar 3, 2011 at 3:18 PM, Irin later<irinfigvam@yahoo.com> wrote: >>>> I do a proc sort nodupkey like below to get rid of exact dups: >>>> >>>> Proc sort nodupkey data= CombinedFlags out=ppro.CombinedFlagsnodup; >>>> by field1 field 2 ....... field300; run; >>>> >>>> Since I need to get rid of exact duplicate records,I have to specify >each field in the record as that is what I need : to delete dups records with equal values in the same columns.... >>>> >>>> Is there any way in syntax in order I could do it without specifying >hundreds of fields? >>>> >>>> Thank you in advance! >>>> >>>> >>>> Irin >>>> >>>> >>>> >>>> >>> >>> >>>


Back to: Top of message | Previous page | Main SAS-L page