Date: Thu, 3 Mar 2011 23:14:10 -0500
Reply-To: Arthur Tabachneck <art297@ROGERS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@ROGERS.COM>
Subject: Re: Proc Sort nodupkey for all columns
Content-Type: text/plain; charset=ISO-8859-1
Mark,
I can't agree with you more that one MUST understand exactly what is going
on with all of the possible options and their interactions.
Like you, I personally almost always choose nodupkey. But, if nodup will do
what one needs, and can save a couple of hours of processing time, AND ONE
UNDERSTANDS THE EFFECT OF ALL OF THE OPTIONS THEY DID AND DIDN'T SELECT,
then I would agree to that choice.
Art
-------
On Fri, 4 Mar 2011 01:33:45 +0000, Keintz, H. Mark
<mkeintz@WHARTON.UPENN.EDU> wrote:
>Art:
>
>I can think of no instance in which I would use NODUPREC instead of
NODUPKEY. NODUPREC does not adhere to what most people would think it
means. I'm sure you know this, but I'm not sure everyone does.
>
>Consider this data set:
>
>data have (drop=i);
> do key=3 to 2 by -1;
> do other=1,2,3,2,1;
> do i=1 to other;
> output;
> end;
> end;
> end;
>run;
>
>It has 6 unique records, some repeated, not always consecutively. (Let
OTHER represent some 300 variables).
>
>Here's what I think some people might think you are suggesting:
>
>proc sort data=have out=want noduprec;
> by key;
>run;
>
>
>WANT has 10 observations, not 6.
>
>Why? Because PROC SORT defaults to EQUALS, it preserves original order of
all records with the same value of KEY. But since some identical records
are not consecutive within a key, they are not detected by NODUPREC.
>
>
>This is why I think data _null_ suggested _ALL_ but not noduprec:
>
>proc sort data=have out=want2 nodupkey;
> by key _all_;
>run;
>
>.. which yields 6 records.
>
>Yes, when every variable is a sort var (because of _ALL_ in the BY
statement), either noduprec or nodupkey will work, but IMO nodupkey is truer
to its na�ve interpretation than NODUPREC. And unless you know all your
identical records within a key will end up consecutive after sort, I can
think of no situation in which someone would benefit from using NODUPREC
instead of NODUPKEY/_ALL_.
>
>
>Regards,
>Mark
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur
Tabachneck
>Sent: Thursday, March 03, 2011 6:14 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Proc Sort nodupkey for all columns
>
>Mark,
>
>I agree and it should take less time to sort.
>
>Art
>-------
>On Thu, 3 Mar 2011 14:36:27 -0800, Mark Miller <mdhmiller@GMAIL.COM> wrote:
>
>>What about using NODUPREC ?
>>
>>... mark miller
>>
>>On 3/3/2011 1:31 PM, Irin later wrote:
>>> Thanks a lot!
>>>
>>> --- On Thu, 3/3/11, Data _null_;<iebupdte@gmail.com> wrote:
>>>
>>>
>>> From: Data _null_;<iebupdte@gmail.com>
>>> Subject: Re: Proc Sort nodupkey for all columns
>>> To: "Irin later"<irinfigvam@yahoo.com>
>>> Cc: SAS-L@listserv.uga.edu
>>> Date: Thursday, March 3, 2011, 4:24 PM
>>>
>>>
>>> _ALL_
>>>
>>> On Thu, Mar 3, 2011 at 3:18 PM, Irin later<irinfigvam@yahoo.com> wrote:
>>>> I do a proc sort nodupkey like below to get rid of exact dups:
>>>>
>>>> Proc sort nodupkey data= CombinedFlags out=ppro.CombinedFlagsnodup;
>>>> by field1 field 2 ....... field300; run;
>>>>
>>>> Since I need to get rid of exact duplicate records,I have to specify
>each field in the record as that is what I need : to delete dups records
with equal values in the same columns....
>>>>
>>>> Is there any way in syntax in order I could do it without specifying
>hundreds of fields?
>>>>
>>>> Thank you in advance!
>>>>
>>>>
>>>> Irin
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
|