As put by master Ian Whitlock:
"It is unclear to me whether you are complaining about the design of PROC SORT
or how data is managed where you work."
I have nothing to complain about in PROC SORT.
NODUPKEY works exactly as documented.
That I consider it to cause uncontrolled loss of
information reflects the way I see it used, and no
more.
I appreciate Ian's sympathy.
Peter
Date: 30/01/2003 17:33
To: SAS-L@LISTSERV.UGA.EDU
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Equivalent of NODUPKEY in PROC SQL
Message text:
Peter,
You originally wrote in part:
> The use of "proc sort nodupkey" I consider as an indicator
> of weak analysis and design.
and then from the message below:
> (I think a better data management approach would be to
> lose or drop _all_ those uncontrolled satellite variables).
It is unclear to me whether you are complaining about the design of PROC SORT
or how data is managed where you work.
If the former, I would translate your parenthetical remark as:
The problem with PROC SORT is that it doesn't care about the
variables that you don't care about.
If the latter, then I sympathize with you but find nothing wrong with PROC
SORT except for the misleading NODUP option.
IanWhitlock@westat.com
-----Original Message-----
From: Peter Crawford [mailto:peter.crawford@DB.COM]
Sent: Thursday, January 30, 2003 9:43 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Equivalent of NODUPKEY in PROC SQL
Hi Paul,
The "uncontrolled way" to which I refer occurs (too
often) in the resolution, common enough around here, of
an m:n merge.
The part I consider "uncontrolled" is the implied rule
"If there is no change in the sort keys, the record is not used",
as in all the descriptions I have read of NODUPKEY,
for example in the online documentation:
NODUPKEY
checks for and eliminates observations with duplicate BY values.
If you specify this option, PROC SORT compares all BY values for
each observation to those for the previous observation written to
the output data set. If an exact match is found, the observation
is not written to the output data set.
Therefore, only the first observation of a group having
the same key values is written to the output.
Even with the EQUALS option, we control only that it
is the first instance in the input data that is
used, and I consider that a weak level of control.
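To illustrate the point, here is a minimal sketch of that behaviour. The data set work.have and the variables id and value are hypothetical, made up for this example and not taken from the thread.

```sas
/* Hypothetical input: two observations share the key id=1 */
/* but differ in the non-key ("satellite") variable value. */
data work.have;
   input id value $;
   datalines;
2 x
1 a
1 b
;
run;

/* EQUALS preserves input order within each BY group, so   */
/* NODUPKEY keeps the first input observation for each id. */
proc sort data=work.have out=work.want nodupkey equals;
   by id;
run;

proc print data=work.want;
run;
```

With this input, work.want should hold one observation per id, and for id=1 it is the one with value a, because that observation comes first in the input. The observation with value b is discarded, and the log reports only how many observations were deleted, not which ones or how they differed.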
(I think a better data management approach would be to
lose or drop _all_ those uncontrolled satellite variables.)
Without pre-processing, we cannot know how the non-key
variables would compare. Of course, if pre-processing
is available, the NODUPKEY option would probably be
unnecessary.
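One form such pre-processing might take is sketched below: a PROC SQL query that lists the keys whose observations disagree on a non-key variable. As before, work.have, id and value are hypothetical names, and a real check would need to cover every non-key variable of interest.

```sas
/* Hypothetical input: id=1 appears twice with different values. */
data work.have;
   input id value $;
   datalines;
2 x
1 a
1 b
;
run;

/* List the keys whose observations carry more than one distinct */
/* non-missing value; these are the groups where NODUPKEY would  */
/* silently discard differing information.                       */
proc sql;
   create table work.conflicts as
   select id,
          count(*)              as n_obs,
          count(distinct value) as n_values
   from work.have
   group by id
   having count(distinct value) > 1;
quit;
```

If work.conflicts comes back empty, the duplicates agree on value and dropping them loses nothing for that variable. If not, the conflicts can be resolved deliberately before any de-duplication.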
Kind Regards
Peter Crawford
<snip>