Date: Sat, 3 Feb 2007 13:17:11 -0500
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: SAS "Whys??"
I agree.
Art
-------
On Sat, 3 Feb 2007 10:58:39 -0500, Mike Rhoads <RHOADSM1@WESTAT.COM> wrote:
>Art,
>
>I was oversimplifying a bit -- I was thinking of a situation where the
>"validated" flag was set, in addition to the correct sortedby variables
>being indicated. Even given your example (of the programmer setting an
>incorrect sortedby value), I would still argue that it's
>counterproductive for SAS to automatically do a sort before starting the
>DATA step just on the chance that the sortedby assertion is not correct.
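A minimal sketch of the difference between an asserted and a validated sort order, assuming hypothetical data sets `work.claimed`, `work.raw`, and `work.sorted` (none of these names come from the thread). The SORTEDBY= data set option only records the programmer's claim; PROC SORT records an order that SAS itself produced. PROC CONTENTS lists the sort information for each, so the two can be compared.

```sas
/* Hypothetical data: SORTEDBY= asserts an order but does not sort anything. */
data work.claimed(sortedby=id);
   input id value;
   datalines;
1 10
2 20
3 30
;
run;

/* Hypothetical unsorted data with no assertion in its header. */
data work.raw;
   input id value;
   datalines;
3 30
1 10
2 20
;
run;

/* PROC SORT orders the data and stores the sort information itself. */
proc sort data=work.raw out=work.sorted;
   by id;
run;

/* Compare the sort information reported for each data set. */
proc contents data=work.claimed;
run;

proc contents data=work.sorted;
run;
```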
>
>Mike
>
>-----Original Message-----
>From: Arthur Tabachneck [mailto:art297@NETSCAPE.NET]
>Sent: Friday, February 02, 2007 9:24 PM
>To: SAS-L@LISTSERV.UGA.EDU; Mike Rhoads
>Subject: Re: SAS "Whys??"
>
>
>Mike,
>
>re: option A. A problem can exist there too! See: http://xrl.us/ukeu
>
>Art
>---------
>On Fri, 2 Feb 2007 14:59:33 -0500, Mike Rhoads <RHOADSM1@WESTAT.COM> wrote:
>
>>1. This is actually an interesting question.
>>
>>For an input data set when a BY statement is in effect, SAS either (A)
>>knows that the data set is sorted as desired (by virtue of information
>>stored in the data set header), or (B) doesn't know whether or not it is
>>sorted as desired. (Even if the header indicates some other sort order,
>>it's possible that the data set is "incidentally" sorted in order by the
>>BY-variables as well, possibly based on some data interrelationships
>>that the programmer is aware of.)
>>
>>If (A), no problem.
>>
>>If (B), SAS has two choices. It could either assume the data are sorted
>>and start reading the records (as it does now), or it could do a sort
>>just in case they aren't. The latter would obviously be a waste of
>>resources if the data were, in fact, sorted. (I suppose a 3rd possible
>>choice would be to start reading, and then if a break in the sort order
>>is found, to do a sort and then start reading again from the beginning.
>>This strikes me as very complex to implement.)
>>
>>If anything, I would generally prefer in many situations that SAS make
>>me be more specific about some of my assumptions, so I won't complain
>>about the way they have implemented BY-group processing.
>>
>>2. To elaborate a bit on Toby's response:
>>
>>PROC APPEND updates the BASE data set in place by adding records to the
>>end, while DATA ... SET creates a new data set (even though it may have
>>the same name as one of the original input data sets). This explains
>>Toby's observation that APPEND requires reading fewer records and thus
>>is more efficient.
>>
>>However, since PROC APPEND is updating in place, it cannot change the
>>structure of the data set (e.g. by adding new variables). Thus, if
>>there are variables not on the "base" data set that need to be included
>>on the output data set, DATA ... SET is the way you have to go.
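The contrast between the two approaches can be sketched as follows. The data sets `work.base`, `work.new`, and `work.combined` and their variables are hypothetical, chosen only for illustration.

```sas
/* Hypothetical data sets with the same variables. */
data work.base;
   input id score;
   datalines;
1 10
2 20
;
run;

data work.new;
   input id score;
   datalines;
3 30
;
run;

/* DATA ... SET builds a new data set: every record of both inputs is
   read, and the output could carry variables found in only one input. */
data work.combined;
   set work.base work.new;
run;

/* PROC APPEND adds the records of work.new to the end of work.base
   in place; the records already in work.base are not read. */
proc append base=work.base data=work.new;
run;
```

When the base data set is large and the data to be added is small, the in-place update is where the savings come from; when the structure has to change, the DATA step is the route to take.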
>>
>>HTH!
>>
>>Mike Rhoads
>>Westat
>>RhoadsM1@Westat.com
>>
>>-----Original Message-----
>>From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
>>On Behalf Of toby dunn
>>Sent: Friday, February 02, 2007 2:22 PM
>>To: Paul.Leland@GMAIL.COM; SAS-L@LISTSERV.UGA.EDU
>>Subject: RE: SAS "Whys??"
>>
>>
>>>1. Why do you have to sort your datasets before you use them with a
>>>'by' statement (like with a merge for example). How come SAS doesn't
>>>just automatically sort it for you in this instance, or at least have
>>>an option to set to perform this if it is not properly sorted (maybe
>>>there is an option that I am not aware of!! ).
>>
>>Think of what you're telling SAS when you use a BY statement: you are
>>explicitly stating that the data is in this order. If it isn't, then SAS
>>balks and throws an error. If the data is already grouped and the groups
>>are not in sorted order, then just use the NOTSORTED option.
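A minimal sketch of the NOTSORTED case, assuming a hypothetical data set `work.sales` whose records are grouped by `region` but whose groups are not in alphabetical order. The data set and variable names are illustrative only.

```sas
/* Hypothetical data: grouped by region, groups not in sorted order. */
data work.sales;
   input region $ amount;
   datalines;
West 10
West 20
East 5
East 15
;
run;

/* NOTSORTED tells SAS the records are grouped but not necessarily
   sorted, so FIRST. and LAST. flags work without a prior PROC SORT. */
data work.totals;
   set work.sales;
   by region notsorted;
   if first.region then total = 0;   /* reset at the start of each group */
   total + amount;                   /* sum statement retains total across records */
   if last.region then output;       /* write one record per group */
   drop amount;
run;
```

Without the NOTSORTED keyword, the same step would stop with an error once it reached the East records, since they follow West out of sort order.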
>>
>>
>>
>>
>>>2. What is the difference between using 'SET DS1 DS2 DS3' in a data
>>>step vs. using 'PROC APPEND' to concatenate data, or I should say, under
>>>what circumstances would you want to use one vs. another method for
>>>concatenating data? And how does SAS handle the properties of
>>>variables if they are different between datasets in a SET statement
>>>(like formats and character lengths)?
>>
>>
>>
>>The SET example requires a full read of all the data, while the append
>>only requires reading the data that is to be appended.
>>