Summary: Philosophical and practical reasons.
Paul,
There is always a question of just how much hand-holding a language
should give the programmer. SAS chooses to make many things easier for
the programmer, but generally shies away from trying to cover for
programmer errors. Consider:
data ;
  set w ;
  by id ;
  ...
run ;
It is a programmer error if W is not sorted by ID. Consequently, SAS is
not philosophically disposed to correct this mistake automatically. To
see why, consider a practical application.
Suppose you have a monster input file and have written
data v / view = v ;
  infile ... ;
  input ... ;
run ;
data w ;
  set v ;
  by id ;
  ...
run ;
Now, you know that sorting this monster would take a couple of hours for
the sort itself, plus 1) calls to the systems people to arrange for the
extra disk packs needed to sort this file, and 2) time with the manuals to
find exactly the right options for handling a file this size. But the sort
is not necessary, because you know the file is already in ID order. Do you
want SAS second-guessing you and doing a safety sort so that you do not
have to think about the code you write?
Instead of coming from a DATA step view, V could be coming from a large
Oracle database through the Oracle engine. Would that matter? Would it
matter if V already existed as a plain SAS data set?
How much extra time would you consider it reasonable for SAS to spend
checking whether the data is sorted? Funny, but the more important it is
that the data be in order, the less willing I am to hand SAS the task of
knowing the order.
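To make the cost question concrete: a DATA step with a BY statement already checks the order as it reads, and stops with an error at the first out-of-order value; what SAS does not do is sort. Any separate check you write yourself can be no cheaper than one full pass when the data turns out to be in order. The following is a minimal sketch, assuming an ascending BY variable (numeric or character); the macro name CHECK_SORTED and the macro variable IS_SORTED are hypothetical, not part of SAS.

```sas
/* Hypothetical helper: one pass over a data set checking that a     */
/* variable is in ascending order. Sets global IS_SORTED to 1 or 0.  */
%macro check_sorted(ds, var) ;
  %global is_sorted ;
  %let is_sorted = 1 ;                /* assume sorted until shown otherwise */
  data _null_ ;
    set &ds ;
    _prev = lag(&var) ;               /* LAG must execute on every row       */
    if _n_ > 1 and &var < _prev then do ;
      put "NOTE: &ds is not in &var order at observation " _n_ ;
      call symputx('is_sorted', '0') ;
      stop ;                          /* the first violation is enough       */
    end ;
  run ;
%mend check_sorted ;
```

With the names from the post, the call would be %check_sorted(w, id), followed by a test of &is_sorted. On a monster file that is in fact sorted, it reads every observation, which is exactly the extra time in question.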
Of course, one could add a system option telling SAS whether or not to
sort data automatically. Which setting would you want as the default?
Would you remember to change it as needed? Do the answers depend on the
type of programming that you do? Would SI (SAS Institute) receive more
screams for adding this option or for not adding it?
Now suppose the problem is somewhat smaller, but you have many DATA steps
working on the data, all taking advantage of the fact that you know the
order of the data doesn't change from one step to the next. Do you want an
extra 10 minutes added to each step for a sort that isn't necessary, but
might be if the programmer didn't know what he was doing?
What do you think the boss should do when he hears his programmer say, "I
don't like to have to think about the order of my files when programming"?
Now to your second question, PROC APPEND versus SET DS1 DS2 DS3 ...
Which is better really depends on the situation, so here are some
situations to consider.
1) Each day you get a data set LIB.TEMP, and your job is to add this data
   to LIB.MASTER. Would you choose to write the code with PROC APPEND
   or a DATA step?
2) Would your answer change if both files are in ID order and the
   resulting file must be in ID order each day? What if the files were
   indexed by ID, but the order didn't matter?
3) At the end of each month you are to create LIB.MASTER from LIB.DS1,
   ..., LIB.DS31 (or DS30, or DS29, or DS28). Now which technique would
   you use?
4) You have just DS1 and DS2 that must be concatenated. They have the
   same structure and DS2 is much smaller than DS1, but the next required
   step is a DATA step that reads the concatenated file. Which technique
   would you use now?
5) Would your answer change if there is no next step?
6) Would your answer change if the length of a character variable needed
   to be made larger? Would it make any difference if the variable were
   numeric? What if the name of one of the variables had to change?
   What if one of the variables had to be dropped?
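For situations 1 and 2, here is a sketch of the candidate techniques side by side. It uses small hypothetical WORK data sets MASTER and TEMP in place of LIB.MASTER and LIB.TEMP, and writes three separate results only so they can be compared; in practice you would run just one of them.

```sas
/* Tiny hypothetical stand-ins for LIB.MASTER and LIB.TEMP, both in ID order. */
data master ;
  do id = 1, 3, 5 ; x = 10 * id ; output ; end ;
run ;
data temp ;
  do id = 2, 4 ; x = 10 * id ; output ; end ;
run ;

/* Situation 1, PROC APPEND: adds the TEMP observations to the end of */
/* the base data set without reading the base observations.          */
/* (A copy is made first only so the three results can be compared.) */
data master_a ;
  set master ;
run ;
proc append base = master_a data = temp ;
run ;

/* Situation 1, DATA step: reads every observation of both data sets */
/* and writes a complete new data set.                               */
data master_b ;
  set master temp ;
run ;

/* Situation 2, interleaving: with BY, SET interleaves the two files */
/* into ID order, but still reads and rewrites everything.           */
data master_c ;
  set master temp ;
  by id ;
run ;
```

One consideration for the indexed case: when a DATA step re-creates a data set under the same name, any indexes on the old data set are lost and must be rebuilt, whereas PROC APPEND updates the indexes of the BASE= data set as it adds observations.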
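For situations 3 and 6, a sketch of how the SET approach handles a list of daily files and a change in variable attributes. DS1-DS3 are hypothetical stand-ins for LIB.DS1-LIB.DS31; the macro DAY_LIST is an illustration, not a SAS-supplied routine, and only writes out data set names for the SET statement. The variables NAME, CUST_NAME and X and their lengths are made up to show a widened character variable, a renamed variable and a dropped variable.

```sas
/* Hypothetical stand-ins for the daily files LIB.DS1-LIB.DS31.   */
/* NAME is $3 in DS1 but $8 in the later files.                   */
data ds1 ; length name $ 3 ; id = 1 ; name = 'abc' ;      x = 1 ; run ;
data ds2 ; length name $ 8 ; id = 2 ; name = 'abcdefgh' ; x = 2 ; run ;
data ds3 ; length name $ 8 ; id = 3 ; name = 'xyz' ;      x = 3 ; run ;

/* Situation 3: generate the data set names with a macro loop      */
/* instead of typing 28 to 31 names. DAY_LIST is hypothetical.     */
%macro day_list(prefix, n) ;
  %local i ;
  %do i = 1 %to &n ;
    &prefix&i
  %end ;
%mend day_list ;

/* Situation 6: the LENGTH statement comes before SET, so NAME is  */
/* $8 and the DS2 value is not cut to the DS1 length of $3.        */
/* RENAME= and DROP= on the output data set handle a renamed or    */
/* dropped variable. (PROC APPEND would need its FORCE option      */
/* here, and would truncate NAME to the BASE= length instead.)     */
data month (rename = (name = cust_name) drop = x) ;
  length name $ 8 ;
  set %day_list(ds, 3) ;
run ;
```

For the real month-end job, the SET statement would read SET %day_list(lib.ds, 31) ; with the count adjusted to the number of days in the month.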
I am not sure I know how to answer all of these questions, but I am sure
that I would find out when faced with the situation that made it necessary.
Perhaps the rules of SAS programming are not so simple that they can all
be spelled out in the documentation. At least you are ahead of many in
asking questions and searching for answers.
Ian Whitlock
================
Date: Fri, 2 Feb 2007 11:10:49 -0800
Reply-To: plw213 <Paul.Leland@GMAIL.COM>
Sender: "SAS(r) Discussion"
From: plw213 <Paul.Leland@GMAIL.COM>
Organization: http://groups.google.com
Subject: SAS "Whys??"
Comments: To: sas-l
Hi all -
I have a couple of curiosity questions about the use of SAS that I am
hoping someone can shed some light on:
1. Why do you have to sort your datasets before you use them with a 'by'
statement (like with a merge, for example)? How come SAS doesn't just
automatically sort them for you in this instance, or at least have an option
to set to perform this if they are not properly sorted (maybe there is an
option that I am not aware of!!)?
2. What is the difference between using 'SET DS1 DS2 DS3' in a data step
vs. using 'PROC APPEND' to concatenate data? Or, I should say, under what
circumstances would you want to use one method vs. another for concatenating
data? And how does SAS handle the properties of variables if they are
different between datasets in a SET statement (like formats and character
lengths)?
I am sure I could find out all this information by doing a little research
on my own, but I figured someone could give me some insight in a more
timely manner ; )
Thanks for your time and advice in advance!!