LISTSERV at the University of Georgia
Date:   Sat, 3 Feb 2007 21:28:49 +0000
Reply-To:   iw1junk@COMCAST.NET
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ian Whitlock <iw1junk@COMCAST.NET>
Subject:   Re: SAS "Whys??"
Comments:   cc: plw213 <Paul.Leland@GMAIL.COM>

Summary: Philosophical and practical reasons. #iw-value=1

Paul,

There is always a question of just how much hand-holding the programmer should be given in a language. SAS chooses to make a lot of things easier on the programmer, but generally shies away from trying to cover programmer errors. Consider:

data ; set w ; by id ; ... run ;

It is a programmer error when the data is not sorted. Consequently SAS is not philosophically disposed to automatically correct this mistake. To see why, consider a practical application.

You have a monster input file and have written

data v / view = v ; infile ... ; input ... ; run ;

data w ; set v ; by id ; .... run ;

Now you know that sorting this monster will take a couple of hours for the sort itself, plus 1) calls to the system people to arrange for the extra disc packs needed to sort this file, and 2) time with the manuals to get exactly the right options to handle this monster. But the sort is not necessary, because you know the file is in ID order. Do you want SAS second-guessing you and doing a safety sort so that you do not have to think about the code that you write?

Instead of coming from a step creating a view, V could have come from a large Oracle database through the Oracle engine. Or would it matter if V already existed as a plain SAS data set?

How much extra time would you think it reasonable for SAS to spend checking on whether the data was sorted or not? Funny, but the more important it is that the data is in order the less willing I would be to pass off to SAS the task of knowing the order.
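
As a rough sketch of the trade-off (V and ID are the names used above; W_SORTED is a hypothetical name added only for illustration): the BY statement already validates the order as the observations are read, so a wrong assumption surfaces as an error rather than as a hidden extra sort. The safety sort is something you request yourself when you decide it is worth the cost.

```sas
/* Sketch only. V and ID come from the example above;        */
/* W_SORTED is a hypothetical output name.                   */

/* BY checks the order while reading; nothing is sorted.     */
/* The step stops with an error at the first out-of-order ID. */
data _null_ ;
   set v ;
   by id ;
run ;

/* If you do want the safety sort, you ask for it explicitly. */
proc sort data = v out = w_sorted ;
   by id ;
run ;
```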

Of course one might add a system option to sort data automatically or not sort it automatically. Which would you want for the default? Would you remember to change it as needed? Do the answers depend on the type of programming that you do? Would SI receive more screams for adding this option or for not adding this option?

Now suppose the problem is somewhat smaller, but you have many DATA steps working on the data, all taking advantage of the fact that you know the order of the data doesn't change from one step to the next. Do you want an extra 10 minutes added to each step to do a sort that isn't necessary, but might be if the programmer didn't know what he was doing?

What do you think the boss should do when he hears his programmer say, "I don't like to have to think about the order of my files when programming"?

Now to consider your second question of PROC APPEND versus SET DS1 DS2 DS3 ... Which is better really depends on the situation, so here are some situations to consider.

1) Each day you get a dataset LIB.TEMP and your job is to add this data to LIB.MASTER. Would you choose to write the code with PROC APPEND or a DATA step?

2) Would your answer change if both files are in ID order and the resulting file must be in ID order each day? What if the files were indexed by ID, but the order didn't matter?

3) At the end of each month you are to create LIB.MASTER from LIB.DS1, ... to LIB.DS31 (or DS30, or DS29, or DS28). Now which technique would you use?

4) You have just DS1 and DS2 that must be concatenated. They have the same structure and DS2 is much smaller than DS1, but the next step required is a DATA step passing the concatenated file.

5) Would your answer change if there is no next step?

6) Would your answer change if the length of a character variable needed to be made larger? Would it make any difference if the variable were numeric? What if the name of one of the variables had to change? What if one of the variables had to be dropped?
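
For reference, a minimal sketch of the basic forms behind some of these situations. The names ALL, NAME, OLDNAME, NEWNAME and EXTRA are hypothetical, and the length of 40 is an arbitrary assumption. This shows only the syntax of each choice, not which one is right for a given situation.

```sas
/* Situation 1: add the daily file to the master.              */
/* PROC APPEND reads only LIB.TEMP and adds it to LIB.MASTER.  */
proc append base = lib.master data = lib.temp ;
run ;

/* The DATA step alternative reads and rewrites both files.    */
data lib.master ;
   set lib.master lib.temp ;
run ;

/* Situation 2: both files in ID order, result in ID order.    */
/* SET with BY interleaves the two files.                      */
data lib.master ;
   set lib.master lib.temp ;
   by id ;
run ;

/* Situation 6: changes to the variables call for a DATA step. */
/* LENGTH before SET widens the hypothetical variable NAME;    */
/* the data set options rename and drop variables in DS2 only. */
data all ;
   length name $ 40 ;
   set ds1
       ds2 ( rename = ( oldname = newname ) drop = extra ) ;
run ;
```

Without the LENGTH statement, a variable in a SET concatenation takes its length from the first data set in which it appears, so longer values in later data sets can be truncated.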

I am not sure I know how to answer all of these questions, but I am sure that I would find out when faced with the situation that made it necessary.

Perhaps the rules of what SAS programming is about are not so easy that they will all be spelled out in the documentation. At least you are ahead of many in asking questions or searching for answers.

Ian Whitlock
================
Date: Fri, 2 Feb 2007 11:10:49 -0800
Reply-To: plw213 <Paul.Leland@GMAIL.COM>
Sender: "SAS(r) Discussion"
From: plw213 <Paul.Leland@GMAIL.COM>
Organization: http://groups.google.com
Subject: SAS "Whys??"
Comments: To: sas-l
Content-Type: text/plain; charset="iso-8859-1"

Hi all -

I have a couple of curiosity questions about the use of SAS that I am hoping someone can shed some light on:

1. Why do you have to sort your datasets before you use them with a 'by' statement (as with a merge, for example)? How come SAS doesn't just automatically sort them for you in this instance, or at least have an option to perform the sort when the data is not properly sorted (maybe there is an option that I am not aware of!!)?

2. What is the difference between using 'SET DS1 DS2 DS3' in a data step vs. using 'PROC APPEND' to concatenate data, or I should say, under what circumstances would you want to use one vs. the other method for concatenating data? And how does SAS handle the properties of variables if they are different between datasets in a SET statement (like formats and character lengths)?

I am sure I could find out all this information by doing a little research on my own, but I figured someone could give me some insight in a more timely manner ; )

Thanks for your time and advice in advance!!

