LISTSERV at the University of Georgia
Date:         Sat, 3 Dec 2005 17:24:34 +0000
Reply-To:     iw1junk@COMCAST.NET
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <iw1junk@COMCAST.NET>
Subject:      When is the (Data = ) option used for output data set
Comments: cc: Hari <excel_hari@YAHOO.COM>

Hari,

In most languages a word can have more than one meaning. This is also true in computer languages and in SAS. Consider

1) data CompleteData;
2) data = x ;
3) proc sort data = w out = s ;

In each case the word DATA means something different. In the first case the word begins a statement, the DATA step statement. It announces that you want to make one or more data sets with the names following.

In the second case, it is an assignment statement for the variable DATA. The equals sign indicates that this could not be a DATA statement, hence it must be an assignment.

In the third case we have the PROC statement for the sort procedure. Most procedures need an input data set and this is usually indicated by the parameter DATA=. Note that it could not be an assignment statement because it is inside a PROC statement. Typically the parameter OUT= is used to indicate an output data set in a procedure.
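The three uses can be put side by side in one small program. This is only a minimal sketch: the data set names W and S come from the example above, while the variable X and its value are assumptions made up for illustration.

```sas
/* 1) DATA begins a DATA statement: create data set W */
data w ;
   x = 1 ;        /* X is a made-up variable for illustration */
   data = x ;     /* 2) DATA is an ordinary variable in an assignment */
run ;

/* 3) DATA= is a PROC statement option naming the input data set; */
/*    OUT= names the output data set                              */
proc sort data = w out = s ;
   by x ;
run ;
```

After the sort, S should hold the same single observation as W, including a variable that is itself named DATA.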

You might conclude that DATA= is almost always used in a procedure statement to specify the input data set. There is no need for consistency with the DATA step statement, since the context is so different. Statement #2 shows that this inconsistency is necessary if we begin the DATA statement with the word DATA and do not want a list of keywords to exclude from variable names. Of course one might have

proc makedata out = w ;
   /* full data step language */
run ;

Although making a parameter DATA= for this procedure would be far less flexible than the traditional SET, MERGE, etc.

Note that the DATA step is a completely separate language from the languages developed for each procedure. The different procedure languages are often, but not always, consistent with one another. The more complex procedure languages sometimes have some statements in common with the DATA step language, but not so often.
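One statement that the DATA step and many procedures do share is WHERE. The sketch below assumes the sample data set SASHELP.CLASS that ships with SAS (it has a numeric variable AGE); the output data set name TEENS is made up for illustration.

```sas
/* DATA step: SET is DATA step language, WHERE filters the input */
data teens ;
   set sashelp.class ;
   where age > 12 ;
run ;

/* Procedure: the same WHERE statement is accepted here, */
/* but SET would not be, since it is DATA step language  */
proc print data = sashelp.class ;
   where age > 12 ;
run ;
```

The overlap is real but limited: the WHERE statement carries over, while the statements that read and combine data sets (SET, MERGE) belong to the DATA step only.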

Traditional languages typically depend on functions and parameters to specify details. SAS is a step-oriented language, as opposed to a routine-oriented (pre-object-oriented) language. Routines tend to be designed for a very specific purpose, hence parameters are usually enough to specify the programmer's request. (Object-oriented programming can be viewed as an extension of a routine-oriented language for cases where a group of related functions becomes too complex to handle by mere parameters. Of course, OOP is more than just this, but it is sometimes a valid view.)

In SAS, procedures tend to do a lot more than functions do, hence they need a more flexible means of making specifications than simple parameters provide. As a result, the more complex procedures tend to develop their own language.
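PROC MEANS gives a small taste of such a procedure language. The sketch below again assumes the sample data set SASHELP.CLASS; the output names STATS, AVG_HEIGHT and AVG_WEIGHT are made up for illustration.

```sas
/* DATA= on the PROC statement still names the input data set */
proc means data = sashelp.class noprint ;
   class sex ;                  /* grouping variable */
   var height weight ;          /* analysis variables */
   /* here OUT= sits on the OUTPUT statement, not the PROC statement */
   output out = stats mean = avg_height avg_weight ;
run ;
```

CLASS, VAR and OUTPUT are statements of the procedure's own language; options on the PROC statement alone would not be enough to say which variables to group by, which to analyze, and where to store which statistics.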

Hope this helps orient you to your new language. Good luck with the statistics.

Ian Whitlock
===============
Date: Sat, 3 Dec 2005 03:15:03 -0800
Reply-To: Hari <excel_hari@YAHOO.COM>
Sender: "SAS(r) Discussion"
From: Hari <excel_hari@YAHOO.COM>
Organization: http://groups.google.com
Subject: When is the (Data = ) option used for output data set
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset="iso-8859-1"

Hi,

If I write a syntax like the following one:-

DATA NormFactor;
   set SumData;
   Factor = SumE/Sum_A;
run;

I interpret this to mean that there is a data set called "SumData" and it has at least 2 variables called "SumE" and "Sum_A". I compute a new variable called Factor and all the variables are saved in a NEW data set called NormFactor.

Similarly,

data CompleteData;
   merge SortedTotal NormFactor;
   by Prod;
run;

In the above syntax there are 2 files, "SortedTotal" and "NormFactor", which have at least one common variable called Prod, and we merge these 2 by the variable Prod and call the merged file "CompleteData".

In both of the above cases the DATA statement has the name of the FINAL file.

OTOH, in the below syntax

proc sort data = TotaldataUnsorted out = SortedTotal;
   by Prod;
run;

the DATA statement specifies the name of the initial data set while the OUT specifies the name of the final data set.

I noticed that proc transpose also uses the DATA statement in a way similar to proc sort.

Why is it that when we use just "DATA filename" then filename signifies the name of the final file, while if I use "Data = filename1 out = filename2" then filename1 is the initial file while filename2 is the final file?

Shouldn't the meaning of Data be consistent across different procs?

Regards,
Hari
India

