Date: Sat, 3 Dec 2005 17:24:34 +0000
Reply-To: iw1junk@COMCAST.NET
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <iw1junk@COMCAST.NET>
Subject: When is the (Data = ) option used for output data set
Hari,
In most languages a word can have more than one meaning. This
is also true in computer languages and in SAS. Consider
1) data CompleteData;
2) data = x ;
3) proc sort data = w out = s ;
In each case the word DATA means something different. In the
first case the word begins a statement, the DATA step statement.
It announces that you want to make one or more data sets with the
names following.
In the second case, it is an assignment statement for the
variable DATA. The = sign indicates that this could not be a
DATA statement, hence it must be an assignment.
In the third case we have the PROC statement for the sort
procedure. Most procedures need an input data set and this is
usually indicated by the parameter DATA=. Note that it could not
be an assignment statement because it is inside a PROC statement.
Typically the parameter OUT= is used to indicate an output data
set in a procedure.
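To see the three meanings side by side, here is a minimal sketch. The data set names W and S come from the examples above; the values assigned are made up for illustration, and only the three uses of the word DATA matter.

```sas
/* 1) DATA begins a DATA statement and names the data set to create. */
data w ;
   /* 2) DATA is an ordinary variable on the left of an assignment. */
   data = 3 ; output ;
   data = 1 ; output ;
   data = 2 ; output ;
run ;

/* 3) DATA= is a procedure option naming the input data set; */
/*    OUT= names the output data set.                        */
proc sort data = w out = s ;
   by data ;   /* here DATA is again just a variable name */
run ;
```

After this runs, W should still hold the values in the order 3, 1, 2, while S holds them sorted.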
You might conclude that DATA= is almost always used in a procedure
statement to specify the input data set. The DATA statement has no need
to be consistent with this, since the context is so different. Statement
#2 shows that the inconsistency is in fact necessary if we begin the DATA
statement with the word DATA and do not want a list of keywords to
exclude from variable names. Of course one could imagine a procedure such as
proc makedata out = w ;
/* full data step language */
run ;
However, a DATA= parameter for this procedure would be far less
flexible than the traditional SET, MERGE, etc.
Note that the DATA step is a completely separate language from
the languages developed for each procedure. The different
procedure languages are often, but not always, consistent with
one another. The more complex procedure languages sometimes have
some statements in common with the DATA step language, but not so
often.
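As one small illustration of statements shared between the two, WHERE and FORMAT behave much the same in a DATA step and in many procedures. This is only a sketch: it assumes the SASHELP.CLASS sample data set is available, and the name TEENS is made up.

```sas
/* WHERE and FORMAT inside a DATA step */
data teens ;
   set sashelp.class ;
   where age >= 13 ;
   format height 5.1 ;
run ;

/* The same two statements inside a procedure */
proc print data = sashelp.class ;
   where age >= 13 ;
   format height 5.1 ;
run ;
```

Most other DATA step statements (SET, MERGE, IF, and so on) would be rejected inside PROC PRINT.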
Traditional languages typically depend on functions and
parameters to specify details. SAS is a step-oriented language,
as opposed to a routine-oriented (pre-object-oriented) language.
Routines tend to be designed for a very specific purpose, hence
parameters are usually enough to specify the programmer's request.
(Object oriented programming can be viewed as an extension of a
routine oriented language where a group of related functions
becomes too complex to handle by mere parameters. Of course,
OOP is more than just this, but it is a valid view sometimes.)
In SAS, procedures tend to do a lot more than functions, hence
they need a more flexible means of making specifications than
simple parameters provide. Hence the more complex procedures
tend to develop their own language.
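A short sketch of what such a procedure language looks like, using PROC MEANS. It assumes the SASHELP.CLASS sample data set is available; the output data set name STATS and the variable names MEAN_HEIGHT and MEAN_WEIGHT are made up.

```sas
proc means data = sashelp.class noprint ;
   class sex ;                 /* grouping variable             */
   var height weight ;         /* analysis variables            */
   output out = stats          /* output data set for the stats */
          mean = mean_height mean_weight ;
run ;
```

Note that OUTPUT here is a PROC MEANS statement that requests an output data set. In a DATA step the same word writes the current observation, so this is one more case where the meaning depends on context.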
Hope this helps orient you to your new language. Good luck with
the statistics.
Ian Whitlock
===============
Date: Sat, 3 Dec 2005 03:15:03 -0800
Reply-To: Hari <excel_hari@YAHOO.COM>
Sender: "SAS(r) Discussion"
From: Hari <excel_hari@YAHOO.COM>
Organization: http://groups.google.com
Subject: When is the (Data = ) option used for output data set
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset="iso-8859-1"
Hi,
If I write syntax like the following:
DATA NormFactor;
set SumData;
Factor = SumE/Sum_A;
run;
I interpret this to mean that there is a data set called "SumData" and it
has at least 2 variables called "SumE" and "Sum_A". I compute a new variable
called Factor, and all the variables are saved in a NEW data set called
NormFactor.
Similarly,
data CompleteData;
merge SortedTotal NormFactor;
by Prod;
run;
In the above syntax there are 2 files, "SortedTotal" and "NormFactor",
which have at least one common variable called Prod, and we merge these 2
by variable Prod and call the merged file "CompleteData".
In both of the above cases the DATA statement gives the name of the FINAL
file.
OTOH, in the below syntax
proc sort data = TotaldataUnsorted out = SortedTotal;
by Prod;
run;
the DATA statement specifies the name of the initial data set while the
OUT specifies the name of the final data set.
I noticed that proc transpose also uses the DATA statement in a way
similar to proc sort.
Why is it that when we use just "DATA filename" then filename signifies
the name of the final file, while if I use "Data = filename1 out = filename2"
then filename1 is the initial file while filename2 is the final file?
Shouldn't the meaning of Data be consistent across different procs?
Regards,
Hari
India