Date: Sat, 14 Feb 2009 16:29:58 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Is there no other way?
Content-Type: text/plain; charset="us-ascii"
I agree with all that Lou has suggested, with one exception. I have created DATA step views that INPUT data from a file only when referenced in a SAS PROC. While SAS may create a file in the background, the SAS compiler has a choice of doing that or streaming the required data into the procedure. This method works particularly well when reading data from a compressed dataset via a pipe, but it also works very efficiently where the view or a procedure selects only the variables and rows required to execute the procedure.
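A minimal sketch of the kind of view described above, under stated assumptions: the gzip command, the file path, the column layout, and the names rawin and work.big_v are all hypothetical, and the PIPE device works only where the SAS session is allowed to run host commands. Nothing is read when the view is defined; the INPUT executes when PROC MEANS references the view.

```sas
/* Hypothetical compressed CSV; gzip -dc writes the decompressed text to the pipe */
filename rawin pipe 'gzip -dc /data/big_extract.csv.gz';

/* Defining the view compiles and stores the DATA step but reads no data yet */
data work.big_v / view=work.big_v;
    infile rawin dsd firstobs=2 truncover;   /* skip the header row */
    input id region :$8. amount;
    if region = 'EAST';                      /* pass along only the rows needed */
    keep id amount;                          /* and only the variables needed */
run;

/* The file is read here, when a procedure references the view */
proc means data=work.big_v n mean sum;
    var amount;
run;
```

Whether SAS streams the rows or spools them to a utility file behind the scenes depends on the procedure; a procedure that needs more than one pass over its input is more likely to spool.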
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Lou
Sent: Friday, February 13, 2009 9:02 PM
Subject: Re: Is there no other way?
Just some nits to pick, interspersed below:
> On Thu, 12 Feb 2009 18:15:57 -0500, Robbie Shan
> > I am a newbie to SAS and though I have had considerable experience
> >in programming with other OO languages,
SAS isn't an OO language, though some parts do make use of objects.
> >I am having some trouble trying to
> >understand how SAS works fundamentally..
> > I have read quite some literature about it but am still baffled...
> > From what I know, if SAS is to work on data, it needs to pull that
> >data into its native data set else it can't work on it. Is my
> >understanding correct?
No. Or maybe better, partly.
SAS needs to read a file in order to process the data in it. If your application only needs to read a line from an external file (say, a text file), perform some operations on it, and write the result to another external file (again, say, a text file), repeating until the input file is exhausted, no SAS dataset need be created.
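As a rough illustration of that read, process, write loop, here is a sketch that assumes a two-field, space-delimited input file; the file names, field layout, and the 10% adjustment are made up for the example. DATA _NULL_ tells SAS to run the step without creating any dataset.

```sas
/* _NULL_ means no SAS dataset is created by this step */
data _null_;
    infile 'in.txt' truncover;     /* hypothetical input text file */
    file 'out.txt';                /* hypothetical output text file */
    input name :$20. amount :8.;   /* read one line per iteration */
    amount = amount * 1.1;         /* some operation on the record */
    put name amount 12.2;          /* write the result straight back out */
run;
```

The step loops implicitly, one input line per iteration, and stops when the input file is exhausted.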
If you want to use the built-in procedures, called PROCs, your data must be in a SAS dataset. A PROC generally operates on multiple records (called observations in SAS). For instance, PROC SORT will sort a file, but both the input to and the output from that procedure must be in the form of a SAS dataset.
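For instance, a sketch along these lines (the file name, variable names, and dataset names are assumptions for illustration) first converts a text file to a SAS dataset and then sorts it into a second dataset:

```sas
/* Convert the external text file to a SAS dataset */
data work.sales;
    infile 'sales.txt' truncover;   /* hypothetical input file */
    input region :$8. amount :8.;
run;

/* Both DATA= (input) and OUT= (output) name SAS datasets */
proc sort data=work.sales out=work.sales_sorted;
    by region descending amount;
run;
```

Without OUT=, PROC SORT replaces the input dataset with the sorted version.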
So usually, it's more convenient to convert your data to SAS dataset form, even for straightforward applications that read in, manipulate, and write out. But it's not always necessary.
> > The reason I am asking this is because, if I were to work on millions
> >of records, I cannot think of importing them into my data set while I
> >run some analysis on it. This for some reason seems very inefficient
> >to me.
Indeed you can't - you import them **before** you run your analyses, not while.
> > Ideally, I should be able to run some logic against the data (say a
> >warehouse) and then post it back into the warehouse without having to
> >store it on my computer!
If your data are in a data warehouse, you've apparently imported them to your warehouse without seeing anything inefficient about that. If the warehouse is, say, based on ORACLE, you could do your processing using ORACLE, and bypass the step of converting the data to a form SAS understands. Conversely, if your processing is going to be done in SAS, maybe your warehouse should be in SAS to start with.
From a pure efficiency standpoint, whether you're temporarily storing millions of records on your desktop or not is almost beside the point. If you're processing millions of records that reside in some DBMS, pulling all that data to your machine over the network and pushing it back out to the DBMS, again over the network, can incur significant overhead. You'd possibly be better off pushing the code (what, a few dozen or even a few hundred lines) over the network to the DBMS and letting the processing take place there. If you want to use SAS to do that, take a look at the documentation for the SQL pass-through (PASSTHRU) facility.
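A sketch of what pass-through can look like, assuming an ORACLE warehouse and a licensed SAS/ACCESS Interface to ORACLE. The connection alias dw, the path dwprod, the macro variables &dbuser and &dbpass, and all table and column names are hypothetical. The statements inside the parentheses are sent to the DBMS as written and run there, so only the code and any requested result rows cross the network.

```sas
proc sql;
    /* Connection values are placeholders; substitute your own */
    connect to oracle as dw (user=&dbuser password=&dbpass path=dwprod);

    /* Runs entirely inside ORACLE; no rows travel to the SAS session */
    execute (
        insert into region_totals (region, total_amount)
        select region, sum(amount)
        from sales
        group by region
    ) by dw;

    /* Optionally bring back only the small summarized result */
    create table work.region_totals as
    select * from connection to dw (
        select region, total_amount from region_totals
    );

    disconnect from dw;
quit;
```

The SQL inside EXECUTE and CONNECTION TO is in the DBMS's own dialect, not SAS SQL, so it has to be valid ORACLE syntax.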