Date: Wed, 12 Jun 2002 13:04:02 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: Creating Subsets of Data Based on Variable Values
Content-type: text/plain; charset=us-ascii
Thomas.Hauge@EDDIEBAUER.COM replied [in part]:
> The user currently logons on remotely to UNIX, and some very quick SQL
> are performed on an ORACLE database, returning values that populate
> list boxes in AF to use as selectable items. The addition I want to
> allow the user to view currently available data in a UNIX directory.
> My thought was that if I could break up an 11Gigabyte file into
> multiple, smaller files, with some kind of "intelligence" in the
> filename, I could then fairly quickly populate the AF list box
> with what data available for the user to select.
> Since a new, large file would be available every few weeks, I would
> like to automate this process as much as possible. I suspect that I
> would have no more 20 files in the directory at a time.
Well, you have code now to do this. But you may also want to test
whether it is just as fast to simply: (1) index on your variable,
and then (2) use a WHERE clause on the index to select out the relevant
records. If the 11G file is already a SAS data set, this should be
quite fast, as SAS will know just where to jump to in order to begin
reading, and when to stop as well. The indexing is very quick, even
on massive files. Just use PROC DATASETS each time you get a new
version of the file, and you could be good to go.
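
Something along these lines is what I have in mind. Treat it as a rough
sketch only: the libref BIG, the path, the data set SALES, the variable
REGION and the value 'WEST' are all made-up placeholders, so substitute
your own names.

```sas
/* Placeholder names throughout: BIG, SALES, REGION, 'WEST'. */
libname big '/your/unix/directory';   /* assumed location of the big data set */

/* Ask SAS to note in the log whether an index was used for the WHERE. */
options msglevel=i;

/* (1) Build a simple index on the selection variable.             */
/*     Rerun this step each time a new version of the file lands.  */
proc datasets library=big nolist;
   modify sales;
   index create region;
quit;

/* (2) Pull out only the relevant records with a WHERE clause. */
data work.subset;
   set big.sales;
   where region = 'WEST';
run;
```

SAS decides for itself whether the index is worth using for a given
WHERE clause, so check the log notes. If the subset is a large share of
the file, it may read sequentially instead.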
Give it a try both ways, and see what the wall-clock times are.
If the index approach is about as good, then use it. It will save
plenty of programmer-time at the other end. :-)
David Cassell, CSC
Senior computing specialist