Date: Wed, 12 Jun 2002 13:04:02 -0700
Reply-To: Cassell.David@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: Creating Subsets of Data Based on Variable Values
Content-type: text/plain; charset=us-ascii
Thomas.Hauge@EDDIEBAUER.COM replied [in part]:
> The user currently logs on remotely to UNIX, and some very quick SQL
> queries are performed on an ORACLE database, returning values that
> populate some list boxes in AF to use as selectable items. The
> addition I want to make would allow the user to view currently
> available data in a UNIX directory. My thought was that if I could
> break up an 11-gigabyte file into multiple, smaller files, with some
> kind of "intelligence" in the filename, I could then fairly quickly
> populate the AF list box with what data is available for the user to
> select.
> Since a new, large file would be available every few weeks, I would
> like to automate this process as much as possible. I suspect that I
> would have no more than perhaps 20 files in the directory at a time.
Well, you have code now to do this. But you may also want to test
whether it is just as fast to simply: (1) index on your variable,
and then (2) use a WHERE clause on the index to select out the relevant
records. If the 11G file is already a SAS data set, this should be
quite fast, as SAS will know just where to jump to in order to begin
reading, and when to stop as well. The indexing is very quick, even
on massive files. Just use PROC DATASETS to rebuild the index each
time you get a new version of the file, and you could be good to go.
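A minimal sketch of the two steps, in case it helps. The library BIG,
the data set SALES, the variable REGION, the value 'WEST', and the
directory path are all made-up placeholders, so substitute your own:

```sas
/* Hypothetical names throughout: BIG, SALES, REGION, 'WEST'. */
libname big '/path/to/unix/directory';   /* placeholder path */

/* Step 1: build a simple index on the subsetting variable.   */
/* Rerun this each time a new version of the file arrives.    */
proc datasets library=big nolist;
  modify sales;
  index create region;
quit;

/* Step 2: subset with a WHERE clause.                         */
/* MSGLEVEL=I asks SAS to note in the log whether an index     */
/* was actually used for the WHERE processing.                 */
options msglevel=i;

data work.subset;
  set big.sales;
  where region = 'WEST';
run;
```

SAS decides for itself whether the index is worth using for a given
WHERE clause, so check the log note rather than assuming. The "real
time" line the log prints for each step is the wall-clock figure to
compare.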
Give it a try both ways, and see what the wall-clock times are.
If the index approach is about as good, then use it. It will save
plenty of programmer-time at the other end. :-)
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician