LISTSERV at the University of Georgia
Date:         Wed, 12 Jun 2002 13:04:02 -0700
Reply-To:     Cassell.David@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject:      Re: Creating Subsets of Data Based on Variable Values
Content-type: text/plain; charset=us-ascii

Thomas.Hauge@EDDIEBAUER.COM replied [in part]:
> The user currently logons on remotely to UNIX, and some very quick SQL queries
> are performed on an ORACLE database, returning values that populate some
> list boxes in AF to use as selectable items. The addition I want to make would
> allow the user to view currently available data in a UNIX directory. My thought was
> that if I could break up an 11Gigabyte file into multiple, smaller files, with some kind
> of "intelligence" in the filename, I could then fairly quickly populate the AF list box
> with what data available for the user to select.
> Since a new, large file would be available every few weeks, I would like to automate
> this process as much as possible. I suspect that I would have no more than perhaps
> 20 files in the directory at a time.

Well, you have code now to do this. But you may also want to test whether it is just as fast to simply: (1) index on your variable, and then (2) use a WHERE clause on the indexed variable to select out the relevant records. If the 11G file is already a SAS data set, this should be quite fast, as SAS will know just where to jump to in order to begin reading, and when to stop as well. The indexing is very quick, even on massive files. Just use PROC DATASETS to rebuild the index each time you get a new version of the file, and you could be good to go.
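
Something along these lines is what I have in mind. Treat it as a rough sketch only: the libref BIG, the path, the data set name SALES, the variable REGION, and the value 'WEST' are all made-up placeholders, so substitute your own.

```sas
/* Placeholder libref and path: point this at the directory   */
/* that holds the large SAS data set.                         */
libname big '/path/to/data';

/* (1) Build a simple index on the subsetting variable.       */
/*     Rerun this step whenever a new version of the file     */
/*     arrives. SALES and REGION are hypothetical names.      */
proc datasets library=big nolist;
   modify sales;
   index create region;
quit;

/* (2) Subset with a WHERE clause so that SAS can use the     */
/*     index to locate matching records.                      */
/*     MSGLEVEL=I asks SAS to note in the log when an index   */
/*     is used for WHERE processing.                          */
options msglevel=i;

data work.subset;
   set big.sales;
   where region = 'WEST';
run;
```

The log reports the real time for each step, which gives you the wall-clock numbers to compare against the split-file approach.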

Give it a try both ways, and see what the wall-clock times are. If the index approach is about as good, then use it. It will save plenty of programmer-time at the other end. :-)

David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

