Date: Wed, 21 Jul 2010 09:40:27 -0400
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "NOMAIL Roger S. Clark" <roger.s.clark@CENSUS.GOV>
Subject: Re: In search of a more efficient program
Content-type: text/plain; charset=US-ASCII
According to the SAS online documentation for SAS 9.1, you can set the
length of a numeric variable to as low as 3. A length of 3 stores
integers up to 8,192 exactly and retains about 3 significant digits.
I don't know if that would increase the efficiency of your program.
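To make the 8,192 figure concrete, here is a rough sketch of the arithmetic
in Python (not SAS). It assumes an IEEE 754 platform such as Windows or UNIX,
and it assumes that a shortened length simply keeps the high-order bytes of
the 8-byte double and drops the rest. The function names are made up for
illustration; none of this is a SAS API.

```python
import struct


def largest_exact_integer(length: int) -> int:
    """Largest n such that every integer 0..n survives storage in `length` bytes.

    Assumption: 1 sign bit and 11 exponent bits are always kept, so
    8*length - 12 mantissa bits remain, plus the implicit leading bit.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8")
    return 2 ** (8 * length - 11)


def truncate_to_length(value: float, length: int) -> float:
    """Simulate storing a double in `length` bytes and reading it back.

    Assumption: only the `length` high-order bytes are kept and the
    low-order bytes come back as zeros.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8")
    raw = struct.pack(">d", value)                 # big-endian: high-order bytes first
    kept = raw[:length] + b"\x00" * (8 - length)   # zero-fill the dropped bytes
    return struct.unpack(">d", kept)[0]


if __name__ == "__main__":
    for n in range(3, 9):
        print(n, largest_exact_integer(n))
    print(truncate_to_length(8191.0, 3))
    print(truncate_to_length(8193.0, 3))
```

Under these assumptions a length of 3 gives 8,192 and a length of 4 gives
2,097,152. A value such as 8,193 stored with a length of 3 comes back as
8,192, so short lengths are only safe for integers known to stay under the
limit.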
I have also used indexes in the past when I had to sort a dataset many
times on different variables during processing. Building several indexes
in one statement can reduce the overall running time compared with
running multiple sorts. If you are sorting your dataset only once, then I
think the sort will probably run faster than an index, especially if you
have only 10 distinct values on the key variable. Indexes are more
efficient when the key has a large number of distinct values.
I consulted SAS Institute regarding a similar question, and was told
that if I used an index, then I should expect longer processing times...
> I've inherited a large, complex SAS program. Most files are quite small;
> however, some are extremely large and have become a problem.
> The files in question have 12-24 fields that are mostly numeric, with a
> few short (1-4 character) fields. The files that are killing me have 20M
> records, and one has 160M records.
> The files don't use SAS compression because the records are too short to
> make compression cost & time effective.
> Problem 1:
> What are the trade-offs if I force numeric length to 4 or 2? Does SAS
> always use a NUM 8 format internally? If I force short numeric field
> lengths, will SAS have to convert them up to length=8 and back down again
> in order to use the data?
> Problem 2:
> What are the trade-offs if I use an index instead of a sort? The problem
> sort takes an hour, uses a single numeric key with about 10 distinct
> values, and sorts 20M records that are about 100 bytes long. I've been
> successful indexing before; in that situation, I needed to sort the file
> in 15 different sequences.
> Thanks for your input and advice.
Roger S. Clark
Address Products Management Branch