Date: Sun, 11 Feb 1996 22:23:28 -0800
Reply-To: Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Subject: Re: Recommend COMPRESS data option?
Compressing SAS datasets can result in an output dataset anywhere from 60%
to 120% (or more) the size of the uncompressed data (compressing does not
always result in smaller datasets).
The algorithm is simple, and not tremendously efficient. Works best on
datasets with large character variables with many missing values or much
blank space. CPU and i/o can actually be reduced as there is less data to
process. Compression is 'on the fly' -- no intermediate files are
created, it is simply a different storage structure for a SAS dataset.
Principal downside is that the POINT= processing feature is unavailable.
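
For anyone unsure what is being given up, here is a minimal sketch of
POINT= direct access as it works on an uncompressed dataset. The dataset
name mylib.big is hypothetical, and picking every tenth observation is
just an arbitrary illustration.

```sas
/* Read every 10th observation directly by observation number.   */
/* mylib.big is a hypothetical, uncompressed dataset.             */
data work.every_tenth;
   do i = 1 to total by 10;
      set mylib.big point=i nobs=total;  /* NOBS= is set at compile time */
      output;
   end;
   stop;   /* needed: a POINT= step never reaches end-of-file by itself */
run;
```

If a program depends on steps like this one, that is the case to check
before compressing the dataset it reads.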
Best test is to try compressing a subset of your data and see what the
results are.
On Fri, 9 Feb 1996, Kirke B. Lawton wrote:
> What's the downside of using the COMPRESS=YES data option? We work with
> a set of data files that are each approximately 800 meg. A large chunk
> of each file is missing values and blanks, so it sounds like the compress
> option would offer substantial disk space savings. Presumably processing
> some things on the compressed file takes longer than if the file were
> uncompressed, but how much longer?
Chief downsides are inconsistent results and loss of 'point=' processing.
> Here are some specific questions:
> 1. When working with a compressed file, does SAS uncompress it to process
> it, thereby requiring lots of extra free disk space? That is, are the
> temporary files SAS creates when doing its thing smaller when using compressed
> files or just the same size as when working with uncompressed file?
No. There are no intermediate files involved. Compression is an
alternative storage algorithm applied to an otherwise conventional SAS
dataset.
> 2. Are compressed Unix SAS 6.08 files compatible with Windows SAS 6.10?
No. This is a platform issue, not a compression issue. SAS datasets are
not directly portable across platforms. The transport dataset is. Read
the PROC COPY documentation specifying the XPORT engine.
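
A rough sketch of the transport route, assuming a dataset named BIG and
made-up directory and file names on both machines (none of these come
from your setup):

```sas
/* --- On the source machine (Unix). All paths are hypothetical. --- */
libname src  '/data/project';
libname tran xport '/tmp/big.xpt';    /* XPORT engine writes a transport file */

proc copy in=src out=tran;
   select big;                        /* copy only the dataset BIG */
run;

/* --- On the target machine (Windows), after moving the file. --- */
libname tran xport 'c:\temp\big.xpt';
libname dest 'c:\data\project';

proc copy in=tran out=dest;
   select big;
run;
```

Move the transport file between machines in binary mode; a text-mode
transfer will corrupt it.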
> 3. What is a rule of thumb for the added processing overhead associated with
> using compressed file? 10% more CPU time? 50% more? Even more??
I hate giving un-benchmarked guesstimates, but as a SWAG (Scientifik
Wild-Ass Guess), I think CPU comes out even or in favor of compression.
Reason being less i/o overhead. The difference has never been
significant in my book. The earlier 10% is probably ballpark correct,
though I'd say it could swing either way.
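
To replace the guess with a measurement, one possible sketch is below. It
assumes the same data has already been stored twice under the
hypothetical names mylib.big_plain (uncompressed) and mylib.big_comp
(compressed), and that the FULLSTIMER system option is available on your
host.

```sas
options fullstimer;   /* request detailed CPU and elapsed time in the log */

/* Read every observation without writing anything, so the log  */
/* timings reflect read cost only. Dataset names are hypothetical. */
data _null_;
   set mylib.big_plain;
run;

data _null_;
   set mylib.big_comp;
run;
```

Compare the CPU and elapsed times the log reports for the two steps, and
run each more than once so that file caching does not skew the result.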
> 4. Do most informed shops "always" use compress? Or, on the other hand, do
> people in the know avoid it like the plague?
No and no. Informed people use compression where it will benefit them.
I've had compression specified as the default option, with 'compress=no'
specified only where compression's side effects were unsavory, though I'm
not working this way currently.
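
That setup looks roughly like this; the dataset names are hypothetical:

```sas
options compress=yes;    /* session default: new datasets are compressed */

/* Override the default for one dataset, for example one that is */
/* later read with POINT=. Names here are hypothetical.          */
data work.lookup (compress=no);
   set mylib.lookup_src;
run;
```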
> I don't need a dissertation on the topic, I just want a sense about whether
> it is worth starting to incorporate compress into our process.
Yes, where it helps. Best guide is empirical testing.
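
A minimal sketch of such a test, assuming your data is in a dataset
called mylib.big (a made-up name) and that the first 10,000 observations
are reasonably representative of the rest:

```sas
/* Write the same subset twice: once uncompressed, once compressed. */
/* mylib.big and the 10,000-observation cutoff are assumptions.     */
data work.sample_plain (compress=no)
     work.sample_comp  (compress=yes);
   set mylib.big (obs=10000);
run;

/* Compare the storage details reported for each copy. */
proc contents data=work.sample_plain;
run;

proc contents data=work.sample_comp;
run;
```

Compare the size information PROC CONTENTS reports for the two copies
(the exact labels vary by release and host). If the leading observations
are not typical of the file, pick the subset some other way.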
> Thanks in advance,
> Kirke Lawton
Karsten M. Self -- Sr. SAS Programmer/Analyst
Sierra Information Services, Inc.
Contracting for NBER at Stanford University
What part of gestalt don't you understand?