LISTSERV at the University of Georgia
Date:   Sun, 11 Feb 1996 22:23:28 -0800
Reply-To:   Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   Karsten Self <karsten@NEWAGE1.STANFORD.EDU>
Subject:   Re: Recommend COMPRESS data option?
Comments:   To: "Kirke B. Lawton" <lawton@MONROE.AMCC.ROCHESTER.EDU>
In-Reply-To:   <>

General response:

Compressing SAS datasets can produce an output dataset anywhere from 60% to 120% (or more) of the size of the uncompressed data; compression does not always result in smaller datasets.

The algorithm is simple and not tremendously efficient. It works best on datasets with large character variables that have many missing values or much blank space. CPU and i/o can actually be reduced, as there is less data to process. Compression is 'on the fly' -- no intermediate files are created; it is simply a different storage structure for a SAS dataset.

The principal downside is that the POINT= processing feature is unavailable. The best test is to try compressing a subset of your data and see what the results are.
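A minimal sketch of such a subset test. The library `mylib`, the dataset `big`, the path, and the 10,000-observation sample size are all placeholders for illustration, not names from the original question:

```sas
/* Hypothetical names: MYLIB.BIG stands in for one of the large files. */
libname mylib '/path/to/data';          /* assumed location */

/* Write the same subset twice: once uncompressed, once compressed. */
data work.sample_plain (compress=no);
   set mylib.big (obs=10000);           /* arbitrary sample size */
run;

data work.sample_comp (compress=yes);
   set mylib.big (obs=10000);
run;

/* Compare the size information reported for each copy. */
proc contents data=work.sample_plain;
run;

proc contents data=work.sample_comp;
run;
```

The first observations of a file may not be representative. If missing values cluster in one part of the data, adding FIRSTOBS= to the SET statement options to sample from elsewhere may give a fairer picture.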

On Fri, 9 Feb 1996, Kirke B. Lawton wrote:

> What's the downside of using the COMPRESS=YES data option? We work with
> a set of data files that are each approximately 800 meg. A large chunk
> of each file is missing values and blanks, so it sounds like the compress
> option would offer substantial disk space savings. Presumably processing
> some things on the compressed file takes longer than if the file were
> uncompressed, but how much longer?

Chief downsides are inconsistent results and loss of 'point=' processing capabilities.

> Here are some specific questions:
> 1. When working with a compressed file, does SAS uncompress it to process
> it, thereby requiring lots of extra free disk space? That is, are the
> temporary files SAS creates when doing its thing smaller when using compressed
> files or just the same size as when working with uncompressed file?

No. There are no intermediate files involved. Compression is an alternative storage algorithm applied to an otherwise conventional SAS dataset.

> 2. Are compressed Unix SAS 6.08 files compatible with Windows SAS 6.10?

No. This is a platform issue, not a compression issue. SAS datasets are not directly portable across platforms; transport datasets are. Read the PROC COPY documentation on specifying the XPORT engine.
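A rough sketch of the transport route, run as two separate sessions. All library names, member names, and paths here are hypothetical placeholders:

```sas
/* Session 1, on the Unix side: write a transport file via the XPORT engine. */
libname mylib '/path/to/data';              /* assumed location */
libname trans xport '/path/to/big.xpt';     /* assumed transport file name */

proc copy in=mylib out=trans;
   select big;                              /* hypothetical member name */
run;

/* Session 2, on the Windows side, after moving the file across: */
libname trans xport 'c:\path\to\big.xpt';   /* assumed transport file name */
libname newlib 'c:\path\to\data';           /* assumed location */

proc copy in=trans out=newlib;
run;
```

If the file travels by FTP, it should be transferred in binary mode so the transport file arrives unaltered.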

> 3. What is a rule of thumb for the added processing overhead associated with
> using compressed file? 10% more CPU time? 50% more? Even more??

I hate giving un-benchmarked guesstimates, but as a SWAG (Scientifik Wild-Ass Guess), I think CPU comes out even or in favor of compression, the reason being less i/o overhead. The difference has never been significant in my book. The 10% in your question is probably ballpark correct, though I'd say it could swing either way.

> 4. Do most informed shops "always" use compress? Or, on the other hand, do
> people in the know avoid it like the plague?

No and no. Informed people use compression where it will benefit them. I've had compression specified as the default option, with 'compress=no' specified only where compression's side effects were unsavory, though I'm not working this way currently.
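That default-on, opt-out arrangement might look roughly like the following. The library and dataset names are made up for illustration:

```sas
libname mylib '/path/to/data';      /* assumed location */

/* Make compression the session default ... */
options compress=yes;

/* ... so this dataset is compressed without further mention ... */
data work.wide_sparse;
   set mylib.big;                   /* hypothetical input */
run;

/* ... and opt out where compression's side effects are unwelcome,
   for example a dataset that will later be read with POINT=. */
data work.lookup (compress=no);
   set mylib.codes;                 /* hypothetical input */
run;
```

The COMPRESS= dataset option on an individual dataset takes precedence over the system option, which is what makes the per-dataset opt-out work.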

> I don't need a dissertation on the topic, I just want a sense about whether
> it is worth starting to incorporate compress into our process.

Yes, where it helps. The best guide is empirical testing.

> Thanks in advance,
> Kirke Lawton

---------------------------------------------
Karsten M. Self -- Sr. SAS Programmer/Analyst
Sierra Information Services, Inc.
Contracting for NBER at Stanford University


What part of gestalt don't you understand?
