LISTSERV at the University of Georgia
Date:   Sat, 23 Oct 1999 10:51:43 -0400
Reply-To:   RAITHEM <RAITHEM@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   RAITHEM <RAITHEM@WESTAT.COM>
Subject:   (MVS) Re[2]: (MVS) Re[2]: SAS arrays - again
Comments:   To: Seth Grimes <grimes@CHELE.CAIS.NET>
Content-Type:   text/plain; charset=US-ASCII

In an old thread about SAS data set compression, Seth Grimes posted the following reply to my posting:

<<My original posting can be found beneath the Sig line, below>>

>In dealing with datasets with very wide but sparse records -- that is,
>about 2950 variables in each observation with 60-70% zero -- if I don't
>compress the SAS dataset is about 6 times larger than a flat file that uses
>variable-length, delimited fields.  Compressing the SAS dataset results in
>a file that's a small percentage larger than the flat file.  I figure that
>using variable-length fields in the SAS program would carry too much
>overhead to be worthwhile.
>

Seth, you make a good point about the benefits of compressing SAS data sets! Peter Crawford made a point along the same lines when he suggested that SAS data set compression will become more important, and come in handy, for the longer text variables in Versions 7 and 8 of the SAS System. There is no doubt that SAS data set compression is a good tool for reducing the footprint of SAS data sets.

My only gripe is that currently, in SAS Version 6.09E, the CPU time overhead of compressing/de-compressing SAS data sets during processing is heavy. If the trade-off of DASD space vs. CPU time is acceptable in your organization for the huge SAS data set, then compression is good for you. If not, then you have a lot of 'splaining to do to your Computer Performance staff. Either way, as long as programmers know the Yin and Yang of the choices--Bigger SAS data sets, less CPU time during processing; Smaller SAS data sets, more CPU time during processing--they will make the choice that is right for their applications and their organizations!

Seth, best of luck as you give your SAS observations the Sardine treatment and squash them into compressed SAS data sets!

I hope that this answer proves helpful now, and in the future!

Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance."
E-mail: raithem@westat.com
Author: Tuning SAS Applications in the MVS Environment
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When you cease to make a contribution you begin to die. -- Eleanor Roosevelt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

<<My original posting as presented by Seth in his posting>>

>
> Tim Berryhill posted the following comment to Matt Santoni's recent thread:
>
> >Interesting compression statistics.
>
> >> ----------
> >> From: mvs1000[SMTP:mvs1000@YAHOO.COM]
> >> Reply To: mvs1000
> >> Sent: Thursday, October 14, 1999 8:39 AM
> >> To: SAS-L@LISTSERV.UGA.EDU
> >> Subject: SAS arrays - again
> >>
> ><SNIP>
> >> NOTE: The data set WORK.TEMP1 has 6386 observations and 5 variables.
> >> NOTE: Compressing data set WORK.TEMP1 increased size by 13.79 percent.
> >>       Compressed is 33 pages; un-compressed would require 29 pages.
> >> NOTE: The DATA statement used 27.26 seconds.
> ><SNIP>
> >> NOTE: The data set WORK.TEMP2 has 6386 observations and 4 variables.
> >> NOTE: Compressing data set WORK.TEMP2 increased size by 30.43 percent.
> >>       Compressed is 30 pages; un-compressed would require 23 pages.
> >> NOTE: The DATA statement used 57.01 seconds.
> >>
>
> Tim, your wry but poignant comment underscores one of the pitfalls of SAS data
> set compression that not all SAS programmers may be aware of.  Namely, if you
> apply SAS compression to a SAS data set and the data is not ideally suited to
> compressing, you can actually end up with a data set that is larger than the
> original.  When you compound this particular occurrence with the increase in CPU
> time expended to access the observations in the compressed SAS data set, you
> have a real, bona fide, big-time LOSE/LOSE situation.
>
> So, how can you end up with a "compressed" SAS data set that is larger than the
> original?  Well, it is quite easy, really.  On the all-powerful operating system
> known as OS/390, or as MVS, the SAS System puts a 12-byte header, containing
> compression control information, on each observation in the compressed data set.
> The SAS System compresses data within the observations according to this chart:
>
> Type of Character    Length of Original Redundant    Compressed
>                      Character String                Length
> -----------------    ----------------------------    --------------
> Binary Zeros         3 to 66                         2
> Blanks               3 to 129                        2
> Missing Values       N/A                             Not Compressed
> All Others           3 to 63                         3
>
> If you have observations where none of the data compresses out, you have an
> increase in size of 12 bytes per observation, so your overall SAS data set size
> increases.  Not good; not good at all!  For compression to do more than break
> even, you need to compress out at least 13 bytes per observation; just to be one
> byte ahead of the 12-byte overhead compression imposes.
>
> The four inter-related elements that I look at in deciding upon likely SAS data
> set compression candidates are:
>
> 1. A large percentage of the observations in a SAS data set must compress.
> 2. A large portion of each individual observation must compress.
> 3. Observations must contain a significant amount of adjacent redundancy.
> 4. Observations must be reduced in size by more than the Compression Control
>    header (12 bytes).
>
> Beyond the elements, above, a general rule of thumb that I use is that SAS data
> sets with short, or very short, observations are usually poor candidates for
> compression.
>
> Overall, I have never been a big fan of SAS data set compression on the big
> iron.  True, it can reduce the overall size of a SAS data set and thus reduce my
> DASD storage charges.  True, it can reduce the EXCP count (I/O's) of all
> programs that access the compressed SAS data set and thus reduce my EXCP
> charges.  But, even _MORE_TRUE_ it greatly increases the CPU time of all
> programs that access the compressed SAS data set, greatly increasing my CPU time
> charges.  Since most organizations that I have worked with that have IS Charge
> Back software favor charging more for CPU time, the two YINs (reduced data set
> size and reduced EXCP count) are outweighed by the big YANG (greatly increased
> CPU time).  Of course, off of the big iron, this may be a non-issue.
>
> Best of luck to those of you who are trying to put their SAS data sets on a
> storage diet via SAS data set compression.  I hope that it doesn't come back to
> byte you!
>
> I hope that this answer proves helpful now, and in the future!
>
> Of course, all of these opinions and insights are my own, and do not reflect
> those of my organization or my associates.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Michael A. Raithel
> "The man who wrote the book on performance."
> E-mail: raithem@westat.com
> Author: Tuning SAS Applications in the MVS Environment
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ..you can't start a fire; you can't start a fire without a spark... -- Bruce
> Springsteen
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Syst
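
The break-even arithmetic in the quoted posting can be sketched in a few lines of Python. This is only a rough model, not how the SAS System itself is implemented: the 12-byte header and the run-length chart come from the posting, while the function names, the example observation length of 40 bytes, and the choice to reject runs longer than the chart's maximum are assumptions made for illustration.

```python
# Rough model of the per-observation arithmetic described in the posting.
# HEADER_BYTES and RULES are taken from the posting (SAS 6.09E on MVS);
# the function names and example values are hypothetical.

HEADER_BYTES = 12  # compression control header added to each observation

# kind of redundant character: (min run, max run, compressed length)
RULES = {
    "binary_zeros": (3, 66, 2),
    "blanks": (3, 129, 2),
    "other": (3, 63, 3),
}


def run_saving(kind, run_length):
    """Bytes saved by compressing one run of identical characters."""
    lo, hi, compressed = RULES[kind]
    if run_length < lo:
        return 0  # too short to compress
    if run_length > hi:
        # Assumption: longer runs are not modelled; split them before calling.
        raise ValueError("run longer than the chart's maximum")
    return run_length - compressed


def compressed_length(obs_length, runs):
    """Estimated stored length of one observation after compression.

    runs is a list of (kind, run_length) pairs found in the observation.
    """
    saved = sum(run_saving(kind, n) for kind, n in runs)
    return obs_length + HEADER_BYTES - saved


def pct_increase(compressed_pages, uncompressed_pages):
    """Percentage size increase, as reported in the SAS log NOTE."""
    extra = compressed_pages - uncompressed_pages
    return round(100 * extra / uncompressed_pages, 2)


if __name__ == "__main__":
    # Nothing compresses: the observation grows by the 12-byte header.
    print(compressed_length(40, []))                # 52
    # One run of 15 blanks saves 13 bytes: one byte ahead of the header.
    print(compressed_length(40, [("blanks", 15)]))  # 39
    # Page counts from the quoted log reproduce the reported percentages.
    print(pct_increase(33, 29), pct_increase(30, 23))  # 13.79 30.43
```

Under this model a short observation has little room for runs long enough to repay the 12 bytes, which is consistent with the rule of thumb that short observations are poor compression candidates.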

--
Seth Grimes   Alta Plana   database & Web / design & development
grimes@altaplana.com   http://altaplana.com   301-873-8225

