LISTSERV at the University of Georgia
Date:         Thu, 5 Aug 2004 10:47:00 -0700
Reply-To:     "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:      Re: Size of ICK file when sorting
Comments: To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>

Yes indeed:

data char (compress=char)
     bin  (compress=binary);
  array bigchar bigchar1-bigchar50;
  do i = 1 to 1000;
    do over bigchar;
      bigchar = ranuni(1);
    end;
    output;
  end;

run;

NOTE: The data set WORK.CHAR has 1000 observations and 51 variables.
NOTE: Compressing data set WORK.CHAR increased size by 7.69 percent.
      Compressed is 28 pages; un-compressed would require 26 pages.
NOTE: The data set WORK.BIN has 1000 observations and 51 variables.
NOTE: Compressing data set WORK.BIN increased size by 7.69 percent.
      Compressed is 28 pages; un-compressed would require 26 pages.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.04 seconds

Where does compress=binary create a benefit?

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Jack Hamilton
Sent: Thursday, August 05, 2004 10:30 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Size of ICK file when sorting

"Choate, Paul@DDS" <pchoate@DDS.CA.GOV> wrote:
>The documentation only recommends binary
>compression on long records with lots of binary data:

Personally, I would not claim that SAS documentation is always clear and complete and unambiguous and never needs empirical verification.

=====
data char (compress=char)
     bin  (compress=binary);

  length bigchar $50.;

  bigchar = repeat('00'x, 49);

  do i = 1 to 1000;
    output;
  end;

run;

NOTE: The data set WORK.CHAR has 1000 observations and 2 variables.
NOTE: Compressing data set WORK.CHAR decreased size by 60.00 percent.
      Compressed is 6 pages; un-compressed would require 15 pages.
NOTE: The data set WORK.BIN has 1000 observations and 2 variables.
NOTE: Compressing data set WORK.BIN decreased size by 53.33 percent.
      Compressed is 7 pages; un-compressed would require 15 pages.
=====

Binary compression in this case is fairly effective, even though the data set doesn't meet the requirements in the documentation. "Effective" is, of course, a subjective term.

=====
data char (compress=char)
     bin  (compress=binary);

  znum = 0;

  array z z1-z100;
  do over z;
    z = znum;
  end;

  do i = 1 to 1000;
    output;
  end;

  drop i;

run;

NOTE: The data set WORK.CHAR has 1000 observations and 101 variables.
NOTE: Compressing data set WORK.CHAR decreased size by 94.12 percent.
      Compressed is 3 pages; un-compressed would require 51 pages.
NOTE: The data set WORK.BIN has 1000 observations and 101 variables.
NOTE: Compressing data set WORK.BIN decreased size by 94.12 percent.
      Compressed is 3 pages; un-compressed would require 51 pages.
=====

Here, the criteria are clearly met, yet binary compression performs no better than character compression.

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "Choate, Paul@DDS" <pchoate@DDS.CA.GOV> 08/05/2004 9:56 AM >>>
The documentation only recommends binary compression on long records with lots of binary data:

<sasdoc9> This method is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data (numeric variables). Because the compression function operates on a single record at a time, the record length needs to be several hundred bytes or larger for effective compression. </sasdoc9>

I don't know how well it works if character data is interspersed in the binary data.

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Jack Hamilton
Sent: Thursday, August 05, 2004 9:51 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Size of ICK file when sorting

There's no fixed rule for when data sets should be compressed. Some data sets compress well, and others actually get larger when compressed. You just have to try it. The closest you can come to a rule is "If you have character variables with many repeating characters (including blanks), then use compression".
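A minimal sketch of the "just try it" approach, assuming the input is the mylib.abc data set from the original question. The WORK data set names and the 100,000-row sample size are made up for illustration. Writing the same rows under each setting lets the log report what each one does to the size:

```sas
/* Sketch only: mylib.abc is the data set from the original question;
   the try_* names and the sample size are illustrative assumptions. */
data work.try_none (compress=no)
     work.try_char (compress=char)
     work.try_bin  (compress=binary);
  set mylib.abc (obs=100000);   /* a sample is usually enough to see the trend */
run;
```

For each compressed output, the log should carry a "Compressing data set ... increased/decreased size by ... percent" NOTE, which can be compared across the two methods before committing to either.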

I haven't used COMPRESS=BINARY enough to come up with a rule of thumb for its use.

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "Chuck Enright" <chuck_sas@cfedata.com> 08/04/2004 7:04 PM >>>
Jack,

If my primary goal is to minimize the disk space used, with processing time a secondary goal, should I avoid using the system option and use the dataset option only for permanent datasets?
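For reference, the two forms being contrasted look roughly like this. It is a sketch that reuses the data set names from the original question; which form is preferable depends on the data and is not settled here.

```sas
/* Form 1, system option: applies to every data set created afterwards,
   temporary WORK data sets included. */
options compress=char;

/* Form 2, data set option: leave the system default off and request
   compression only on the permanent output data set. */
options compress=no;

proc sort data=mylib.abc (drop=var1 var2) nodupkeys
          out=mylib.out1 (compress=char);
  by id;
run;
```

A data set option overrides the system option for the data set it is attached to, so the second form compresses mylib.out1 while leaving everything else uncompressed.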

Quoting Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>:

> What are your compression options?  Is it possible that the input data
> set is compressed and the output data set is not?
>
>
> --
> JackHamilton@FirstHealth.com
> Manager, Technical Development
> Metrics Department, First Health
> West Sacramento, California USA
>
> >>> <sophe88@YAHOO.COM> 08/03/2004 11:08 AM >>>
> I try to sort a 3 GB file under SAS 9.1 for Windows.
>
> proc sort data=mylib.abc(drop=var1 var2) nodupkeys out=mylib.out1; by
> id; run;
>
> When the sorting kicks off, I notice in the library location there is
> a file with extension .Lck ticking up in size as the sorting goes on.
>
> But I saw the .LCK file (which is supposed to replace mylib.out1)
> actually was exceeding mylib.abc in size and the sorting showed no
> sign to stop.
>
> I know var1 and var2 all are character var with length=100 and I have
> about 20% dup records by ID. What is wrong here?
>
> PD
>

