> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Paul Dorfman
> Sent: Wednesday, July 16, 2008 11:24 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Informat, Format, & Length Statements
>
> Dan,
>
> Whether 3 numeric bytes under W/U can store an integer
> exactly depends not
> only on its absolute value but also on its other properties.
> For example,
> all *even* integers can be stored in 3 bytes exactly up to
> n=16,384; after
> that, some even integers can and some cannot. As to the odd
> integers, the
> greatest one stored in 3 bytes precisely is 8191.
>
> The smallest integer not stored correctly in 3 bytes is thus 8193, so
> generally speaking, in this respect SAS9.2 documentation does
> hold water.
>
> On a different note, I find it utterly useless to use numeric
> length less
> than full 8 bytes. Since its only purpose can be saving disk
> space (and
> wasting CPU time), other methods exist that allow for much greater
> savings. To wit, just 2 character bytes can store an ~8 times greater
> integer than 3 numeric bytes, namely, up to 256**2-1=65,535 via simple
>
> put (n, pib2.) ;
>
> while 3 bytes can store the whopping 256**3-1=16,777,215 via
>
> put (n, pib3.) ;
>
> The added value of this method is the firm knowledge and
> predictability of
> the exact integer precision being thusly rendered, without any need to
> consult with the manual (or SAS-L, for that matter). To the
> objection that
> the rb-pib-rb conversion takes time I'd retort that the unavoidable
> implicit conversion from 8 to fewer numeric bytes and vice versa takes
> time, too.
>
> Kind regards
> ------------
> Paul Dorfman
> Jax, FL
> ------------
>
>
Paul,
I agree with most of what you write above (as usual), especially that it is not particularly useful to define numerics as less than 8 bytes. I did a couple of quick tests, which suggest that you are correct: the rb-pib-rb conversion is not much different in total processing time, with the added benefit, as you point out, of storing results correctly. (I am going to have to think about whether I can use this profitably in my work.)

264  run;

NOTE: The data set WORK.TEST2 has 100000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           31.71 seconds
      cpu time            24.82 seconds

265  data test3;
266     do i=1 to 1e8;
267        j=put (i, pib3.) ;
268        output;
269     end;
270  run;

NOTE: The data set WORK.TEST3 has 100000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           44.31 seconds
      cpu time            36.28 seconds

271  data test4;
272     set test2;
273     k=j;
274  run;

NOTE: There were 100000000 observations read from the data set WORK.TEST2.
NOTE: The data set WORK.TEST4 has 100000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           2:33.06
      cpu time            58.62 seconds

275  data test5;
276     set test3;
277     k=input(j,pib3.);
278  run;

NOTE: There were 100000000 observations read from the data set WORK.TEST3.
NOTE: The data set WORK.TEST5 has 100000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           2:18.45
      cpu time            1:00.43

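
For reference, the byte arithmetic behind the pib2./pib3. round trip can be sketched outside SAS. The following is Python, not SAS, and only models the encoding: the helper names to_pib and from_pib are mine, and little-endian byte order is an assumption (it does not affect the range of values that fit).

```python
def to_pib(n, width):
    """Encode a non-negative integer as `width` unsigned binary bytes."""
    return n.to_bytes(width, "little")


def from_pib(raw):
    """Decode unsigned binary bytes back into an integer."""
    return int.from_bytes(raw, "little")


for width in (2, 3):
    largest = 256 ** width - 1          # all bits set in `width` bytes
    assert from_pib(to_pib(largest, width)) == largest
    print(width, "bytes hold integers up to", largest)
```

Every integer from 0 up to the printed limit survives the round trip unchanged, which is the predictability Paul describes.
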
The one small quibble I have is with your statement that "all *even* integers can be stored in 3 bytes exactly up to n=16,384". It is true that if you store the value 8194 in a length 3 numeric, you will get 8194 when you read it back. However, when you read a length 3 numeric variable from a file and the value you get is 8194, you don't know whether the original value was 8194 or 8195. So I would argue that the value is not stored "exactly": you have lost 1 bit of precision.
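
To make the lost bit concrete, here is a minimal sketch in Python rather than SAS. It assumes that, under W/U, a length 3 numeric keeps only the 3 most significant bytes of the 8-byte IEEE double (sign, 11 exponent bits, 12 mantissa bits) and that the dropped bytes come back as zeros; the function name store_len3 is mine.

```python
import struct


def store_len3(value):
    """Model writing a number as a length 3 numeric and reading it back.

    Assumption: the 8-byte IEEE double is truncated to its 3 leading
    (most significant) bytes and zero-filled when it is read again.
    """
    big_endian = struct.pack(">d", float(value))
    return struct.unpack(">d", big_endian[:3] + b"\x00" * 5)[0]


for n in range(8190, 8197):
    print(n, "->", int(store_len3(n)))
```

Under that assumption 8193 comes back as 8192 and 8195 comes back as 8194, while 8194 and 8196 come back unchanged, so a stored 8194 cannot be told apart from a stored 8195.
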
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204