Date: Thu, 4 Nov 2010 15:44:47 -0700
Reply-To: Sterling Paramore <gnilrets@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sterling Paramore <gnilrets@GMAIL.COM>
Subject: Re: Bit type
In-Reply-To: <AANLkTin6c2Dk0eo5c0gOU-LWaPxZgK02j9_=w07QGHhg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I took a 6 million record sample of my data and converted the Y/N fields
into a compressed value:
data Membs_Enr_Compress (drop = MembsEnr_Eff_Flag_:);
set Membs_Enr;
length compr_value 3;
compr_value = 0;
i = 0;
do iField = 200912,201001 to 201009;
i = i + 1;
compr_value = compr_value +
(vvaluex(cats("MembsEnr_Eff_Flag_",put(iField,6.0)))="Y")*2**(i-1);
end;
run;
I then created a dataset version and a view version of the uncompressed
values:
data Membs_Enr_Uncompress;
set Membs_Enr_Compress;
array YNField_(12) $1;
do i = 1 to 12;
if band(2**(i-1),compr_value) then
YNField_(i) = "Y";
else
YNField_(i) = "N";
end;
run;
data Membs_Enr_Uncompress_V / view = Membs_Enr_Uncompress_V;
set Membs_Enr_Compress;
array YNField_(12) $1;
do i = 1 to 12;
if band(2**(i-1),compr_value) then
YNField_(i) = "Y";
else
YNField_(i) = "N";
end;
run;
Here's a typical use of these tables:
15 proc means missing nway data = Membs_Enr_Uncompress SUM;
16 where YNField_3 = "Y";
17 var Membs_Counter;
18 class Membs_Enroll_yrmo;
19 run;
NOTE: There were 1978518 observations read from the data set
WORK.MEMBS_ENR_UNCOMPRESS.
WHERE YNField_3='Y';
NOTE: PROCEDURE MEANS used (Total process time):
real time 1.93 seconds
cpu time 2.32 seconds
20
21 proc means missing nway data = Membs_Enr_Uncompress_V SUM;
22 where YNField_3 = "Y";
23 var Membs_Counter;
24 class Membs_Enroll_yrmo;
25 run;
NOTE: View WORK.MEMBS_ENR_UNCOMPRESS_V.VIEW used (Total process time):
real time 53.40 seconds
cpu time 48.53 seconds
NOTE: There were 6011208 observations read from the data set
WORK.MEMBS_ENR_COMPRESS.
NOTE: There were 1978518 observations read from the data set
WORK.MEMBS_ENR_UNCOMPRESS_V.
WHERE YNField_3='Y';
NOTE: PROCEDURE MEANS used (Total process time):
real time 53.43 seconds
cpu time 48.54 seconds
As you can see, the view has terrible performance!
-Sterling
On Thu, Nov 4, 2010 at 2:44 PM, Joe Matise <snoopy369@gmail.com> wrote:
> How about making a PROC FCMP function, or even a macro, to fix it?
>
> I mean, the code would be simple:
>
> compr_value=0;
> do __t = 1 to dim(vars);
> compr_value = compr_value + ifn(vars(__t)='Y',1,0)*2**(__t-1);
> end;
>
> to compress, and then for simple uses (ie, filtering by one variable's
> value) you could have a function/macro that took an argument "varnum" and
> returned the value (using BAND). If you are doing frequencies of the
> individual variables, then of course you'd need to use a view, but the view
> wouldn't be any worse than storing the whole uncompressed dataset, and if
> you could WHERE the view based on the value of one of them first, it'd be
> much faster.
>
> -Joe
>
>
> On Thu, Nov 4, 2010 at 3:50 PM, Sterling Paramore <gnilrets@gmail.com>wrote:
>
>> I've tried to build datastep views to hide complexity several times in the
>> past, but they always end up being ridiculously slow.
>>
>> On Thu, Nov 4, 2010 at 1:45 PM, Fehd, Ronald J. (CDC/OCOO/ITSO) <
>> rjf2@cdc.gov> wrote:
>>
>> > > From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-
>> > > l@listserv.uga.edu] On Behalf Of Sterling Paramore
>> > > Sent: Thursday, November 04, 2010 4:29 PM
>> > > To: SAS-L@LISTSERV.UGA.EDU
>> > > Subject: Re: Bit type
>> > >
>> > > It would be worth the disk space savings but it would not be worth the
>> > > overhead of dealing with bitwise functions or explaining binary to my
>> > > end users.
>> >
>> >
>> > [ ] cheap
>> > [ ] easy
>> > [ ] fast
>> >
>> > { } job security <---<<<
>> >
>> > seems to me that hiding the complexity in data step views
>> > both for the compression and decompression
>> > would be the way to go w/this problem
>> >
>> > "I hear that employees who retire
>> > and then come back as consultants make good money!"
>> >
>> > Ron Fehd the module/routine/subroutine maven CDC Atlanta GA USA RJF2
>> at
>> > cdc dot gov
>> >
>>
>
>