LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2010, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 4 Nov 2010 15:44:47 -0700
Reply-To:     Sterling Paramore <gnilrets@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sterling Paramore <gnilrets@GMAIL.COM>
Subject:      Re: Bit type
In-Reply-To:  <AANLkTin6c2Dk0eo5c0gOU-LWaPxZgK02j9_=w07QGHhg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I took a 6 million record sample of my data and converted the Y/N fields into a compressed value:

data Membs_Enr_Compress (drop = MembsEnr_Eff_Flag_:); set Membs_Enr;

length compr_value 3; compr_value = 0; i = 0; do iField = 200912,201001 to 201009; i = i + 1; compr_value = compr_value + (vvaluex(cats("MembsEnr_Eff_Flag_",put(iField,6.0)))="Y")*2**(i-1); end; run;

I then created a dataset version and a view version of the uncompressed values:

data Membs_Enr_Uncompress; set Membs_Enr_Compress;

array YNField_(12) $1;

do i = 1 to 12; if band(2**(i-1),compr_value) then YNField_(i) = "Y"; else YNField_(i) = "N"; end;

run;

data Membs_Enr_Uncompress_V / view = Membs_Enr_Uncompress_V; set Membs_Enr_Compress;

array YNField_(12) $1;

do i = 1 to 12; if band(2**(i-1),compr_value) then YNField_(i) = "Y"; else YNField_(i) = "N"; end;

run;

Here's a typical use of these tables:

15 proc means missing nway data = Membs_Enr_Uncompress SUM; 16 where YNField_3 = "Y"; 17 var Membs_Counter; 18 class Membs_Enroll_yrmo; 19 run;

NOTE: There were 1978518 observations read from the data set WORK.MEMBS_ENR_UNCOMPRESS. WHERE YNField_3='Y'; NOTE: PROCEDURE MEANS used (Total process time): real time 1.93 seconds cpu time 2.32 seconds

20 21 proc means missing nway data = Membs_Enr_Uncompress_V SUM; 22 where YNField_3 = "Y"; 23 var Membs_Counter; 24 class Membs_Enroll_yrmo; 25 run;

NOTE: View WORK.MEMBS_ENR_UNCOMPRESS_V.VIEW used (Total process time): real time 53.40 seconds cpu time 48.53 seconds

NOTE: There were 6011208 observations read from the data set WORK.MEMBS_ENR_COMPRESS. NOTE: There were 1978518 observations read from the data set WORK.MEMBS_ENR_UNCOMPRESS_V. WHERE YNField_3='Y'; NOTE: PROCEDURE MEANS used (Total process time): real time 53.43 seconds cpu time 48.54 seconds

As you can see, the view has terrible performance!

-Sterling

On Thu, Nov 4, 2010 at 2:44 PM, Joe Matise <snoopy369@gmail.com> wrote:

> How about making a PROC FCMP function, or even a macro, to fix it? > > I mean, the code would be simple: > > compr_value=0; > do __t = 1 to dim(vars); > compr_value = compr_value + ifn(vars(__t)='Y',1,0)*2**(__t-1); > end; > > to compress, and then for simple uses (ie, filtering by one variable's > value) you could have a function/macro that took an argument "varnum" and > returned the value (using BAND). If you are doing frequencies of the > individual variables, then of course you'd need to use a view, but the view > wouldn't be any worse than storing the whole uncompressed dataset, and if > you could WHERE the view based on the value of one of them first, it'd be > much faster. > > -Joe > > > On Thu, Nov 4, 2010 at 3:50 PM, Sterling Paramore <gnilrets@gmail.com>wrote: > >> I've tried to build datastep views to hide complexity several times in the >> past, but they always end up being ridiculously slow. >> >> On Thu, Nov 4, 2010 at 1:45 PM, Fehd, Ronald J. (CDC/OCOO/ITSO) < >> rjf2@cdc.gov> wrote: >> >> > > From: owner-sas-l@listserv.uga.edu [mailto:owner-sas- >> > > l@listserv.uga.edu] On Behalf Of Sterling Paramore >> > > Sent: Thursday, November 04, 2010 4:29 PM >> > > To: SAS-L@LISTSERV.UGA.EDU >> > > Subject: Re: Bit type >> > > >> > > It would be worth the disk space savings but it would not be worth the >> > > overhead of dealing with bitwise functions or explaining binary to my >> > > end users. >> > >> > >> > [ ] cheap >> > [ ] easy >> > [ ] fast >> > >> > { } job security <---<<< >> > >> > seems to me that hiding the complexity in data step views >> > both for the compression and decompression >> > would be the way to go w/this problem >> > >> > "I hear that employees who retire >> > and then come back as consultants make good money!" >> > >> > Ron Fehd the module/routine/subroutine maven CDC Atlanta GA USA RJF2 >> at >> > cdc dot gov >> > >> > >


Back to: Top of message | Previous page | Main SAS-L page