LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2010, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 4 Nov 2010 17:53:04 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: Bit type
Comments: To: Sterling Paramore <gnilrets@gmail.com>
In-Reply-To:  <AANLkTikykyimU4XQX_JTzkT0SC=Hb9WErwtSi5rw5gHV@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

If you just make a function that does the extracting for you, you don't have to use a view - you can just WHERE on that function's results (using BAND, or some math) and never uncompress, if you don't plan on asking for a frequency of the YN variables. (You don't have to make an actual function if you don't want to, though it might be easier for your users - a macro, or even just plain math/BAND usage in the PROC, should do).

-Joe

On Thu, Nov 4, 2010 at 5:44 PM, Sterling Paramore <gnilrets@gmail.com>wrote:

> I took a 6 million record sample of my data and converted the Y/N fields > into a compressed value: > > > data Membs_Enr_Compress (drop = MembsEnr_Eff_Flag_:); > set Membs_Enr; > > length compr_value 3; > compr_value = 0; > i = 0; > do iField = 200912,201001 to 201009; > i = i + 1; > compr_value = compr_value + > (vvaluex(cats("MembsEnr_Eff_Flag_",put(iField,6.0)))="Y")*2**(i-1); > end; > run; > > > > > I then created a dataset version and a view version of the uncompressed > values: > > > data Membs_Enr_Uncompress; > set Membs_Enr_Compress; > > array YNField_(12) $1; > > do i = 1 to 12; > if band(2**(i-1),compr_value) then > YNField_(i) = "Y"; > else > YNField_(i) = "N"; > end; > > run; > > data Membs_Enr_Uncompress_V / view = Membs_Enr_Uncompress_V; > set Membs_Enr_Compress; > > array YNField_(12) $1; > > do i = 1 to 12; > if band(2**(i-1),compr_value) then > YNField_(i) = "Y"; > else > YNField_(i) = "N"; > end; > > run; > > > Here's a typical use of these tables: > > 15 proc means missing nway data = Membs_Enr_Uncompress SUM; > 16 where YNField_3 = "Y"; > 17 var Membs_Counter; > 18 class Membs_Enroll_yrmo; > 19 run; > > NOTE: There were 1978518 observations read from the data set > WORK.MEMBS_ENR_UNCOMPRESS. > WHERE YNField_3='Y'; > NOTE: PROCEDURE MEANS used (Total process time): > real time 1.93 seconds > cpu time 2.32 seconds > > > 20 > 21 proc means missing nway data = Membs_Enr_Uncompress_V SUM; > 22 where YNField_3 = "Y"; > 23 var Membs_Counter; > 24 class Membs_Enroll_yrmo; > 25 run; > > NOTE: View WORK.MEMBS_ENR_UNCOMPRESS_V.VIEW used (Total process time): > real time 53.40 seconds > cpu time 48.53 seconds > > NOTE: There were 6011208 observations read from the data set > WORK.MEMBS_ENR_COMPRESS. > NOTE: There were 1978518 observations read from the data set > WORK.MEMBS_ENR_UNCOMPRESS_V. > WHERE YNField_3='Y'; > NOTE: PROCEDURE MEANS used (Total process time): > real time 53.43 seconds > cpu time 48.54 seconds > > > As you can see, the view has terrible performance! > > -Sterling > > > > > On Thu, Nov 4, 2010 at 2:44 PM, Joe Matise <snoopy369@gmail.com> wrote: > > > How about making a PROC FCMP function, or even a macro, to fix it? > > > > I mean, the code would be simple: > > > > compr_value=0; > > do __t = 1 to dim(vars); > > compr_value = compr_value + ifn(vars(__t)='Y',1,0)*2**(__t-1); > > end; > > > > to compress, and then for simple uses (ie, filtering by one variable's > > value) you could have a function/macro that took an argument "varnum" and > > returned the value (using BAND). If you are doing frequencies of the > > individual variables, then of course you'd need to use a view, but the > view > > wouldn't be any worse than storing the whole uncompressed dataset, and if > > you could WHERE the view based on the value of one of them first, it'd be > > much faster. > > > > -Joe > > > > > > On Thu, Nov 4, 2010 at 3:50 PM, Sterling Paramore <gnilrets@gmail.com > >wrote: > > > >> I've tried to build datastep views to hide complexity several times in > the > >> past, but they always end up being ridiculously slow. > >> > >> On Thu, Nov 4, 2010 at 1:45 PM, Fehd, Ronald J. (CDC/OCOO/ITSO) < > >> rjf2@cdc.gov> wrote: > >> > >> > > From: owner-sas-l@listserv.uga.edu [mailto:owner-sas- > >> > > l@listserv.uga.edu] On Behalf Of Sterling Paramore > >> > > Sent: Thursday, November 04, 2010 4:29 PM > >> > > To: SAS-L@LISTSERV.UGA.EDU > >> > > Subject: Re: Bit type > >> > > > >> > > It would be worth the disk space savings but it would not be worth > the > >> > > overhead of dealing with bitwise functions or explaining binary to > my > >> > > end users. > >> > > >> > > >> > [ ] cheap > >> > [ ] easy > >> > [ ] fast > >> > > >> > { } job security <---<<< > >> > > >> > seems to me that hiding the complexity in data step views > >> > both for the compression and decompression > >> > would be the way to go w/this problem > >> > > >> > "I hear that employees who retire > >> > and then come back as consultants make good money!" > >> > > >> > Ron Fehd the module/routine/subroutine maven CDC Atlanta GA USA RJF2 > >> at > >> > cdc dot gov > >> > > >> > > > > >


Back to: Top of message | Previous page | Main SAS-L page