LISTSERV at the University of Georgia
Date:         Mon, 28 Oct 2002 13:42:55 -0500
Reply-To:     Francis Harvey <HARVEYF1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Francis Harvey <HARVEYF1@WESTAT.COM>
Subject:      Re: Bit arrays
Content-Type: text/plain; charset="iso-8859-1"

Greetings Paul,

Taking advantage of the shadow bytes in 8.2, I arrived at the code below. Aside from the initial 104-byte price tag that every array pays, I seem to lose at most 7 bytes with any particular array. For a really large bit array, this would appear to have an advantage over losing 8 to 11 bits for every member of a numeric array. I will also examine using a large character variable as an array, and recode this example with your numeric arrays to compare speed, but I wondered whether you had any initial impressions of this method.
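
For comparison, the character-variable alternative mentioned above might look something like the following. This is only a minimal sketch, not code from this thread: the names bits, byteNo and mask are made up for illustration, and it assumes the bits are packed eight to a byte in the same order as the peek/poke code. A character variable cannot exceed 32,767 bytes, so this form tops out at 262,136 bits.

```sas
%let arrBits = 833;
%let arrBytes = %eval((&arrBits - 1) / 8 + 1);  /* integer division: 105 */

data _null_;
  /* Hypothetical names: bits, byteNo, mask */
  length bits $ &arrBytes;

  /* Initialize all bits to 0 */
  bits = repeat('00'x, &arrBytes - 1);

  /* Set bits 1 and 833 */
  do bitSet = 1, 833;
    byteNo = int((bitSet - 1) / 8) + 1;   /* substr positions are 1-based */
    mask = 2 ** mod(bitSet - 1, 8);
    substr(bits, byteNo, 1) = byte(bor(rank(substr(bits, byteNo, 1)), mask));
  end;

  /* Find bits that have been set */
  do bitGet = 1 to &arrBits;
    byteNo = int((bitGet - 1) / 8) + 1;
    mask = 2 ** mod(bitGet - 1, 8);
    if band(rank(substr(bits, byteNo, 1)), mask) then put bitGet=;
  end;
run;
```

The sketch relies only on SUBSTR, RANK, BYTE, BOR and BAND, so it involves no memory addresses; whether it is faster or slower than the peek/poke version is exactly the comparison that would need to be timed.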

%let arrBits = 833;
%let arrBytes = %eval((&arrBits - 1) / 8 + 1);
%let mult = %sysfunc(ceil(%sysfunc(sign(&arrBytes - 104)) / 2));
%let arrElements = %eval(((((&arrBytes - 105) / 8) + 1) * 8 * &mult) + 1);

data _null_;
  array bitArr{&arrElements} $ 1 _temporary_;

  /* Create pointer to array */
  arrAddr = addr(bitArr{1});

  /* Initialize all bits to 0 (arrBytes bytes cover every bit in the array) */
  call poke(repeat("00"x,&arrBytes - 1),arrAddr,&arrBytes);

  /* Set bit 1 */
  bitSet = 1;
  /* Find out what byte this is in */
  byteOffset = int((bitSet - 1) / 8);
  bitPattern = peek(arrAddr + byteOffset,1);
  bitPattern = bor(bitPattern,2 ** mod(bitSet - 1,8));
  call poke(put(bitPattern,ib8.),arrAddr + byteOffset,1);

  /* Set bit 833 */
  bitSet = 833;
  /* Find out what byte this is in */
  byteOffset = int((bitSet - 1) / 8);
  bitPattern = peek(arrAddr + byteOffset,1);
  bitPattern = bor(bitPattern,2 ** mod(bitSet - 1,8));
  call poke(put(bitPattern,ib8.),arrAddr + byteOffset,1);

  /* Find bits that have been set */
  do i = 1 to &arrBits;
    bitGet = i;
    /* Find out what byte this is in */
    byteOffset = int((bitGet - 1) / 8);
    bitPattern = peek(arrAddr + byteOffset,1);
    test = 1 and band(bitPattern,2 ** mod(bitGet - 1,8));
    if test = 1 then do;
      put bitGet;
    end;
  end;
run;
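
The byte and bit arithmetic used above can be checked on its own, without touching memory. The following is just an illustrative sketch: the variable names bitNo and mask are not from the code above, and the bit numbers in the loop are arbitrary sample values.

```sas
/* Bytes needed for 833 bits; %eval uses integer division */
%let arrBits = 833;
%let arrBytes = %eval((&arrBits - 1) / 8 + 1);
%put arrBits=&arrBits arrBytes=&arrBytes;   /* arrBits=833 arrBytes=105 */

data _null_;
  /* Hypothetical names: bitNo, mask */
  do bitNo = 1, 8, 9, 832, 833;
    byteOffset = int((bitNo - 1) / 8);   /* 0-based offset from the array address */
    mask = 2 ** mod(bitNo - 1, 8);       /* value of that bit within its byte */
    put bitNo= byteOffset= mask=;
  end;
run;

/* Log lines from the PUT statement:
   bitNo=1 byteOffset=0 mask=1
   bitNo=8 byteOffset=0 mask=128
   bitNo=9 byteOffset=1 mask=1
   bitNo=832 byteOffset=103 mask=128
   bitNo=833 byteOffset=104 mask=1
*/
```

Bit 833 lands at offset 104, which is the 105th byte, matching the arrBytes value computed by the macro statements.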

Francis R. Harvey III
WB303, x3952
harveyf1@westat.com

VB programmers know the wisdom of Nothing

> -----Original Message-----
> From: Paul Dorfman [mailto:paul_dorfman@hotmail.com]
> Sent: Monday, June 03, 2002 10:57 PM
> To: Francis Harvey; SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Bit arrays
>
> Francis,
>
> Unfortunately, in 8.2 the situation got fixed only partially. As you know,
> before 8.2, if a temporary character array were declared as
>
>    array a (1000000) $1. _temporary_ ;
>
> it would allocate 8 bytes per item, leaving the "undeclared" bytes as
> "shadow bytes". That meant that the array elements would have the
> expression length of 1, yet the memory length would be 8, proven by the
> fact that the addresses of the array items would be spaced 8 bytes apart.
>
> You of course understand my enthusiasm when I learned that the Institute
> had addressed my concern and made the items of the array A adjacent in
> memory, spaced just 1 byte apart, as it should be. However, the joy was
> short-lived. Yes, the Data step now spaces the addresses of array elements
> exactly as far apart in memory as indicated by the declared length. And
> yes, just as it was before, the expression length corresponds to the
> declared one. However, in a really bizarre twist, it still allocates
> 8 bytes of real memory per item. If you find it hard to believe, here is
> a proof (SAS V9.0, Windows XP, irrelevant notes killed):
>
> 13   data _null_;
> 14   run ;
>
> NOTE: DATA statement used (Total process time):
>       Memory                            89k
> 15
> 16   data _null_;
> 17      array a (0 : 999999) $1. _temporary_ ;
> 18      addr0 = addr(a(0)) ;
> 19      addr1 = addr(a(1)) ;
> 20      addr2 = addr(a(2)) ;
> 21      put addr0--addr2 ;
> 22   run ;
>
> 74978304 74978305 74978306
> NOTE: DATA statement used (Total process time):
>       Memory                            8868k
>
> I am sorry, but if you are planning on bitmapping a reasonable range
> without running out of memory, you may want to "adopt the paper's
> approach", no matter how hideous it may appear.
>
> Kind regrets,
> ==================
> Paul M. Dorfman
> Jacksonville, Fl
> ==================
>
> ----Original Message Follows----
> From: Francis Harvey <HARVEYF1@WESTAT.COM>
>
> Greetings Ken;
>
> Unfortunately, this still leaves me with the same quandary as the
> paper mentioned, just because 8.2 allows me to have a one
> character byte array does not mean it takes advantage of the
> resulting reduced space, and I have no mechanism for evaluating
> it. I see some improvements to my code that I could use, but I
> need to know if my mechanism is unsound or inefficient before I
> adopt the paper's approach. I wonder if there is an update?
>
> Francis R. Harvey III
> WB303, x3952
> harveyf1@westat.com
>
> VB programmers know the wisdom of Nothing
>
> > -----Original Message-----
> > From: Kenneth Moody [mailto:KennethMoody@FIRSTHEALTH.COM]
> > Sent: Monday, June 03, 2002 1:37 PM
> > To: SAS-L@LISTSERV.VT.EDU
> > Subject: Re: Bit arrays
> >
> > An excellent reference is Paul Dorfman's SUGI 26 paper, Table Look-Up by
> > Direct Addressing: Key-Indexing -- Bitmapping -- Hashing.
> >
> > You can find it at:
> >
> > http://www2.sas.com/proceedings/sugi26/p008-26.pdf
> >
> > Ken Moody
> > First Health, Metrics Department
> > Voice: 916-374-3924
> > EMail: KennethMoody@firsthealth.com
> <snip>

