LISTSERV at the University of Georgia
Date:         Tue, 5 Jun 2001 10:49:03 -0400
Reply-To:     "Siegel, Jonathan" <Jonathan.Siegel@PFIZER.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Siegel, Jonathan" <Jonathan.Siegel@PFIZER.COM>
Subject:      Re: _temporary_ arrays vs "regular" arrays
Comments: To: "sashole@bellsouth.net" <sashole@bellsouth.net>
Content-Type: text/plain; charset="iso-8859-1"

Spoke way too soon. You're right!! A wonderful trick.

Jonathan Siegel

-----Original Message-----
From: Paul Dorfman [mailto:paul_dorfman@hotmail.com]
Sent: Tuesday, June 05, 2001 12:07 AM
To: Jonathan.Siegel@PFIZER.COM; SAS-L@LISTSERV.UGA.EDU
Subject: Re: _temporary_ arrays vs "regular" arrays

>From: Jonathan Siegel <Jonathan.Siegel@PFIZER.COM>
>
>While this code will work, a difficulty is that it requires disk i/o to
>initialize the array. This turns out to be much less efficient than a DO
>loop or other means.

Jonathan,

This proposition is nearly impossible to prove. The fact is, Lou's method is a beautiful, clever trick, a real SAS code gem. As long as we are talking about arrays as named variable lists, i.e. 'real' arrays (not _temporary_ arrays), it is easy to see, even without testing, that it will *always* out-perform initializing an array item by item. The first time the instruction SET <FILE> POINT=P is issued, the entire record is simply moved en masse from the buffer to memory (or the PDV, if you prefer the term). The same thing happens at each subsequent call. That is, the init values are simply moved from one area of memory to another as a group move (speaking COBOL); wouldn't you think that is faster than moving the fields one by one?

Let us turn to His Majesty Experiment for judgement. Imagine there is a (real) array with 1000 items, and it has to be re-initialized 100,000 times.

That is how Lou's technique runs:

%let n = 1000;
%let r = 1e5;
data init; array a (&n); retain a 1; run ;

      real time           0.01 seconds
      cpu time            0.01 seconds

data _null_ ;
   array a (&n) ;
   retain p 1 ;
   do i=1 to &r ;
      set init point = p ;
   end;
   stop ;
run;

NOTE: There were 100000 observations read from the dataset WORK.INIT.
NOTE: DATA statement used:
      real time           0.45 seconds
      cpu time            0.45 seconds
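As a sanity check that a single SET ... POINT= really re-primes every element of the array, a minimal sketch along the following lines could be run. It is an illustration only and not part of the timed runs; the variables j, before and after are names made up here.

```sas
%let n = 1000;

/* One-record file holding the initial values (same as the INIT step above) */
data init;
   array a (&n);
   retain a 1;
run;

data _null_;
   array a (&n);
   retain p 1;

   /* Deliberately overwrite the array so the effect of SET is visible */
   do j = 1 to &n;
      a(j) = 0;
   end;
   before = sum(of a(*));

   /* One direct-access read moves the whole record into the PDV */
   set init point = p;
   after = sum(of a(*));

   put before= after=;   /* expected in the log: before=0 after=1000 */
   stop;
run;
```

The STOP statement is needed for the same reason as in the timed step: with POINT= there is no end-of-file condition to terminate the DATA step.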

Why do I say that all the moves occur in memory? Because if I load the 1-record file using SASFILE, it does not change things a bit! In comparison, let us examine how the item-by-item initialization fares:

data _null_ ;
   array a (&n) ;
   do i=1 to &r ;
      do j=1 to &n; a(j) = 1; end ;
   end;
run;

NOTE: DATA statement used:
      real time           11.40 seconds
      cpu time            11.39 seconds

See the difference? The only method that can compete with Lou's is one where the moves are made from one location to another within the PDV itself:

data _null_ ;
   length ustr $ %eval(8*&n) ;
   ustr = repeat(put(1,rb8.),&n-1) ;
   array a (&n) ;
   addr1 = addr(a1) ;
   do i=1 to &r ;
      call poke (ustr, addr1, 8*&n) ;
   end;
run;

NOTE: DATA statement used:
      real time           0.26 seconds
      cpu time            0.26 seconds

Above, a character string of 1000 8-byte double floats, each representing 1, is moved all at once to the array, starting at the address of its first element.
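To confirm that the poke actually lands on every element, one could poke once and then sum the array. The sketch below is only an illustration under the same assumptions as the timed step: a1 through a&n are taken to sit contiguously in memory, and ADDR and CALL POKE are the 32-bit routines (on a 64-bit SAS session, ADDRLONG and CALL POKELONG would, I believe, be the ones to use instead). The variable total is a name made up here.

```sas
%let n = 1000;

data _null_;
   /* Image of &n doubles, each the RB8. representation of 1 */
   length ustr $ %eval(8*&n);
   ustr = repeat(put(1, rb8.), &n - 1);

   array a (&n);

   /* Address of the first array variable; the rest are assumed to follow it */
   addr1 = addr(a1);

   /* Single group move of 8*&n bytes onto the array */
   call poke(ustr, addr1, 8*&n);

   /* If every element received a 1, the sum equals &n */
   total = sum(of a(*));
   put total=;
run;
```

If total comes out as anything other than &n, the contiguity assumption does not hold in that environment and the technique should not be relied on there.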

Both methods are limited, of course. Lou's technique, for obvious reasons, cannot initialize more than 32767 array variables, and the CALL POKE technique is even more limited, being restricted to a string of 32767 characters (that is, at 8 bytes per item, it will fail as soon as the number of items exceeds 4095).
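The 4095 figure follows from dividing the 32767-character string limit by the 8 bytes each numeric item occupies. A small sketch of the arithmetic, with max_items as a name made up here:

```sas
data _null_;
   /* 32767 bytes of string, 8 bytes per double: whole items that fit */
   max_items = floor(32767 / 8);
   put max_items=;   /* writes max_items=4095 to the log */
run;
```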

The real limitation of Lou's method is its inability to initialize _temporary_ arrays. CALL POKE will still work, but only within the limits indicated above. Item-by-item priming, as shown above, is painfully slow. That is why a contribution from SI is warranted. All they have to do is code in the underlying software an equivalent of A = 1, or whatever syntax SI would fancy.

>One of the purposes of using arrays is to make things more efficient. This
>is especially likely for people who are using _temporary_ arrays. For
>these sorts of purposes, a SET POINT= method would probably be of limited
>use.

Precisely.

Kind regards,
========================
Paul M. Dorfman
Jacksonville, Fl
========================
