LISTSERV at the University of Georgia
Date:   Tue, 19 Feb 2008 11:09:54 -0500
Reply-To:   Nathaniel.Wooding@DOM.COM
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Nat Wooding <Nathaniel.Wooding@DOM.COM>
Subject:   Re: Why does retain work faster conditionally?
Comments:   To: Paul Dorfman <sashole@BELLSOUTH.NET>
In-Reply-To:   <021920081552.955.47BAFB2D0006A291000003BB22230682329B0A02D2089B9A019C04040A0DBF0A0401089C0E9C@att.net>
Content-Type:   text/plain; charset="ISO-8859-1"

Paul

I ran a test earlier that seemed to show a difference in run times depending on where the RETAIN was placed and on which steps eliminated I/O. After reading your posting, it occurred to me that I was using a chunk of memory to store my first data set, so I deleted each data set between runs and got the following results:

193  data one;
194  retain x ;
195  do i=1 to 1000000; x=1; output;end;
196  run;

NOTE: The data set WORK.ONE has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.19 seconds
      cpu time            0.19 seconds

197  Proc delete data = one;
198  run;

NOTE: Deleting WORK.ONE (memtype=DATA).
NOTE: PROCEDURE DELETE used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

199  Data two;
200  do i = 1 to 1000000; retain x ;x=1;output;end;
201  run;

NOTE: The data set WORK.TWO has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.20 seconds
      cpu time            0.20 seconds

202  Proc delete data = two;
203  run;

NOTE: Deleting WORK.TWO (memtype=DATA).
NOTE: PROCEDURE DELETE used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

204  Data three;
205  if _n_ = 1 then do;
206  retain x;
207  end;
208  do i=1 to 1000000; x=1; output;end;
209  run;

NOTE: The data set WORK.THREE has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      cpu time            0.21 seconds

It looks like my earlier results were affected by memory usage.
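A minimal sketch of how a memory explanation like this could be checked, assuming a SAS 9 session: the FULLSTIMER system option makes SAS write expanded resource statistics, including memory usage, to the log after each step (exactly which statistics appear depends on the operating system). The step below simply repeats the first test above with that option turned on and then frees the WORK space before the next run.

```sas
/* Ask SAS to write expanded resource statistics, including  */
/* memory usage, to the log after every step.                */
options fullstimer;

/* Same test as the first step above */
data one;
  retain x;
  do i = 1 to 1000000;
    x = 1;
    output;
  end;
run;

/* Free the WORK space before the next timed step */
proc delete data=one;
run;
```

Comparing the memory lines in the log across the three variants would show whether they differ in footprint or only in where the RETAIN sits.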

Nat

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone: 804-271-5313, Fax: 804-271-2977

Paul Dorfman <sashole@BELLSOUTH.NET>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Why does retain work faster conditionally?
02/19/2008 10:52 AM

Please respond to Paul Dorfman <sashole@BELLSOUTH.NET>

Art,

I suspect that this difference in run times is dictated by external factors rather than by the differences between the two DATA step versions. I have eliminated the output data set (writing to DATA _NULL_ instead) to reduce I/O background noise, and I repeated the test twice for consistency's sake (under Windows XP Pro on a ThinkPad T61), as follows:

514  data a ;
515  retain lname 'Galt' fname 'John' ;
516  do _n_ = 1 to 1e7 ;
517  output ;
518  end ;
519  run ;

NOTE: The data set WORK.A has 10000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           6.56 seconds
      cpu time            2.57 seconds

520  data _null_ ;
521  retain fname;
522  set a;
523  run;

NOTE: There were 10000000 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
      real time           1.51 seconds
      cpu time            1.51 seconds

524  data _null_ ;
525  if _n_ eq 1 then do;
526  retain fname;
527  end;
528  set a;
529  run;

NOTE: There were 10000000 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
      real time           1.54 seconds
      cpu time            1.51 seconds

530  data _null_ ;
531  retain fname;
532  set a;
533  run;

NOTE: There were 10000000 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
      real time           1.48 seconds
      cpu time            1.48 seconds

534  data _null_ ;
535  if _n_ eq 1 then do;
536  retain fname;
537  end;
538  set a;
539  run;

NOTE: There were 10000000 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
      real time           1.54 seconds
      cpu time            1.54 seconds

However, even though the steps compared as I expected (i.e., executing a conditional statement 10 million times costs more than executing nothing), I would not draw a definite conclusion from this comparison, because background input noise still mars the measurement.

The analogy I usually use in this sort of situation is that it is physically impossible to weigh a fly on a weigh-station scale by subtracting the weight of the bare-ass elephant from the weight of the same elephant with the fly on its behind, for the difference will inevitably be dwarfed by the measurement error. To weigh the fly, one needs to eliminate the elephant from the picture and weigh the fly itself (preferably not airborne) on a precision scale.

In this case, eliminating the elephant would mean:

602  data _null_ ;
603  lname = 'Galt' ;
604  fname = 'John' ;
605  do _n_ = 1 to 5e9 ;
606  retain fname ;
607  end ;
608  run ;

NOTE: DATA statement used (Total process time):
      real time           1:17.40
      cpu time            1:17.35

609  data _null_ ;
610  lname = 'Galt' ;
611  fname = 'John' ;
612  do _n_ = 1 to 5e9 ;
613  if _n_ = 1 then do ;
614  retain fname ;
615  end ;
616  end ;
617  run ;

NOTE: DATA statement used (Total process time):
      real time           1:21.50
      cpu time            1:21.23

Note that SAS is so blazingly fast at executing the conditional statement that I was able to detect a measurable difference (and that is after eliminating all I/O!) only by iterating the loops 5 billion times. Iterating them only 10 million times resulted in 0.15 seconds for each step, with any difference lying below the accuracy of the measurement.
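As a rough check of the figures above, assuming the entire CPU-time difference between the two 5-billion-iteration steps (1:21.23 vs. 1:17.35, i.e. 81.23 s vs. 77.35 s) is due to evaluating IF _N_ = 1, the cost per test comes out well under a nanosecond. Scaled back to 10 million iterations, that is below the hundredth-of-a-second resolution of the timings in the log:

```latex
\begin{aligned}
\Delta t_{\mathrm{cpu}} &= 81.23\ \mathrm{s} - 77.35\ \mathrm{s} = 3.88\ \mathrm{s} \\
\frac{\Delta t_{\mathrm{cpu}}}{5 \times 10^{9}} &\approx 7.8 \times 10^{-10}\ \mathrm{s} \approx 0.78\ \mathrm{ns}\ \text{per IF test} \\
10^{7} \times 7.8 \times 10^{-10}\ \mathrm{s} &\approx 0.008\ \mathrm{s}
\end{aligned}
```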

Of course, to my mind, all the measurements with RETAIN placed inside an IF-THEN DO block are a funny exercise, not unlike an experiment I would stage to prove to myself that it is impossible to build a perpetuum mobile, because I know from the outset that at run time SAS simply does not see RETAIN: all of its actions have been completed beforehand, at compile time. A good hint that RETAIN was never intended to be executed conditionally is that the "instruction"

if _n_ = 1 then do retain fname ;

will not even compile: a RETAIN statement must begin with the RETAIN keyword right after the preceding semicolon. That is why it does compile inside a DO-END block, although at run time SAS sees no difference whatsoever between

if _n_ = 1 then do ; retain fname ; end ;

and

if _n_ = 1 then do ; end ;
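The compile-time point can be seen directly in a small sketch (the data set names DEMO, CHECK and NOCHECK and the variable CARRY are made up for illustration). The RETAIN sits in a DO group whose condition, IF 0, is never true at run time, yet CARRY keeps the value assigned on the first observation. In the otherwise identical step without RETAIN, CARRY is reset to missing at the start of each iteration.

```sas
/* Hypothetical three-observation input */
data demo;
  do i = 1 to 3;
    output;
  end;
run;

/* RETAIN inside a branch that never executes */
data check;
  set demo;
  if 0 then do;
    retain carry;   /* still honored: RETAIN acts at compile time */
  end;
  if _n_ = 1 then carry = 100;   /* assigned on the first observation only */
run;

/* Same step without RETAIN, for comparison */
data nocheck;
  set demo;
  if _n_ = 1 then carry = 100;
run;

/* CHECK:   CARRY = 100 on all three observations           */
/* NOCHECK: CARRY = 100 on observation 1, missing on 2 and 3 */
proc print data=check;
run;
proc print data=nocheck;
run;
```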

Kind regards
------------
Paul Dorfman
Jax, FL
------------

-------------- Original message ----------------------
From: Arthur Tabachneck <art297@NETSCAPE.NET>

> One of our most respected list members wrote me off-line, asking why in
> the world I would have suggested wrapping a retain statement within a
> condition.
>
> That is, given the following data:
>
> data have;
> input lname$ fname$;
> do i=1 to 1000000;output;end;
> cards;
> lname1 fname1
> lname2 fname2
> ;
>
> why write:
>
> data want;
> if _n_ eq 1 then do;
> retain fname;
> end;
> set have;
> run;
>
> instead of:
> data want;
> retain fname;
> set a;
> run;
>
> I know why I provided the solution, because it had better performance, but
> I could sure use some feedback explaining why that would be so.
>
> I initially wrote it correctly and, upon seeing that it worked slower than
> Jiann's SQL solution, tried to see if I could bypass reading the data
> (i.e., when _n_ eq 0).
>
> After I soon realized that wouldn't be possible, I ran the step as
> presented.
>
> Someone please explain to me why:
>
> 60 data want;
> 61 if _n_ eq 1 then do;
> 62 retain fname;
> 63 end;
> 64 set a;
> 65 run;
>
> NOTE: There were 2000000 observations read from the data set WORK.A.
> NOTE: The data set WORK.WANT has 2000000 observations and 3 variables.
> NOTE: DATA statement used (Total process time):
> real time 1.12 seconds
> cpu time 1.12 seconds
>
> runs almost 50% faster than:
> 56 data want;
> 57 retain fname;
> 58 set a;
> 59 run;
>
> NOTE: There were 2000000 observations read from the data set WORK.A.
> NOTE: The data set WORK.WANT has 2000000 observations and 3 variables.
> NOTE: DATA statement used (Total process time):
> real time 1.43 seconds
> cpu time 1.43 seconds
>
> I ran the tests on a 4-processor Window's 2003 system with 12 gig of ram
> and SAS 9.1.3. It was during a holiday, thus I was the only one using the
> computer and I re-ran the tests 3 times with the same results.
>
> Art
> --------
> On Mon, 18 Feb 2008 23:21:23 -0500, Arthur Tabachneck
> <art297@NETSCAPE.NET> wrote:
>
> >Miguel,
> >
> >As Jiann indicated, you can do what you want with proc sql. However, you
> >can also accomplish the same thing in a data step. For example,
> >
> >data have;
> > input lname$ fname$;
> > do i=1 to 1000000;output;end;
> > cards;
> > lname1 fname1
> > lname2 fname2
> > ;
> >
> >data want;
> > if _n_ eq 1 then do;
> > retain fname;
> > end;
> > set have;
> >run;
> >
> >HTH,
> >Art
> >---------
> >On Tue, 19 Feb 2008 02:55:04 +0000, Miguel de la Hoz <miguel_hoz@YAHOO.ES>
> >wrote:
> >
> >>I am starting my problem with the following disposal of my dataset:
> >
> ># variable
> >1 lname
> >2 fname
> >
> >I am trying to export it to excel but it is keeping that order. I would
> >like to be able to write
> >
> ># variable
> >1 fname
> >2 lname
> >
> >This is only an example my dataset contains around 20 fields.
> >
> >Thanks.
> >
> >MDH.


