Date: Tue, 19 Feb 2008 08:36:20 -0500
Reply-To: Nathaniel.Wooding@DOM.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Nat Wooding <Nathaniel.Wooding@DOM.COM>
Subject: Re: Why does retain work faster conditionally?
In-Reply-To: <200802191313.m1JBkpXb016893@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="ISO-8859-1"
Art
I don't know whether it is significant, but your two examples refer to
different input data sets (HAVE and A). I suspect that this is merely a typo.
data want;
  if _n_ eq 1 then do;
    retain fname;
  end;
  set have;
run;
instead of:
data want;
  retain fname;
  set a;
run;
I ran a couple of tests myself.
Cases one and three are essentially those that you ran, except that I do not
read in data sets -- I only run a loop.
74 data one;
75 retain x ;
76 do i=1 to 1000000; x=1; output;end;
77 run;
NOTE: The data set WORK.ONE has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.20 seconds
cpu time 0.18 seconds
78
79 Data two;
80 do i = 1 to 1000000; retain x ;x=1;output;end;
81 run;
NOTE: The data set WORK.TWO has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 5.50 seconds
cpu time 0.24 seconds
82
83 Data three;
84 if _n_ = 1 then do;
85 retain x;
86 end;
87 do i=1 to 1000000; x=1; output;end;
88 run;
NOTE: The data set WORK.THREE has 1000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.20 seconds
In my test, placing the retain within its own do statement penalizes us
slightly (0.27 versus 0.20 seconds of real time). However, note especially
case two, in which the retain is placed inside the do loop that executes a
million times. It seems pretty clear to me that the retain is checked at each
iteration of the loop and, to my thinking, is executed each time, as opposed
to being handled during the construction of the PDV like a Drop or Keep
statement.
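One caution about case two: its cpu time (0.24 seconds) is close to cases one
and three, and only the real time (5.50 seconds) stands out. To see whether
that gap repeats, the step could be rerun several times. The following is only
a sketch; the macro name time_case_two and its reps parameter are my own
hypothetical names, not something from the tests above.

```sas
/* Sketch: rerun case two several times and compare the real time   */
/* and cpu time notes that each run writes to the log.              */
/* The macro name and the REPS parameter are hypothetical.          */
%macro time_case_two(reps=5);
  %local r;
  %do r = 1 %to &reps;
    data two;
      do i = 1 to 1000000;
        retain x;      /* retain placed inside the loop, as in case two */
        x = 1;
        output;
      end;
    run;
  %end;
%mend time_case_two;

%time_case_two(reps=5)
```

If the real time stays high on every repetition while cpu time stays low, the
difference is at least consistent; if it does not, the first run may simply
have been waiting on something else.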
Thanks for posting this -- I'm going to be a lot more careful about where I
stick retains.
Nat
Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Date: 02/19/2008 08:13 AM
Please respond to: Arthur Tabachneck <art297@NETSCAPE.NET>
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Why does retain work faster conditionally?
One of our most respected list members wrote me off-line, asking why in
the world I would have suggested wrapping a retain statement within a
condition.
That is, given the following data:
data have;
  input lname$ fname$;
  do i=1 to 1000000;output;end;
  cards;
lname1 fname1
lname2 fname2
;
why write:
data want;
  if _n_ eq 1 then do;
    retain fname;
  end;
  set have;
run;
instead of:
data want;
  retain fname;
  set a;
run;
I know why I provided the solution, because it had better performance, but
I could sure use some feedback explaining why that would be so.
I initially wrote it the conventional way and, upon seeing that it ran slower
than Jiann's SQL solution, tried to see if I could bypass reading the data
(i.e., when _n_ eq 0).
Once I realized that wouldn't be possible, I ran the step as presented.
Someone please explain to me why:
60 data want;
61 if _n_ eq 1 then do;
62 retain fname;
63 end;
64 set a;
65 run;
NOTE: There were 2000000 observations read from the data set WORK.A.
NOTE: The data set WORK.WANT has 2000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.12 seconds
cpu time 1.12 seconds
runs in about 22% less time (1.12 versus 1.43 seconds) than:
56 data want;
57 retain fname;
58 set a;
59 run;
NOTE: There were 2000000 observations read from the data set WORK.A.
NOTE: The data set WORK.WANT has 2000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 1.43 seconds
cpu time 1.43 seconds
I ran the tests on a 4-processor Windows 2003 system with 12 GB of RAM
and SAS 9.1.3. It was during a holiday, so I was the only one using the
computer, and I re-ran the tests 3 times with the same results.
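For a like-for-like comparison, both forms could read the same input and the
resulting variable order could be checked directly. The sketch below assumes
the input is WORK.HAVE as created above (the logs shown read WORK.A), and the
output names want_cond and want_plain are hypothetical.

```sas
/* Sketch only: both steps read WORK.HAVE (an assumption; the logs  */
/* above read WORK.A). Output data set names are hypothetical.      */
data want_cond;
  if _n_ eq 1 then do;
    retain fname;      /* conditional form */
  end;
  set have;
run;

data want_plain;
  retain fname;        /* unconditional form */
  set have;
run;

/* VARNUM lists variables by their position in the data set, so the */
/* position of fname in the two outputs can be compared.            */
proc contents data=want_cond varnum;
run;
proc contents data=want_plain varnum;
run;
```

The timing notes for the two data steps can then be compared knowing that both
read the same data.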
Art
--------
On Mon, 18 Feb 2008 23:21:23 -0500, Arthur Tabachneck
<art297@NETSCAPE.NET> wrote:
>Miguel,
>
>As Jiann indicated, you can do what you want with proc sql. However, you
>can also accomplish the same thing in a data step. For example,
>
>data have;
> input lname$ fname$;
> do i=1 to 1000000;output;end;
> cards;
> lname1 fname1
> lname2 fname2
> ;
>
>data want;
> if _n_ eq 1 then do;
> retain fname;
> end;
> set have;
>run;
>
>HTH,
>Art
>---------
>On Tue, 19 Feb 2008 02:55:04 +0000, Miguel de la Hoz <miguel_hoz@YAHOO.ES>
>wrote:
>
>>My dataset currently has its variables in the following order:
>
># variable
>1 lname
>2 fname
>
>I am trying to export it to Excel, but it keeps that order. I would
>like to be able to write
>
># variable
>1 fname
>2 lname
>
>This is only an example; my dataset contains around 20 fields.
>
>Thanks.
>
>MDH.