Date: Thu, 11 Mar 2004 11:57:54 +0100
Reply-To: "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Groeneveld, Jim" <jim.groeneveld@VITATRON.COM>
Subject: FYI: pitfalls of data step loop
Content-Type: text/plain; charset="iso-8859-1"
Yesterday I was generating random data (normal distribution with mean 10 and a standard deviation larger than 1). I wanted to keep only the data values between 0 and 20. I used the following code, which includes a data step loop.
%LET N = 1000;
%LET RandDist = RANNOR;
TITLE "Generation of random data; RandDist=&RandDist, N=&N";
DATA SkewDist (DROP=_I_);
  DO _I_ = 1 TO &N;
    X = &RandDist(1);
    * x; X = X + 10;
    * s; X = (X - 10) * 5 + 10;
    IF (X GE 0 AND X LE 20); * subsetting IF;
    OUTPUT;
  END;
RUN;
PROC UNIVARIATE DATA=SkewDist; VAR X; HISTOGRAM X / MIDPOINTS=0 TO 20 BY .1; RUN;
Quite unexpectedly, this did not give the results I wanted! Only a few of the generated values remained, far fewer than all the values between 0 and 20. I changed the subsetting IF into:
IF (X LT 0 OR X GT 20) THEN DELETE;
That did not produce the expected results either. At first I had difficulty finding the cause of this phenomenon, but having had too many such experiences with SAS, and knowing myself, I began doubting myself. And of course, once you know and understand it, the explanation is quite simple: both the subsetting IF and the DELETE stop processing further statements (of course without outputting anything) and restart the data step from the beginning for the next observation, which is not what was intended. They do not merely skip to the next iteration of the DO loop. Because this data step reads no input data, there is no next observation, so the data step finishes at the first value that fails the condition. Only the values generated before that point end up in the data set.
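The behaviour described above can be reproduced without random numbers. The following is a minimal sketch, not part of the original program: the data set name Demo1 and the variable I are made up for illustration, and the loop rejects a single known value so the effect is easy to see.

```sas
/* Hypothetical demo: Demo1 and I are illustrative names only. */
DATA Demo1;
  DO I = 1 TO 10;
    /* Subsetting IF: false when I = 4. SAS then abandons the    */
    /* current iteration of the whole data step, not just this   */
    /* pass through the DO loop.                                  */
    IF I NE 4;
    OUTPUT;
  END;
RUN;

/* With no SET or INPUT statement the step runs only once, so    */
/* Demo1 should hold just the rows written before the rejection: */
/* I = 1, 2 and 3.                                               */
PROC PRINT DATA=Demo1; RUN;
```

Replacing the subsetting IF with IF I EQ 4 THEN DELETE; should leave the same three observations, since DELETE also returns to the top of the data step.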
Only by combining the condition and the OUTPUT statement into:
IF (X GE 0 AND X LE 20) THEN OUTPUT;
did the program run as intended.
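For reference, here is a sketch of the whole data step with the conditional OUTPUT in place. It assumes the same settings as in the post (N = 1000, RANNOR with seed 1, mean 10, sd 5); the shift and scale are condensed into one assignment, which is an editorial simplification and not the original layout.

```sas
%LET N = 1000;  /* number of values generated, as in the post */

DATA SkewDist (DROP=_I_);
  DO _I_ = 1 TO &N;
    /* Assumption: standard normal scaled to sd 5, shifted to mean 10 */
    X = RANNOR(1) * 5 + 10;
    /* Only the OUTPUT is conditional: a rejected value is simply     */
    /* not written, and the DO loop carries on with the next one.     */
    IF (X GE 0 AND X LE 20) THEN OUTPUT;
  END;
RUN;

PROC UNIVARIATE DATA=SkewDist;
  VAR X;
  HISTOGRAM X / MIDPOINTS=0 TO 20 BY .1;
RUN;
```

Note that the resulting data set holds fewer than &N observations, because out-of-range values are dropped and not replaced (with these settings roughly 95% should remain). An alternative that keeps a separate OUTPUT statement is to skip rejected values with IF (X LT 0 OR X GT 20) THEN CONTINUE; since CONTINUE moves on to the next iteration of the DO loop and does not end the data step iteration.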
Regards - Jim.
. . . . . . . . . . . . . . . .
Jim Groeneveld, MSc.
6825 MJ Arnhem
Tel: +31/0 26 376 7365
Fax: +31/0 26 376 7305
My computer remains home, but I will attend SUGI 2004.