Date: Sat, 6 Nov 2010 10:46:48 -0700
Reply-To: "Nordlund, Dan (DSHS/RDA)"
Sender: "SAS(r) Discussion"
From: "Nordlund, Dan (DSHS/RDA)"
Subject: Re: using data from two data sets
In-Reply-To: <201011061412.oA6AkV31022857@willow.cc.uga.edu>
Content-Type: text/plain; charset=utf-8

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Arthur Tabachneck
> Sent: Saturday, November 06, 2010 7:13 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: using data from two data sets
>
> Dan (or anyone),
>
> One more explanation needed: why is sum_squares automatically retained
> in the following example?
>
> Art
>
> data set1;
> input x;
> datalines;
> 4
> 6
> 8
> 2
> 3
> 6
> 8
> 2
> 1
> ;
> run;
>
> data set2;
> y=5;
> run;
>
> data want (keep=mean sum_squares);
> set temp2;
> do i=1 to nobs;
> set temp1 nobs=nobs;
> sum_squares=sum(sum_squares,(x-mean)**2);
> put _all_;
> end;
> run;

Art,

The variable sum_squares is not a retained variable in the above example. Non-retained variables are set to missing only at the top of the data step, that is, at the start of each iteration of the implicit loop, not each time a SET statement is executed. Since only one full pass is made through the data step, and sum_squares is calculated and output before a second iteration begins, its value is still available. Again, add a PUT statement at the beginning of the data step and you will see that sum_squares is set to missing when the data step enters the second iteration (see _n_), just before it terminates.

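data want (keep=mean sum_squares);
  put "beginning" _all_;      /* shows the state at the top of each iteration */
  set set2;                   /* assumes set2 supplies mean; as posted it holds y=5 */
  do i=1 to nobs;
    set set1 nobs=nobs;       /* reads one observation of set1 per pass of the loop */
    sum_squares=sum(sum_squares,(x-mean)**2);
    put _all_;
  end;
run;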
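For contrast, here is a minimal sketch of the case where the implicit loop reads one observation per iteration, so the reset at the top of the step gets in the way. The data set name set1 and its x values are copied from the thread; the variable names running and running_r are hypothetical and used only for illustration.

```sas
/* Same x values as set1 in the thread */
data set1;
  input x;
  datalines;
4
6
8
2
3
;
run;

/* No RETAIN: running is reset to missing at the top of every iteration, */
/* so SUM(., x) just returns x and nothing accumulates.                  */
data _null_;
  set set1;
  running = sum(running, x);
  put _n_= x= running=;
run;

/* With RETAIN: running_r survives the return to the top of the step, */
/* so it accumulates across iterations.                               */
data _null_;
  set set1;
  retain running_r;
  running_r = sum(running_r, x);
  put _n_= x= running_r=;
run;
```

In the first step the log should show running equal to x on every line; in the second, running_r should grow as a running total. That difference is what the explicit DO loop around SET sidesteps, since all the reading happens inside a single iteration.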

As an additional note, this behavior is one of the reasons the double DOW is so useful. You can read data in an internal DOW loop and let the outer, implicit loop do the variable initialization for you. That reduces the need to RETAIN variables and to re-initialize them when the BY group changes.
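A rough sketch of the double DOW, assuming a hypothetical data set have, sorted by a grouping variable grp, with the analysis variable x. None of these names come from the thread except x, mean and sum_squares; n and total are likewise made up for the example.

```sas
/* Hypothetical input: x values within two BY groups */
data have;
  input grp x;
  datalines;
1 4
1 6
1 8
2 2
2 3
2 6
2 8
;
run;

data want (keep=grp mean sum_squares);
  /* First DOW loop: read one whole BY group and accumulate its total. */
  /* n counts the observations read in the group.                      */
  do n = 1 by 1 until (last.grp);
    set have;
    by grp;
    total = sum(total, x);
  end;
  mean = total / n;

  /* Second DOW loop: read the same BY group again, now that mean is known. */
  do until (last.grp);
    set have;
    by grp;
    sum_squares = sum(sum_squares, (x - mean)**2);
  end;
  /* Implicit OUTPUT here: one observation per BY group. On the way back  */
  /* to the top, total, mean and sum_squares are reset to missing, so no  */
  /* RETAIN and no manual re-initialization is needed.                    */
run;
```

Each iteration of the implicit loop handles exactly one BY group, so the automatic reset lines up with the group boundary. For group 1 above (4, 6, 8) this should give mean=6 and sum_squares=8.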