LISTSERV at the University of Georgia
Date:   Fri, 31 Aug 2001 16:41:32 -0400
Reply-To:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:   Re: Automatically RETAINed ??
Comments:   To: Peetie Wheatstraw <peetie_wheatstraw@HOTMAIL.COM>
Content-Type:   text/plain; charset="iso-8859-1"

Subject: Automatically RETAINed ??
Summary: When can variable values change in a DATA step?
Respondent: IanWhitlock@westat.com

Peetie Wheatstraw [peetie_wheatstraw@HOTMAIL.COM] asked a question about retaining variables from a SAS data set. There then followed a lively discussion. I did not participate before this, merely because I was busy. Moreover, I think the question was answered accurately by my colleague Quentin McMullen, but I would like to rephrase the situation. (Incidentally Quentin gave an excellent talk on handling missing values at the Westat SAS User's group meeting this morning, which is based on a poster presentation he will give at NESUG next month.)

Variables from SAS data sets are RETAINed. Now what does retain mean? SAS usually sets variables to missing at the top of each iteration of the implied DATA step loop. A variable is retained if this usual activity is not done. (Unfortunately much of the confusion shown in this discussion can be traced, I think, to poor documentation about RETAIN. The version 8 on-line documentation statement for RETAIN

>>>> Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next <<<<

is about as misleading as possible without being totally wrong. Note that the difficulty is that RETAIN cannot be explained on its own terms; it is only in understanding what standard DATA step processing is about that one can appreciate the full meaning of RETAIN.)
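
A minimal sketch of the distinction between the usual reset and RETAIN. The data set THREE and the variables PLAIN and KEPT are hypothetical names introduced only for this illustration; they do not come from the thread.

```sas
/* Hypothetical three-row data set, used only to drive the implied loop */
data three ;
   do i = 1 to 3 ;
      output ;
   end ;
run ;

data _null_ ;
   length plain kept 8 ;
   retain kept ;                 /* KEPT is exempt from the reset at the top */
   put "At top: " plain= kept= _n_= ;
   set three ;
   if _n_ = 1 then do ;
      plain = 1 ;                /* created by assignment, not retained */
      kept  = 1 ;                /* created by assignment, retained     */
   end ;
run ;
```

Under the description above, PLAIN should show as missing at the top of every iteration, while KEPT should show 1 from the second iteration onward. The variable I comes from a SAS data set, so it should also carry its last value to the top of the next iteration without any RETAIN statement.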

So how can values change for a variable from a SAS data set?

1) Variables from a SAS data set are initialized to missing once, at the beginning of execution.
2) If the variable comes from a SAS data set, its value will change each time that data set is read.
3) If a user makes an assignment (or takes some explicit action to change the value), the value will change.
4) If the variable is in a data set participating in BY-processing, the variable will be set to missing at the beginning of each BY-group.
5) If the variable comes from a SAS data set participating in a SET statement, the value will be set to missing every time the buffer is switched for that SET statement.

I think all of these points have been discussed and illustrated, but not necessarily in one place. I therefore offer the following code, based in part on Peetie's original example but extended in light of the above conditions.

data a;
   input id xa ;
cards ;
1 1
3 1
3 2
;

data b;
   input id xb ;
cards ;
1 2
1 3
2 1
;

data c ; xc = 66 ; run ;

data _null_ ;
   length id xa xb xc 8 ;
   put "At top: " _all_ ;
   if _n_ = 1 then set c ;
   set a ( in = a ) b ( in = b ) ;
   by id;
   put "After SET 1:" _all_;
   if _n_ = 2 then xa = 9 ;
run;

Here is part of the log.

At top: id=. xa=. xb=. xc=. a=0 b=0 FIRST.id=1 LAST.id=1 _ERROR_=0 _N_=1 After SET 1:id=1 xa=1 xb=. xc=66 a=1 b=0 FIRST.id=1 LAST.id=0 _ERROR_=0 _N_=1 At top: id=1 xa=1 xb=. xc=66 a=1 b=0 FIRST.id=1 LAST.id=0 _ERROR_=0 _N_=2 After SET 1:id=1 xa=. xb=2 xc=66 a=0 b=1 FIRST.id=0 LAST.id=0 _ERROR_=0 _N_=2 At top: id=1 xa=9 xb=2 xc=66 a=0 b=1 FIRST.id=0 LAST.id=0 _ERROR_=0 _N_=3 After SET 1:id=1 xa=9 xb=3 xc=66 a=0 b=1 FIRST.id=0 LAST.id=1 _ERROR_=0 _N_=3 At top: id=1 xa=9 xb=3 xc=66 a=0 b=1 FIRST.id=0 LAST.id=1 _ERROR_=0 _N_=4 After SET 1:id=2 xa=. xb=1 xc=66 a=0 b=1 FIRST.id=1 LAST.id=1 _ERROR_=0 _N_=4 At top: id=2 xa=. xb=1 xc=66 a=0 b=1 FIRST.id=1 LAST.id=1 _ERROR_=0 _N_=5 After SET 1:id=3 xa=1 xb=. xc=66 a=1 b=0 FIRST.id=1 LAST.id=0 _ERROR_=0 _N_=5 At top: id=3 xa=1 xb=. xc=66 a=1 b=0 FIRST.id=1 LAST.id=0 _ERROR_=0 _N_=6 After SET 1:id=3 xa=2 xb=. xc=66 a=1 b=0 FIRST.id=0 LAST.id=1 _ERROR_=0 _N_=6 At top: id=3 xa=2 xb=. xc=66 a=1 b=0 FIRST.id=0 LAST.id=1 _ERROR_=0 _N_=7

I leave it to you to verify that every value is explained by one of the rules mentioned. It would also be good to run the program without the BY statement, since BY-processing causes some of the confusion. This does not prove that the reasons are correct or that they are the only reasons. However, it should help convince one of these facts. If anyone can show a violation of any of these principles, I would like to see the example.

Note that XC stays 66 after the first reading in spite of conditions 4 and 5 because they do not apply. Note that XA had the value 9 assigned during _N_ = 2 and stayed that way throughout _N_ = 3 because neither condition 4 nor 5 applied.
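
The variation suggested above, running the program without the BY statement, can be sketched as follows. It is the same program with only the BY line removed; the data steps for A, B and C are repeated so that the sketch runs on its own. No log is shown here, since checking the values against the rules is the point of the exercise.

```sas
data a;
   input id xa ;
cards ;
1 1
3 1
3 2
;

data b;
   input id xb ;
cards ;
1 2
1 3
2 1
;

data c ; xc = 66 ; run ;

data _null_ ;
   length id xa xb xc 8 ;
   put "At top: " _all_ ;
   if _n_ = 1 then set c ;
   /* No BY statement: A is read to its end, then B. */
   set a ( in = a ) b ( in = b ) ;
   put "After SET 1:" _all_;
   if _n_ = 2 then xa = 9 ;
run;
```

Without BY there are no FIRST.id and LAST.id variables and no BY-groups, so condition 4 cannot apply. Note also that the second iteration now reads from A again, so the assignment of 9 to XA should be overwritten by the next read from A (condition 2) rather than surviving as it did in the interleaved run.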

IanWhitlock@westat.com

