Date: Fri, 10 May 1996 02:45:29 +0100
Reply-To: John Whittington <johnw@MAG-NET.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Whittington <johnw@MAG-NET.CO.UK>
Subject: Re: Variable Initialisation Again!
On Wed, 8 May 1996, Andrew James Llwellyn Cary <ajlcary@IX.NETCOM.COM> wrote:
>Andrew Smith wrote:
>> On Wed, 1 May 1996, John Whittington wrote:
>>
>> > ... It seems that
>> > RETAIN only results in initialisation if an explicit initial value for the
>> > variable(s) is given (akin to having an assignment statement).
>>
>> Yes. Variables that aren't RETAINEd are set to missing at the start of
>> each iteration of the data step. RETAIN merely suppresses that, so that
>> variables continue to have whatever value they were last given until
>> something else gives them a new value.
>This is not true. The data values are reset to missing at the SET
>statement associated with the variable, and then filled from the data
>read by the SET.
Aha - there's actually no need for 'pistols at dawn' here, since Andrew and
Andrew are, I believe, both right. The only thing that the second Andrew
(Cary) has overlooked is that all variables which come from a SET statement
are automatically 'RETAINed', just as if there had been a RETAIN statement
referencing them. Hence what Andrew (Cary) is describing is the implicit
RETAIN functionality which comes from having a SET (or MERGE, I think)
statement.
I suppose it's really all semantics. SI, Andrew Smith and I are saying
that data values are set to missing at the top of a DATA step unless
RETAINed (explicitly, or by coming from a SET etc.), whilst Andrew Cary is
saying the same thing without invoking the 'implied RETAIN' concept.
I suspect, however, that in terms of mechanics the 'implied RETAIN' concept
is the correct one. The default behaviour is to reset variables at the top
of the DATA step, and that only fails to happen if the variable is 'flagged'
in some 'look-up' area of memory as being one that is to be RETAINed. I
therefore suspect that variables which are SET get flagged in that look-up,
just as if they were explicitly RETAINed.
It obviously is not 'any old SET statement' that will affect a particular
variable; if there are multiple SETs, they will only affect variables which
come from the dataset they are SETting. For example, if one has a dataset
ONE which has just one variable and one observation with say Y=5, and a
second dataset TWO with many observations but no Y variable, then the code:
data new ;
   if _n_=1 then set one ;
   set two ;
run ;
will result in variable Y in dataset NEW having Y=5 for each and every one
of the many observations - this being the implied RETAIN functionality at work.
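For anyone who wants to try this, here is a minimal self-contained sketch of the example above. The test data are assumptions made purely for illustration: ONE is built with a single observation holding Y=5, and TWO is given four observations of a hypothetical variable X.

```sas
/* Assumed test data: ONE has one observation with Y=5 */
data one ;
   y = 5 ;
run ;

/* Assumed test data: TWO has four observations of X and no Y */
data two ;
   do x = 1 to 4 ;
      output ;
   end ;
run ;

data new ;
   if _n_ = 1 then set one ;   /* ONE is read on the first iteration only */
   set two ;                   /* step ends when TWO is exhausted */
run ;

proc print data=new ;
run ;
```

NEW should contain four observations, each with Y=5, even though Y is read only once.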
I presume that Andrew (both, but mainly Cary) accepts that the value gets
set to missing at the top of the DATA step in the case of a variable which
does not come from a SET dataset - because that is certainly the case.
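The contrast can be sketched as follows, assuming a small driver dataset TWO so that the step iterates more than once. The variable names A and B are hypothetical: A is an ordinary assigned variable, while B is named in a RETAIN statement with no initial value.

```sas
/* Assumed driver data: four observations, just to force four iterations */
data two ;
   do x = 1 to 4 ;
      output ;
   end ;
run ;

data demo ;
   set two ;
   retain b ;                  /* no initial value given, so B starts missing */
   if _n_ = 1 then a = 1 ;     /* A is not RETAINed: reset at each iteration */
   if _n_ = 1 then b = 1 ;     /* B keeps this value on later iterations */
run ;

proc print data=demo ;
run ;
```

A should be 1 on the first observation and missing thereafter, whereas B should be 1 on every observation.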
John
-----------------------------------------------------------
Dr John Whittington, Voice: +44 1296 730225
Mediscience Services Fax: +44 1296 738893
Twyford Manor, Twyford, E-mail: johnw@mag-net.co.uk
Buckingham MK18 4EL, UK CompuServe: 100517,3677
-----------------------------------------------------------