Date: Mon, 15 Aug 2011 18:10:44 -0400
Sender: "SAS(r) Discussion"
From: "Kirby, Ted"
Subject: Timing of lag() function

With the following dataset:

data coverage3;
  input individual_id :$8. Eff_Date :date9. end_date :date9. Cust_ID :$9. count_index;
  format Eff_Date end_date date9.;
  datalines;
39030981 01Jan2009 30Apr2009 000192961 1
39030981 01May2009 31May2009 000192961 2
39030981 01Jun2009 30Sep2009 000192961 3
39030981 01Oct2009 31Dec2009 000192961 4
39121557 10Oct2008 30Nov2008 000189496 1
;
run;

and the following code:

proc sort data=coverage3; by individual_id Eff_date; run;

/* The data are sorted in the INPUT data, but run the PROC SORT so that
   SAS knows it is sorted and we can use the BY statement below. */

data coverage3_eff;
  set coverage3;
  by individual_id;

  x = lag(eff_date);
  y = lag(end_date);
  z = lag(cust_id);

  if first.individual_id then new_eff_date = eff_date;
  else do;
    w = lag(new_eff_date);
    if eff_date - y >= 90 then new_eff_date = eff_date;
    if eff_date - y < 90 and cust_id ^= z then new_eff_date = eff_date;
    if eff_date - y < 90 and cust_id = z and count_index <= 2 then new_eff_date = x;
    if eff_date - y < 90 and cust_id = z and count_index > 2 then new_eff_date = w;
  end;

  format new_eff_date x y w date9.;
run;

Why is the variable "w" missing for all observations? The "new_eff_date" variable was assigned a value on the first iteration of the DATA step (by the "if first.individual_id ..." statement), so I would have thought that subsequent observations would have a value for "w" (especially the 2nd observation).

This happens even if "w" is assigned outside the conditional ELSE block, alongside the assignments of "x", "y" and "z".
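The symptom seems to reproduce in a much smaller step. The sketch below is illustrative only: the dataset lag_demo and the variables v, a, b and doubled are made up and are not part of the real data. It relies on two behaviors of the DATA step as I understand them: LAG queues whatever value its argument holds at the moment the call executes, and a variable created by an assignment statement (and not named in a RETAIN statement) is reset to missing at the top of each iteration.

```sas
/* Illustrative sketch only: lag_demo, v, a, b and doubled are hypothetical names. */
data lag_demo;
  input v;
  a = lag(v);        /* v was just read by INPUT, so LAG queues a real value   */
  b = lag(doubled);  /* doubled is still missing here, so LAG queues missing   */
  doubled = 2 * v;   /* assigned only after LAG(doubled) has already executed  */
  datalines;
1
2
3
;
run;

proc print data=lag_demo; run;
```

Here a picks up the previous value of v from the second row onward, while b is missing on every row, because doubled is still missing each time LAG(doubled) executes.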

If I add a "RETAIN new_eff_date;" statement to the code above then "w" has a value for the 3rd and 4th observations, but not the 2nd or 5th observation. This is fine for the 5th observation, since it is the beginning of the new "individual_id" block within the data. However, I want there to be a value for "w" in the 2nd observation.

In all the variations of the code above, all of the "lag" variables "x" "y" and "z" have non-missing values. Only "w" has missing values. How can I get the "w" in the 2nd observation to have the value of the "new_eff_date" from the first observation?
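One possible approach, offered as a sketch rather than a confirmed answer: with RETAIN in place, new_eff_date still holds the previous observation's value at the top of each iteration, so it can be copied into w directly, without LAG. The output dataset name coverage3_eff_retain is hypothetical; everything else follows the code above and assumes coverage3 already exists and is sorted.

```sas
/* Sketch under stated assumptions; coverage3_eff_retain is a hypothetical name. */
data coverage3_eff_retain;
  set coverage3;
  by individual_id;
  retain new_eff_date;   /* keep new_eff_date across iterations */

  x = lag(eff_date);
  y = lag(end_date);
  z = lag(cust_id);

  /* Before it is reassigned below, new_eff_date still holds the value
     from the previous observation. w is not retained, so it stays
     missing on the first observation of each individual_id. */
  if not first.individual_id then w = new_eff_date;

  if first.individual_id then new_eff_date = eff_date;
  else do;
    if eff_date - y >= 90 then new_eff_date = eff_date;
    if eff_date - y < 90 and cust_id ^= z then new_eff_date = eff_date;
    if eff_date - y < 90 and cust_id = z and count_index <= 2 then new_eff_date = x;
    if eff_date - y < 90 and cust_id = z and count_index > 2 then new_eff_date = w;
  end;

  format new_eff_date x y w date9.;
run;
```

On the sample data this should give w = 01Jan2009 for observations 2 through 4 and a missing w for observations 1 and 5.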
