Date: Tue, 6 Mar 2007 14:14:35 -0500
Reply-To: Jack Clark <JClark@CHPDM.UMBC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Clark <JClark@CHPDM.UMBC.EDU>
Subject: Re: Problem combining LAG with a BY variable
Content-Type: text/plain
Paul,
After reviewing the SAS online doc more carefully, I saw that it gave a good
example of how to work with your type of problem.
Try something like this...
data test;
infile cards missover;
input site_id $ stop_date min_stop max_stop haart;
cards;
AA 10 11 12 1
AA 13 14 15 1
AA 16 17 18 0
BB 20 21 22 0
BB 23 24 25 1
BB 26 27 28 1
CC 30 31 32 1
DD 40 41 42 1
DD 43 44 45 0
DD 46 47 48 1
DD 49 50 51 1
;
run;
proc sort data = test;
by site_id;
run;
data new;
set test (where=(haart=1));
by site_id;
LAG_STOP = LAG(STOP_DATE);
LAG_MIN_STOP = LAG(MIN_STOP);
LAG_MAX_STOP = LAG(MAX_STOP);
if first.site_id then do;
lag_stop = .;
lag_min_stop = .;
lag_max_stop = .;
end;
run;
It seems that you have to assign the values of your lag variables to missing
after you calculate them (for the first observation of each ID).
Let us know if this works for you.
Jack Clark
Research Analyst
Center for Health Program Development and Management
University of Maryland, Baltimore County
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Paul
Miller
Sent: Tuesday, March 06, 2007 1:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Problem combining LAG with a BY variable
Hi Jack,
SITE_ID is my id variable. And my data are sorted by SITE_ID. I've tried
syntax like what you recommend but it's not working. I understand that
lag retains the value that it had the last time it executed. So
presumably it will retain this value even if I set the lag variables = .
for the first site_id?
In other words, won't it say "Sure, you set these values to missing. But
I still have the value from the last time I executed. And the next time
I execute I'm going to write those values, even if they're from a
previous site_id"?
At least this is what seems to be happening.
Thanks,
Paul
Paul J. Miller, Ph.D.
Research Scientist and Statistician
Ontario HIV Treatment Network
1300 Yonge St., Suite 308
Toronto, Ontario M4T 1X3
Phone: (416) 642-6486 ext 232
Fax: (416) 640-4245
-----Original Message-----
From: Jack Clark [mailto:JClark@chpdm.umbc.edu]
Sent: March 6, 2007 1:46 PM
To: Paul Miller; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Problem combining LAG with a BY variable
Paul,
You mention patient ID in your post, but I only see SITE_ID in your
code.
Is SITE_ID the variable for patient ID?
If so...
You could initialize the 3 lag variables to missing each time you start
a
new value of SITE_ID...
if first.site_id then do;
lag_stop = .;
lag_min_stop = .;
lag_max_stop = .;
end;
else do;
LAG_STOP = LAG(STOP_DATE);
LAG_MIN_STOP = LAG(MIN_STOP);
LAG_MAX_STOP = LAG(MAX_STOP);
end;
Keep in mind that the use of first.site_id relies on the input data set
being sorted by SITE_ID and the dataset being SET BY SITE_ID (as it was
in
your code).
Jack Clark
Research Analyst
Center for Health Program Development and Management
University of Maryland, Baltimore County
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Paul
Miller
Sent: Tuesday, March 06, 2007 1:15 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Problem combining LAG with a BY variable
Hi Everyone,
I'm having what I suspect will be an easy problem for many. I want to
calculate a set of lag variables within each value of a patient id. My
syntax appears below. I'm having trouble with the three lag variables
LAG_STOP, LAG_MIN_STOP, LAG_MAX_STOP. Currently, the lag function is
carrying values from one id forward into the next instead of doing the
lag within each value of id. What's the best way to fix this?
Thanks,
Paul
DATA HAART_INTERRUPT(DROP=EVENT_ID DURATION--HAART);
RETAIN SITE_ID START_DATE STOP_DATE
MIN_START MAX_START MID_START MIN_STOP
MAX_STOP MID_STOP;
SET REGIMENS (WHERE=(HAART=1));
BY SITE_ID;
FORMAT START_DATE--MID_STOP MMDDYY8.;
MID_START = SUM(MIN_START,MAX_START)/2;
MID_STOP = SUM(MIN_STOP,MAX_STOP)/2;
LAG_STOP = LAG(STOP_DATE);
LAG_MIN_STOP = LAG(MIN_STOP);
LAG_MAX_STOP = LAG(MAX_STOP);
IF START_DATE NE . AND LAG_STOP NE . THEN DO;
STOP_DURATION = START_DATE -
LAG_STOP;
MIN_STOP_DURATION = STOP_DURATION;
MAX_STOP_DURATION = STOP_DURATION;
END;
IF START_DATE NE . AND LAG_STOP = . THEN DO;
STOP_DURATION = .;
MIN_STOP_DURATION = START_DATE -
LAG_MAX_STOP;
MAX_STOP_DURATION = START_DATE -
LAG_MIN_STOP;
END;
IF START_DATE = . AND LAG_STOP NE . THEN DO;
STOP_DURATION = .;
MIN_STOP_DURATION = MIN_START -
LAG_STOP;
MAX_STOP_DURATION = MAX_START -
LAG_STOP;
END;
IF START_DATE = . AND LAG_STOP = . THEN DO;
STOP_DURATION = .;
MIN_STOP_DURATION = MIN_START -
LAG_MAX_STOP;
MAX_STOP_DURATION = MAX_START -
LAG_MIN_STOP;
END;
RUN;