Date: Wed, 30 Nov 2005 12:06:41 -0800
Reply-To: "Nordlund, Dan" <NordlDJ@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Nordlund, Dan" <NordlDJ@DSHS.WA.GOV>
Subject: Re: help with LAG function
Content-Type: text/plain; charset=utf-8
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> baogong jiang
> Sent: Wednesday, November 30, 2005 9:19 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: help with LAG function
>
> here is the codes that works for me:
>
>
> data new;
> set old;
> by id;
> lagserv = lag(serv_from_date);
> retain gap;
> if first.id then gap=.;
> else gap=input(put
> (serv_from_date,8.),yymmdd8.)-input(put(lagserv,8.),yymmdd8.);
> run;
>
> why the following not work?
>
> data new;
> set old;
> by id;
> * lagserv = lag(serv_from_date);
> retain gap; if first.id then gap=.;
> else gap=input(put
> (serv_from_date,8.),yymmdd8.)-input(put(lag(serv_from_date),8.),yymmdd8.);
> run;
>
>
>
> thank you very much,
> baogong
Baogong,
A common misunderstanding of the LAG function is that when it is used it
somehow retrieves the value from the immediately preceding record. However,
what actually happens is that each separate use of the LAG function sets up
a separate first-in/first-out queue (FIFO). For example, the statement
    lagserv = lag(serv_from_date);
creates a FIFO queue (at compile time, I believe, initially populated with
missing values). Each time the statement executes at run time, it returns the
value at the front of the queue and then places the current value of
serv_from_date at the end of the queue.
The reason your second DATA step doesn't work is that the lag function sits in
the ELSE branch, so it is not executed on first.id and the value of
serv_from_date for that record is never placed in the queue. When you call the
lag function on the second record for
the BY group, you will retrieve the value of serv_from_date that was the
current value the last time you executed the lag function (which, in your
case, will "usually" be from the last record of the previous BY group).
That is why it is "dangerous" to conditionally execute the LAG function.
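
To make this concrete, here is a small sketch. The data sets OLD, BAD, and
GOOD, the sample values, and the variable PREVDATE are made up for
illustration, and I am assuming serv_from_date is stored as a number like
20050101, as the PUT/INPUT calls in your code suggest.

    /* Hypothetical sample data: two ids, two records each, sorted by id */
    data old;
       input id serv_from_date;
       datalines;
    1 20050101
    1 20050111
    2 20050301
    2 20050306
    ;
    run;

    /* Conditional LAG: the queue is only updated on non-first records */
    data bad;
       set old;
       by id;
       if first.id then gap = .;
       else prevdate = lag(serv_from_date);
    run;

    /* Unconditional LAG: the queue is updated on every record, and the */
    /* lagged value is simply ignored on the first record of each id    */
    data good;
       set old;
       by id;
       prevdate = lag(serv_from_date);
       if first.id then gap = .;
       else gap = input(put(serv_from_date, 8.), yymmdd8.)
                - input(put(prevdate, 8.), yymmdd8.);
    run;

In BAD, the fourth record should come out with prevdate=20050111 (the last
record of id 1) rather than 20050301, because the record for 20050301 never
went through the lag function. In GOOD, gap should be ., 10, ., 5.
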
One can get different size lags using LAGn, where n is a number. For
example, lag2(var) will set up a FIFO queue of size 2. At run time it works
the same way as the lag function does (pull a value from the front of the
queue, place the current value at the end); the queue is just longer.
As usual, if I have gotten anything wrong or misleading here, someone will
be along shortly to correct it.
Hope this is helpful,
Dan
Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204