Date: Wed, 19 May 2004 13:06:53 -0400
Reply-To: "Chang Y. Chung" <chang_y_chung@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Chang Y. Chung" <chang_y_chung@HOTMAIL.COM>
Subject: Re: Sliding window self-join
On Tue, 18 May 2004 17:00:21 -0700, Bruce Bradbury <BruceBrad@INAME.COM>
wrote:
>I'm starting with a file like this (sorted by person then date)
>
>Person Date Var
>A 1 v1
>A 2 v2
>A 3 v3
>A 4 v4
>A 5 v5
>B 3 v6
>B 4 v7
>B 5 v8
>B 6 v9
>
>I want to end up with a file like the following. This has 3 records
>for each record in the starting file. The first record for each
>person/date combination includes the value of var for the previous
>period, the second record the current value and the third record the
>next value. (In practice I have a wider window with about 50 records
>for each person/date combination - but this gives you the idea).
>
>Person Date RelativeDate Var
>A 1 -1 .
>A 1 0 v1
>A 1 1 v2
>A 2 -1 v1
>A 2 0 v2
>A 2 1 v3
>[more person A records]
>B 3 -1 .
>B 3 0 v6
>etc
>
>I know I could do this with lags, but this seems messy. Is there a way
>of doing this using a data step merge of the file with itself (or via
>sql)? I need an efficient approach as the real file is rather large.
>If necessary, I can create indexes for the input data file.
Hi, Bruce,
Is this too messy? This approach is tolerant of skipped or non-consecutive
dates. It would be messy if you have many Var's. And you should set the
macro var max before hand. But, it should run fast.
Cheers,
Chang
data one;
input Person $ Date Var $;
cards;
A 1 v1
A 2 v2
B 6 v6
run;
proc print data=one;
run;
%let max = 1000; /* should be larger than max number
of dates for any person */
%let half_window = 1; /* -1 0 1 */
data two;
array dateArr[1:&max.] _temporary_;
array varArr [1:&max.] $ _temporary_;
set one;
by person date notsorted;
if first.Person then do;
idx = 0;
end;
idx + 1;
dateArr[idx] = date;
varArr[idx] = var;
if last.person then do i = 1 to idx;
Date = dateArr[i];
do RelativeDate = -&half_window. to &half_window.;
w = i + RelativeDate;
if w < 1 or idx < w
then Var = ".";
else Var = varArr[w];
output;
keep person date relativeDate var;
end;
end;
run;
proc print data=two;
run;
/* on lst
Relative
Obs Person Date Var Date
1 A 1 . -1
2 A 1 v1 0
3 A 1 v2 1
4 A 2 v1 -1
5 A 2 v2 0
6 A 2 . 1
7 B 6 . -1
8 B 6 v6 0
9 B 6 . 1
*/