Date: Wed, 9 Nov 2005 08:36:36 -0500
Reply-To: Jim Groeneveld <jim1stat@YAHOO.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jim Groeneveld <jim1stat@YAHOO.CO.UK>
Subject: Re: An Alternative to LAG function
Hi Richard,
If you want just simple LEAD functionality use the unconditional LAG with
reversed order processing as in:
DATA One;
INPUT Variable @@;
CARDS;
0 1 2 3 4 5 6 7 8 9 10
;
RUN;
DATA Two (DROP=I);
DO I = NObs TO 1 BY -1;
SET One POINT=I NObs=NObs; * Add variables;
LeadVar = LAG(Variable); * LAG here functions as LEAD;
LeadSum + LAG(Variable); * Each LAG has own stack! ;
OUTPUT;
END;
STOP;
LABEL LeadSum = 'Sum _from behind_ of LeadVar values';
RUN;
DATA Three (DROP=I);
DO I = NObs TO 1 BY -1;
SET Two POINT=I NObs=NObs; * Reverse order instead of SORT;
OUTPUT;
END;
STOP;
RUN;
PROC PRINT DATA=Three;
RUN;
The output is:
Lead Lead
Obs Variable Var Sum
1 0 1 55
2 1 2 54
3 2 3 52
4 3 4 49
5 4 5 45
6 5 6 40
7 6 7 34
8 7 8 27
9 8 9 19
10 9 10 10
11 10 . 0
Regards - Jim.
--
Y. (Jim) Groeneveld, MSc., Biostatistician, Vitatron b.v., NL
Jim.Groeneveld_AT_Vitatron.com (replace _AT_ by AT sign)
http://www.vitatron.com, http://home.hccnet.nl/jim.groeneveld
My computer always teaches me something new I thought I knew already.
[common disclaimer]
On Tue, 8 Nov 2005 22:31:58 -0500, Richard Ristow <wrristow@MINDSPRING.COM>
wrote:
>At 06:19 PM 10/30/2005, Arthur Tabachneck wrote:
>>Toby (our favorite AI bot) responded off-line with the following:
>>
>>>Warren Sarle got ahold of Paul Dorfman and me on this subject and
>>>stated
>>>roughly that since SI is redoing the underlying code of the data step
>>>from
>>>the ground up they would be more than happy to include new
>>>functionality and
>>>improve th eold (such as the lag function). however, he stated that
>>>there
>>>needed to be a whole lot of input from the SAS Guru's to help them
>>>figure
>>>out these functions should work, for example how should the lag
>>>function
>>>work when you have multiple data sets being merged together?
>
>I received Warren's inquiry, and responded to it, as well; see below.
>This is from a different point of view: addressing the meaning of
>"previous" given the complex input logic allowed in the DATA step. I
>explicitly compare with SPSS's logic.
>
>>All we need is for somebody to explain EXACTLY what it ought to do
>>when there are multiple input SAS data sets, multiple input relational
>>database tables (Oracle, DB2, etc.), multiple output SAS data sets,
>>and multiple output relational database tables.
>>
>>Seriously. Can anybody help us?
>
>OK, here's a naive answer to a subtle question:
>
>LAG(X) should return the value of X from the immediately preceding
>record.
>
>LAG2(X), or LAG(X,2) should return the value of X from the second
>preceding record, i.e. the record immediately preceding the immediately
>preceding record. (etc.)
>
>Though naive, it focuses the question: find reasonable, unambiguous
>meanings for "record", "preceding", and "immediately". Here's an answer
>which I'm pretty sure are unambiguous, and which I'll argue is
>reasonable. It does depend on a processing model which the DATA step
>and the SPSS transformation program share:
>
>The code of a SAS DATA step, or an SPSS transformation program, is the
>interior of a loop. (I've helped experienced programmers who flummexed
>because they weren't aware of that implicit loop.) Then, let "previous"
>be previous pass through the implicit loop. Operationalized,
>
>LAG(X) is the value that X had at the end of the previous loop pass.
>
>or, more precisely in SAS terms,
>
>LAG(X) is the value that X had just before the previous clearing of the
>PDV.
>
>Now, comments:
>
>First, for however, much it matters, this LAG, like SPSS's, can only
>take a variable as argument (SAS's present one takes an arbitrary
>expression, I believe.)
>
>Second, a classic, perhaps 'basic', use of the DATA step begins
>
>DATA ...;
> SET ...;
>(or 'MERGE' or 'INPUT' in place of 'SET').
>
>Here, the implicit loop has a very simple meaning: there's an 'engine'
>that produces a sequential file, one record per pass, and the DATA step
>implicit loop processes one record per pass. I'd say that in this case,
>the LAG I propose means exactly what you'd expect; and is, among other
>things, equivalent to SPSS's LAG. There's no confusion if the LAG is
>inside a conditional or other construct, either. And in the code
>
>DATA FOO;
> SET BAR;
> BY BLORT;
>...
>FIRST.BLORT=1
>if and only if
>BLORT NE LAG(BLORT);
>
>which I think is one test of "reasonable" for LAG. (This is SAS
>comparison, where "value NE <missing>" is 'true', rather than SPSS
>comparison, where "value NE <missing>" is 'missing'.)
>
>In SPSS, this settles the matter, since a transformation program can
>only be used this way: it's begun by an 'engine' that produces a record
>at a time (GET FILE, ADD FILES, MATCH FILES, DATA LIST, or an INPUT
>PROGRAM).
>
>In SAS, 'SET', 'MERGE', 'INPUT' are executables, and that raises a lot
>of complications. I'm sure that's what you were thinking of, when
>asking,
>
>>All we need is [to know] what it ought to do when there are multiple
>>input SAS data sets, multiple input relational database tables
>>(Oracle, DB2, etc.), multiple output SAS data sets, and multiple
>>output relational database tables.
>
>I don't think "multiple output" raises so many questions. I've done a
>lot with OUTPUT statements (they're one of my favorite SAS features),
>both to write many files or records, and to coalesce many records into
>one. But LAG, as I'm describing it, is an input construct only. At
>least in the DATA step, with 'basic' input as I've described it, I
>don't think multiple outputs would raise any difficulty. Nor would
>multiple inputs, as now with MERGE or multiple data-set SETs, as long
>as they resolved into an 'engine' for one record per call, and that
>'engine' is called once per DATA step pass. That's why there's no
>problem in SPSS: what I'm describing is the only input logic allowed.
>
>Goodness knows, you can do a lot of other things in SAS already. You
>can, for example, write your own "DATA step" loop within a single DATA
>step pass:
>
>DO UNTIL END_IT = 1;
> SET ...
> <whatever you like>;
> OUTPUT;
>END;
>STOP;
>
>And my "LAG" is useless; it returns '.' for any call.
>
>With the "POINT" option, or anything similar, it's likely my LAG won't
>be what you want. But with "POINT", you're explicitly throwing away the
>notion of a "previous record". It could work well, though, as long as
>you do use the implicit loop. Here's chain following, rather nicely:
>NEXT_KEY is a variable in the input.
>
>DATA .....
> IF LAG(NEXT_KEY) = '.' GET_IT = 1.
> SET ... POINT=GET_IT.
>
>(Interesting to try tree traversal this way. It probably wouldn't work
>well; tree traversal works best with a stack, or an array that
>simulates one.)
|