LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2005, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 8 Nov 2005 22:31:58 -0500
Reply-To:     Richard Ristow <wrristow@MINDSPRING.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@MINDSPRING.COM>
Subject:      Re: An Alternative to LAG function
In-Reply-To:  <rfednUK_X9eRyPjeRVn-qQ@rogers.com>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 06:19 PM 10/30/2005, Arthur Tabachneck wrote: >Toby (our favorite AI bot) responded off-line with the following: > >>Warren Sarle got ahold of Paul Dorfman and me on this subject and >>stated >>roughly that since SI is redoing the underlying code of the data step >>from >>the ground up they would be more than happy to include new >>functionality and >>improve th eold (such as the lag function). however, he stated that >>there >>needed to be a whole lot of input from the SAS Guru's to help them >>figure >>out these functions should work, for example how should the lag >>function >>work when you have multiple data sets being merged together?

I received Warren's inquiry, and responded to it, as well; see below. This is from a different point of view: addressing the meaning of "previous" given the complex input logic allowed in the DATA step. I explicitly compare with SPSS's logic.

>All we need is for somebody to explain EXACTLY what it ought to do >when there are multiple input SAS data sets, multiple input relational >database tables (Oracle, DB2, etc.), multiple output SAS data sets, >and multiple output relational database tables. > >Seriously. Can anybody help us?

OK, here's a naive answer to a subtle question:

LAG(X) should return the value of X from the immediately preceding record.

LAG2(X), or LAG(X,2) should return the value of X from the second preceding record, i.e. the record immediately preceding the immediately preceding record. (etc.)

Though naive, it focuses the question: find reasonable, unambiguous meanings for "record", "preceding", and "immediately". Here's an answer which I'm pretty sure are unambiguous, and which I'll argue is reasonable. It does depend on a processing model which the DATA step and the SPSS transformation program share:

The code of a SAS DATA step, or an SPSS transformation program, is the interior of a loop. (I've helped experienced programmers who flummexed because they weren't aware of that implicit loop.) Then, let "previous" be previous pass through the implicit loop. Operationalized,

LAG(X) is the value that X had at the end of the previous loop pass.

or, more precisely in SAS terms,

LAG(X) is the value that X had just before the previous clearing of the PDV.

Now, comments:

First, for however, much it matters, this LAG, like SPSS's, can only take a variable as argument (SAS's present one takes an arbitrary expression, I believe.)

Second, a classic, perhaps 'basic', use of the DATA step begins

DATA ...; SET ...; (or 'MERGE' or 'INPUT' in place of 'SET').

Here, the implicit loop has a very simple meaning: there's an 'engine' that produces a sequential file, one record per pass, and the DATA step implicit loop processes one record per pass. I'd say that in this case, the LAG I propose means exactly what you'd expect; and is, among other things, equivalent to SPSS's LAG. There's no confusion if the LAG is inside a conditional or other construct, either. And in the code

DATA FOO; SET BAR; BY BLORT; ... FIRST.BLORT=1 if and only if BLORT NE LAG(BLORT);

which I think is one test of "reasonable" for LAG. (This is SAS comparison, where "value NE <missing>" is 'true', rather than SPSS comparison, where "value NE <missing>" is 'missing'.)

In SPSS, this settles the matter, since a transformation program can only be used this way: it's begun by an 'engine' that produces a record at a time (GET FILE, ADD FILES, MATCH FILES, DATA LIST, or an INPUT PROGRAM).

In SAS, 'SET', 'MERGE', 'INPUT' are executables, and that raises a lot of complications. I'm sure that's what you were thinking of, when asking,

>All we need is [to know] what it ought to do when there are multiple >input SAS data sets, multiple input relational database tables >(Oracle, DB2, etc.), multiple output SAS data sets, and multiple >output relational database tables.

I don't think "multiple output" raises so many questions. I've done a lot with OUTPUT statements (they're one of my favorite SAS features), both to write many files or records, and to coalesce many records into one. But LAG, as I'm describing it, is an input construct only. At least in the DATA step, with 'basic' input as I've described it, I don't think multiple outputs would raise any difficulty. Nor would multiple inputs, as now with MERGE or multiple data-set SETs, as long as they resolved into an 'engine' for one record per call, and that 'engine' is called once per DATA step pass. That's why there's no problem in SPSS: what I'm describing is the only input logic allowed.

Goodness knows, you can do a lot of other things in SAS already. You can, for example, write your own "DATA step" loop within a single DATA step pass:

DO UNTIL END_IT = 1; SET ... <whatever you like>; OUTPUT; END; STOP;

And my "LAG" is useless; it returns '.' for any call.

With the "POINT" option, or anything similar, it's likely my LAG won't be what you want. But with "POINT", you're explicitly throwing away the notion of a "previous record". It could work well, though, as long as you do use the implicit loop. Here's chain following, rather nicely: NEXT_KEY is a variable in the input.

DATA ..... IF LAG(NEXT_KEY) = '.' GET_IT = 1. SET ... POINT=GET_IT.

(Interesting to try tree traversal this way. It probably wouldn't work well; tree traversal works best with a stack, or an array that simulates one.)


Back to: Top of message | Previous page | Main SAS-L page