```
Date:         Wed, 9 Nov 2005 08:36:36 -0500
Reply-To:     Jim Groeneveld
Sender:       "SAS(r) Discussion"
From:         Jim Groeneveld
Subject:      Re: An Alternative to LAG function
Comments:     To: Richard Ristow

Hi Richard,

If you want just simple LEAD functionality use the unconditional LAG with
reversed order processing as in:

DATA One;
  INPUT Variable @@;
  CARDS;
0 1 2 3 4 5 6 7 8 9 10
;
RUN;

DATA Two (DROP=I);
  DO I = NObs TO 1 BY -1;
    SET One POINT=I NObs=NObs;
*   Add variables;
    LeadVar = LAG(Variable); * LAG here functions as LEAD;
    LeadSum + LAG(Variable); * Each LAG has own stack! ;
    OUTPUT;
  END;
  STOP;
  LABEL LeadSum = 'Sum _from behind_ of LeadVar values';
RUN;

DATA Three (DROP=I);
  DO I = NObs TO 1 BY -1;
    SET Two POINT=I NObs=NObs; * Reverse order instead of SORT;
    OUTPUT;
  END;
  STOP;
RUN;

PROC PRINT DATA=Three; RUN;

The output is:

                    Lead    Lead
Obs    Variable     Var     Sum
  1        0          1      55
  2        1          2      54
  3        2          3      52
  4        3          4      49
  5        4          5      45
  6        5          6      40
  7        6          7      34
  8        7          8      27
  9        8          9      19
 10        9         10      10
 11       10          .       0

Regards - Jim.
--
Y. (Jim) Groeneveld, MSc., Biostatistician, Vitatron b.v., NL
Jim.Groeneveld_AT_Vitatron.com (replace _AT_ by AT sign)
http://www.vitatron.com, http://home.hccnet.nl/jim.groeneveld

My computer always teaches me something new I thought I knew already.

[common disclaimer]

On Tue, 8 Nov 2005 22:31:58 -0500, Richard Ristow wrote:

>At 06:19 PM 10/30/2005, Arthur Tabachneck wrote:
>>Toby (our favorite AI bot) responded off-line with the following:
>>
>>>Warren Sarle got ahold of Paul Dorfman and me on this subject and
>>>stated roughly that since SI is redoing the underlying code of the
>>>data step from the ground up they would be more than happy to include
>>>new functionality and improve the old (such as the lag function).
>>>However, he stated that there needed to be a whole lot of input from
>>>the SAS Guru's to help them figure out how these functions should
>>>work, for example how should the lag function work when you have
>>>multiple data sets being merged together?
>
>I received Warren's inquiry, and responded to it, as well; see below.
>This is from a different point of view: addressing the meaning of
>"previous" given the complex input logic allowed in the DATA step. I
>explicitly compare with SPSS's logic.
>
>>All we need is for somebody to explain EXACTLY what it ought to do
>>when there are multiple input SAS data sets, multiple input relational
>>database tables (Oracle, DB2, etc.), multiple output SAS data sets,
>>and multiple output relational database tables.
>>
>>Seriously. Can anybody help us?
>
>OK, here's a naive answer to a subtle question:
>
>LAG(X) should return the value of X from the immediately preceding
>record.
>
>LAG2(X), or LAG(X,2) should return the value of X from the second
>preceding record, i.e. the record immediately preceding the immediately
>preceding record. (etc.)
>
>Though naive, it focuses the question: find reasonable, unambiguous
>meanings for "record", "preceding", and "immediately". Here's an answer
>which I'm pretty sure is unambiguous, and which I'll argue is
>reasonable. It does depend on a processing model which the DATA step
>and the SPSS transformation program share:
>
>The code of a SAS DATA step, or an SPSS transformation program, is the
>interior of a loop. (I've helped experienced programmers who were
>flummoxed because they weren't aware of that implicit loop.) Then, let
>"previous" be the previous pass through the implicit loop.
>Operationalized,
>
>LAG(X) is the value that X had at the end of the previous loop pass.
>
>or, more precisely in SAS terms,
>
>LAG(X) is the value that X had just before the previous clearing of the
>PDV.
>
>Now, comments:
>
>First, for however much it matters, this LAG, like SPSS's, can only
>take a variable as argument (SAS's present one takes an arbitrary
>expression, I believe.)
>
>Second, a classic, perhaps 'basic', use of the DATA step begins
>
>DATA ...;
>   SET ...;
>(or 'MERGE' or 'INPUT' in place of 'SET').
>
>Here, the implicit loop has a very simple meaning: there's an 'engine'
>that produces a sequential file, one record per pass, and the DATA step
>implicit loop processes one record per pass. I'd say that in this case,
>the LAG I propose means exactly what you'd expect; and is, among other
>things, equivalent to SPSS's LAG. There's no confusion if the LAG is
>inside a conditional or other construct, either. And in the code
>
>DATA FOO;
>   SET BAR;
>   BY BLORT;
>...
>FIRST.BLORT=1
>if and only if
>BLORT NE LAG(BLORT);
>
>which I think is one test of "reasonable" for LAG. (This is SAS
>comparison, where "value NE missing" is 'true', rather than SPSS
>comparison, where "value NE missing" is 'missing'.)
>
>In SPSS, this settles the matter, since a transformation program can
>only be used this way: it's begun by an 'engine' that produces a record
>at a time (GET FILE, ADD FILES, MATCH FILES, DATA LIST, or an INPUT
>PROGRAM).
>
>In SAS, 'SET', 'MERGE', 'INPUT' are executables, and that raises a lot
>of complications. I'm sure that's what you were thinking of, when
>asking,
>
>>All we need is [to know] what it ought to do when there are multiple
>>input SAS data sets, multiple input relational database tables
>>(Oracle, DB2, etc.), multiple output SAS data sets, and multiple
>>output relational database tables.
>
>I don't think "multiple output" raises so many questions. I've done a
>lot with OUTPUT statements (they're one of my favorite SAS features),
>both to write many files or records, and to coalesce many records into
>one. But LAG, as I'm describing it, is an input construct only. At
>least in the DATA step, with 'basic' input as I've described it, I
>don't think multiple outputs would raise any difficulty. Nor would
>multiple inputs, as now with MERGE or multiple data-set SETs, as long
>as they resolved into an 'engine' for one record per call, and that
>'engine' is called once per DATA step pass.
>That's why there's no problem in SPSS: what I'm describing is the only
>input logic allowed.
>
>Goodness knows, you can do a lot of other things in SAS already. You
>can, for example, write your own "DATA step" loop within a single DATA
>step pass:
>
>DO UNTIL (END_IT = 1);
>   SET ...
>       ;
>   OUTPUT;
>END;
>STOP;
>
>And my "LAG" is useless; it returns '.' for any call.
>
>With the "POINT" option, or anything similar, it's likely my LAG won't
>be what you want. But with "POINT", you're explicitly throwing away the
>notion of a "previous record". It could work well, though, as long as
>you do use the implicit loop. Here's chain following, rather nicely:
>NEXT_KEY is a variable in the input.
>
>DATA .....
>   IF LAG(NEXT_KEY) = '.' GET_IT = 1.
>   SET ... POINT=GET_IT.
>
>(Interesting to try tree traversal this way. It probably wouldn't work
>well; tree traversal works best with a stack, or an array that
>simulates one.)
```
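
The FIRST.BLORT test that Richard describes can be tried against the current LAG, provided LAG is called unconditionally on every pass of the implicit loop. The following is a minimal sketch, not code from the thread: the data set names Bar and Foo follow his example, while the numeric sample values and the variables Changed and FirstFlag are made up for illustration.

```sas
* Hypothetical sample data, already sorted by Blort;
DATA Bar;
  INPUT Blort @@;
  CARDS;
1 1 2 2 2 3
;
RUN;

DATA Foo;
  SET Bar;
  BY Blort;
  * LAG runs on every pass, so its queue holds the previous record;
  Changed   = (Blort NE LAG(Blort));  * SAS comparison, 1 NE . is true;
  FirstFlag = FIRST.Blort;            * copy the automatic variable so it is kept;
RUN;

PROC PRINT DATA=Foo; RUN;
```

If Changed and FirstFlag agree on every row, the existing LAG already passes the test in this simple SET ... BY case. Moving the LAG call inside an IF would break the match, because each LAG call only queues a value when it actually executes.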
