LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2005, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 9 Nov 2005 08:36:36 -0500
Reply-To:   Jim Groeneveld <jim1stat@YAHOO.CO.UK>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Jim Groeneveld <jim1stat@YAHOO.CO.UK>
Subject:   Re: An Alternative to LAG function
Comments:   To: Richard Ristow <wrristow@MINDSPRING.COM>

Hi Richard,

If you want just simple LEAD functionality use the unconditional LAG with reversed order processing as in:

DATA One; INPUT Variable @@; CARDS; 0 1 2 3 4 5 6 7 8 9 10 ; RUN;

DATA Two (DROP=I); DO I = NObs TO 1 BY -1; SET One POINT=I NObs=NObs; * Add variables; LeadVar = LAG(Variable); * LAG here functions as LEAD; LeadSum + LAG(Variable); * Each LAG has own stack! ; OUTPUT; END; STOP; LABEL LeadSum = 'Sum _from behind_ of LeadVar values'; RUN;

DATA Three (DROP=I); DO I = NObs TO 1 BY -1; SET Two POINT=I NObs=NObs; * Reverse order instead of SORT; OUTPUT; END; STOP; RUN;

PROC PRINT DATA=Three; RUN;

The output is:

Lead Lead Obs Variable Var Sum

1 0 1 55 2 1 2 54 3 2 3 52 4 3 4 49 5 4 5 45 6 5 6 40 7 6 7 34 8 7 8 27 9 8 9 19 10 9 10 10 11 10 . 0

Regards - Jim. -- Y. (Jim) Groeneveld, MSc., Biostatistician, Vitatron b.v., NL Jim.Groeneveld_AT_Vitatron.com (replace _AT_ by AT sign) http://www.vitatron.com, http://home.hccnet.nl/jim.groeneveld

My computer always teaches me something new I thought I knew already.

[common disclaimer]

On Tue, 8 Nov 2005 22:31:58 -0500, Richard Ristow <wrristow@MINDSPRING.COM> wrote:

>At 06:19 PM 10/30/2005, Arthur Tabachneck wrote: >>Toby (our favorite AI bot) responded off-line with the following: >> >>>Warren Sarle got ahold of Paul Dorfman and me on this subject and >>>stated >>>roughly that since SI is redoing the underlying code of the data step >>>from >>>the ground up they would be more than happy to include new >>>functionality and >>>improve th eold (such as the lag function). however, he stated that >>>there >>>needed to be a whole lot of input from the SAS Guru's to help them >>>figure >>>out these functions should work, for example how should the lag >>>function >>>work when you have multiple data sets being merged together? > >I received Warren's inquiry, and responded to it, as well; see below. >This is from a different point of view: addressing the meaning of >"previous" given the complex input logic allowed in the DATA step. I >explicitly compare with SPSS's logic. > >>All we need is for somebody to explain EXACTLY what it ought to do >>when there are multiple input SAS data sets, multiple input relational >>database tables (Oracle, DB2, etc.), multiple output SAS data sets, >>and multiple output relational database tables. >> >>Seriously. Can anybody help us? > >OK, here's a naive answer to a subtle question: > >LAG(X) should return the value of X from the immediately preceding >record. > >LAG2(X), or LAG(X,2) should return the value of X from the second >preceding record, i.e. the record immediately preceding the immediately >preceding record. (etc.) > >Though naive, it focuses the question: find reasonable, unambiguous >meanings for "record", "preceding", and "immediately". Here's an answer >which I'm pretty sure are unambiguous, and which I'll argue is >reasonable. It does depend on a processing model which the DATA step >and the SPSS transformation program share: > >The code of a SAS DATA step, or an SPSS transformation program, is the >interior of a loop. (I've helped experienced programmers who flummexed >because they weren't aware of that implicit loop.) Then, let "previous" >be previous pass through the implicit loop. Operationalized, > >LAG(X) is the value that X had at the end of the previous loop pass. > >or, more precisely in SAS terms, > >LAG(X) is the value that X had just before the previous clearing of the >PDV. > >Now, comments: > >First, for however, much it matters, this LAG, like SPSS's, can only >take a variable as argument (SAS's present one takes an arbitrary >expression, I believe.) > >Second, a classic, perhaps 'basic', use of the DATA step begins > >DATA ...; > SET ...; >(or 'MERGE' or 'INPUT' in place of 'SET'). > >Here, the implicit loop has a very simple meaning: there's an 'engine' >that produces a sequential file, one record per pass, and the DATA step >implicit loop processes one record per pass. I'd say that in this case, >the LAG I propose means exactly what you'd expect; and is, among other >things, equivalent to SPSS's LAG. There's no confusion if the LAG is >inside a conditional or other construct, either. And in the code > >DATA FOO; > SET BAR; > BY BLORT; >... >FIRST.BLORT=1 >if and only if >BLORT NE LAG(BLORT); > >which I think is one test of "reasonable" for LAG. (This is SAS >comparison, where "value NE <missing>" is 'true', rather than SPSS >comparison, where "value NE <missing>" is 'missing'.) > >In SPSS, this settles the matter, since a transformation program can >only be used this way: it's begun by an 'engine' that produces a record >at a time (GET FILE, ADD FILES, MATCH FILES, DATA LIST, or an INPUT >PROGRAM). > >In SAS, 'SET', 'MERGE', 'INPUT' are executables, and that raises a lot >of complications. I'm sure that's what you were thinking of, when >asking, > >>All we need is [to know] what it ought to do when there are multiple >>input SAS data sets, multiple input relational database tables >>(Oracle, DB2, etc.), multiple output SAS data sets, and multiple >>output relational database tables. > >I don't think "multiple output" raises so many questions. I've done a >lot with OUTPUT statements (they're one of my favorite SAS features), >both to write many files or records, and to coalesce many records into >one. But LAG, as I'm describing it, is an input construct only. At >least in the DATA step, with 'basic' input as I've described it, I >don't think multiple outputs would raise any difficulty. Nor would >multiple inputs, as now with MERGE or multiple data-set SETs, as long >as they resolved into an 'engine' for one record per call, and that >'engine' is called once per DATA step pass. That's why there's no >problem in SPSS: what I'm describing is the only input logic allowed. > >Goodness knows, you can do a lot of other things in SAS already. You >can, for example, write your own "DATA step" loop within a single DATA >step pass: > >DO UNTIL END_IT = 1; > SET ... > <whatever you like>; > OUTPUT; >END; >STOP; > >And my "LAG" is useless; it returns '.' for any call. > >With the "POINT" option, or anything similar, it's likely my LAG won't >be what you want. But with "POINT", you're explicitly throwing away the >notion of a "previous record". It could work well, though, as long as >you do use the implicit loop. Here's chain following, rather nicely: >NEXT_KEY is a variable in the input. > >DATA ..... > IF LAG(NEXT_KEY) = '.' GET_IT = 1. > SET ... POINT=GET_IT. > >(Interesting to try tree traversal this way. It probably wouldn't work >well; tree traversal works best with a stack, or an array that >simulates one.)


Back to: Top of message | Previous page | Main SAS-L page