LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (September 2008, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sun, 7 Sep 2008 16:20:17 -0400
Reply-To:   "Howard Schreier <hs AT dc-sug DOT org>" <schreier.junk.mail@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Howard Schreier <hs AT dc-sug DOT org>" <schreier.junk.mail@GMAIL.COM>
Subject:   Re: Table lookup puzzle

On Fri, 5 Sep 2008 18:18:42 GMT, Jay Weedon <jweedon@EARTHLINK.NET> wrote:

>On Fri, 05 Sep 2008 18:03:34 GMT, Jay Weedon <jweedon@earthlink.net> >wrote: > >>Hi folks, >> >>Guess I must be losing my grip of SAS programming, because I can't >>come up with an efficient solution to this easy-sounding problem. >> >>The master dataset A contains two date fields (the dates vary across >>records). The dataset to be consulted B contains one date field and >>one result field. The goal is for A to acquire 2 new fields: one >>representing the sum of all the results in B that occur within the >>window specified by the dates in A; the other is the number of >>observations in B from which that mean is computed. > >I meant "sum" rather than "mean" at the end there. Mean won't make >sense if no observations are found that lie in the window. > >Oh, and I forgot to mention that it needs to be a fairly efficient >solution because there are millions of records in both sets. > >JW

Even if the records in B span several decades, that's still only a few thousand distinct dates, so aggregating B to a summary with one count and one sum per day should shrink the problem by 2+ orders of magnitude. I would go further and transform the summary to a cumulative form. That way the processing for each observation in A will require accessing just 2 days' data from the summary (the closing date of the window and the day preceding the opening date of the window). I would construct the arrays so that the dates themselves serve as array indexes.


Back to: Top of message | Previous page | Main SAS-L page