Date: Mon, 26 Aug 2002 18:59:28 -0400
Reply-To: Sigurd Hermansen <hermans1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <hermans1@WESTAT.COM>
Subject: Re: Efficiency advice for nodups
It occurs to me that the current table of most recent entries consists of
the last day's most recent entries updated by the daily dataset. A SQL full
join of the daily dataset to the prior dataset of most recent entries would
produce three subsets: the current day entries not on the prior most recent
dataset, daily entries matching prior most recent entries, and prior most
recent entries that do not match daily entries. In the last subset, the
daily entries will be missing. COALESCE() functions pairing daily entries
and prior most recent entries should select the data items that you want.
Sig
On Mon, 26 Aug 2002 13:41:18 -0400, Goldman, Brad (AT-Atlanta)
<Brad.Goldman@AUTOTRADER.COM> wrote:
>Hi all,
>
>Looking for advice on the best way to accomplish this task. Each day I
>create a dataset of ~3 million rows for some car data. I need to create an
>month-to-date dataset that has the most recent entry among all those
>datasets for each car.
>
>Currently, I do:
>1) get the daily dataset (one of the fields has todays date, dt_rec).
>2) append it to yesterday's mtd dataset. Wish I could "prepend" here!
>3) sort by car, descending dt_rec.
>4) sort nodupkey by car.
>
>The double sort seems needed because the nodup option returns the first
>occurence of the key, whereas I want it to take the last occurence.
>
>Any advice on ways to improve this? Should I replace 2-4 with a Proc SQL
>having max(dt_rec) sort of thing?
>
>Thanks,
>Brad
|