Date: Mon, 7 Oct 1996 09:58:26 PDT
Reply-To: TWB2%Rates%FAR@GO50.COMP.PGE.COM
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: TWB2%Rates%FAR@GO50.COMP.PGE.COM
Subject: Re: Indexing annoyance
Jay, I'll propose a tolerable approach:
DATA NEWA/VIEW=NEWA;
SET A;
BY ID DATE2;
RUN;
DATA FIRST;
MERGE NEWA(IN=A)
B(IN=B);
BY ID;
IF A AND B AND FIRST.ID;
RUN;
The view NEWA will use the index to set A in your desired order. Because it is
a view, you do not have the I/O of creating an intermediate dataset.
I must caution you, if your observation is small relative to your block size, it
is faster to sort than to use the index to read.
Tim Berryhill - Contract Programmer and General Wizard
TWB2@PGE.COM
Frequently at Pacific Gas & Electric Co., San Francisco
The correlation coefficient between their views and
my postings is slightly less than 0
----------------------[Reply - Original Message]----------------------
Suppose dataset A is sorted by var ID then by DATE1 within each ID;
it's also indexed by var ID then by var DATE2 within each ID. There
are multiple observations for each combination ID,DATE1 and each
combination ID,DATE2.
Dataset B contains just a short list of IDs, sorted by ID, with one
observation per ID.
Now I want to select the observations in A with the earliest value of
DATE2 for each ID that also appears in B. An obvious strategy is
something along the lines of:
data new; merge
a (in=ina)
b (in=inb);
by id;
if ina and inb and first.id;
The problem here is that the "natural" sort order of A will be used,
so that I'll end up with the earliest DATE1 instead of the earliest
DATE2. I need some way of specifying that SAS should use the index for
A instead of the sort order. I can of course use an extra sort or
datastep to make sure the cases are in the correct order, but this
seems to me (a) inelegant and (b) inefficient if A is large.