Date: Fri, 4 May 2001 11:06:22 -0500
Reply-To: Sterling Price <ssprice@WAL-MART.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sterling Price <ssprice@WAL-MART.COM>
Subject: Subset 28M tape file by 500K SAS dataset?
I've got a 28 million record tape file that I need to subset to only those
records whose keys appear in a SAS dataset of roughly 500,000 observations.
A sort/merge on a file this size is obviously something to avoid, and I
think a format lookup would consume a huge amount of memory, so I concluded
that an indexed lookup might be best. The index is a composite index on two
numeric variables, STORE and ITEM. The code below seems to work OK, but I'm
wondering whether it could be more efficient, or whether another lookup
method would be more appropriate. Here's what I have right now:
DATA ITEMS(INDEX=(STORITEM=(STORE ITEM)/UNIQUE));
   SET STRITEMS.STRITEMS;
RUN;
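DATA MATCHED.STORITMS;
   INFILE TAPEIN;
   /* Read only the key fields; the trailing @ holds the record */
   INPUT @0001 ITEM  9.
         @0010 STORE 4. @;
   SET ITEMS KEY=STORITEM/UNIQUE;
   IF _IORC_ NE 0 THEN DO;
      /* No match: reset the flags BEFORE DELETE, because DELETE */
      /* ends the iteration and nothing after it is executed     */
      _IORC_  = 0;
      _ERROR_ = 0;
      DELETE;
   END;
   ELSE DO;
      /* Match: read the remaining fields from the held record */
      INPUT @0016 OQ       PD4.
            @0024 OP       PD4.
            @0028 OUTL     PD4.
            @0032 SSTOCK   PD4.1
            @0058 RAWOQ    PD4.
            @0291 LEADTIME PD2.
            @0434 REV_TIME PD2. ;
      OUTPUT;
   END;
RUN;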
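For comparison, a minimal sketch of the format lookup mentioned above. It is untested and purely illustrative: the format name $KEYFMT, the work dataset CNTL, and the 13-character concatenated key (STORE as Z4., ITEM as Z9., both assumed non-negative) are assumptions made up for this example. It loads all 500,000 keys into an in-memory format, which is the memory cost in question.

```sas
/* Build a CNTLIN dataset: one row per STORE/ITEM key (hypothetical names) */
DATA CNTL(KEEP=FMTNAME TYPE START LABEL HLO);
   SET STRITEMS.STRITEMS(KEEP=STORE ITEM) END=LAST;
   LENGTH START $13 HLO $1;
   RETAIN FMTNAME 'KEYFMT' TYPE 'C' LABEL 'Y';
   /* Assumed key layout: zero-padded STORE followed by zero-padded ITEM */
   START = PUT(STORE, Z4.) || PUT(ITEM, Z9.);
   OUTPUT;
   IF LAST THEN DO;
      /* OTHER range: any key not in the list maps to 'N' */
      HLO   = 'O';
      LABEL = 'N';
      OUTPUT;
   END;
RUN;

PROC FORMAT CNTLIN=CNTL;
RUN;

DATA MATCHED.STORITMS;
   INFILE TAPEIN;
   INPUT @0001 ITEM  9.
         @0010 STORE 4. @;
   /* Subsetting IF: keep the record only when the key is in the format */
   IF PUT(PUT(STORE, Z4.) || PUT(ITEM, Z9.), $KEYFMT.) = 'Y';
   INPUT @0016 OQ       PD4.
         @0024 OP       PD4.
         @0028 OUTL     PD4.
         @0032 SSTOCK   PD4.1
         @0058 RAWOQ    PD4.
         @0291 LEADTIME PD2.
         @0434 REV_TIME PD2. ;
RUN;
```

The trade-off is memory for I/O: the format is searched in memory for every tape record, whereas the indexed SET reads the index and dataset pages on disk.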
I appreciate any suggestions or comments. It isn't really intolerable now
(the last time I ran the program it finished in about 45 minutes elapsed
time), but I have the nagging feeling it could be better.
Thanks,
Sterling Price