Date: Thu, 26 Jun 2003 06:18:07 +0000
Reply-To: sashole@bellsouth.net
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject: Re: Cut a part out of a big flat fixed length text file
Content-Type: text/plain; format=flowed
PD,
All you need is a decent algorithm to store the 150K keys in a
memory-resident table and then search the table for each of the 42M records
read from the tape, discarding those whose keys are not found in the table.
The format you have mentioned stores the keys in an AVL tree, so looking
them up should be quick, and since a 150K table is rather minuscule to
handle even for Proc FORMAT (well-known for its memorial voracity), it
should not spawn out-of-memory horror stories, either.
Suppose your large file is attached to the fileref LARGE, and the 150K SAS
data file called ACCTLIST contains the single 16-byte variable ACCTNO. Also
suppose that in the large file, the account number occupies the leftmost 16
bytes. Then you might proceed as follows:
data cntlin / view = cntlin ;
   retain fmtname 'search' type 'c' label '+' ;
   do until ( eof ) ;
      set acctlist ( rename = (acctno=start) ) end = eof ;
      output ;
   end ;
   hlo = 'O' ; *this is letter O, not zero ;
   label = '-' ;
   output ;
   stop ;
run ;

proc format cntlin = cntlin ;
run ;

data _null_ ;
   infile large ;
   file small ; *SMALL is an assumed fileref for the extract ;
   *without a FILE statement, PUT would write to the log ;
   input @ 001 acctno $char16. @ ;
   if put (acctno, $search.) = '+' then put _infile_ ;
run ;
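If you want to see what the CNTLIN= data set turns into before pointing the
step at the tape, the same logic can be tried on a couple of in-line keys.
This is only a sketch: KEYS, CHECK and FLAG are made-up names, and the sample
account numbers are invented for the test.

```sas
data keys ;                       /* hypothetical stand-in for ACCTLIST */
   input acctno $char16. ;
cards ;
0000000000000012
0000000000000034
;
run ;

data cntlin ;
   retain fmtname 'search' type 'c' label '+' ;
   do until ( eof ) ;
      set keys ( rename = (acctno=start) ) end = eof ;
      output ;                    /* one '+' row per key              */
   end ;
   hlo = 'O' ;                    /* letter O: the OTHER range        */
   label = '-' ;
   output ;
   stop ;
run ;

proc format cntlin = cntlin ;
run ;

data check ;
   length acctno $ 16 flag $ 1 ;
   /* first key is in KEYS, second one is not */
   do acctno = '0000000000000012', '0000000000000099' ;
      flag = put (acctno, $search.) ;
      output ;
   end ;
run ;
```

CHECK should come out with FLAG = '+' for the first key and '-' for the
second. One caveat: PROC FORMAT rejects a CNTLIN= data set with repeated
START values, so if ACCTLIST may hold duplicate account numbers, dedupe it
first (for example with PROC SORT NODUPKEY).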
The format look-up is likely to provide all the performance you need, because
the process is I/O bound, and any further increase in the speed of the table
look-up will be swamped by the time spent reading the tape. However, if your
key is actually a digit string (i.e. is composed solely of [decimal] digits),
and the desire for better CPU performance is overwhelmingly powerful, another,
one-step, solution (hashing the keys into a temporary array with linear
probing) could be
%let h = 300001 ;
data _null_ ;
   array hk ( 0 : &h ) _temporary_ ;
   if _n_ = 1 then do until ( eof ) ;
      *ACCTNO is character in ACCTLIST, so make it numeric first ;
      set acctlist ( rename = (acctno=c) ) end = eof ;
      acctno = input (c, 16.) ;
      do h = mod (acctno, &h) by 1 until ( hk(h) = acctno ) ;
         if h = &h then h = 0 ;
         if hk(h) = . then hk(h) = acctno ;
      end ;
   end ;
   infile large ;
   file small ; *SMALL is an assumed fileref for the extract ;
   input @1 acctno 16. @ ;
   do h = mod (acctno, &h) by 1 until ( hk(h) = acctno | hk(h) = . ) ;
      if h = &h then h = 0 ;
   end ;
   if hk(h) = acctno then put _infile_ ;
run ;
Please note that the code is strictly off the top of a sick head, but there
is a possibility it will work. HTH.
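Since the array step is untested, one way to exercise the probing logic
without mounting the tape is to run it against a few in-line records. A
minimal sketch, under these assumptions: the table size is shrunk to 11 so
that the three invented keys all hash to the same slot, the large file is
replaced by CARDS lines, and matches go to a made-up data set FOUND instead
of an external file.

```sas
%let h = 11 ;                     /* tiny table, for the test only */

data acctlist ;                   /* hypothetical sample key list  */
   input acctno $char16. ;
cards ;
0000000000000012
0000000000000023
0000000000000034
;
run ;

data found ( keep = acctno ) ;
   array hk ( 0 : &h ) _temporary_ ;
   /* load: 12, 23 and 34 all give mod 11 = 1, so they probe to 1, 2, 3 */
   if _n_ = 1 then do until ( eof ) ;
      set acctlist ( rename = (acctno=c) ) end = eof ;
      acctno = input (c, 16.) ;
      do h = mod (acctno, &h) by 1 until ( hk(h) = acctno ) ;
         if h = &h then h = 0 ;
         if hk(h) = . then hk(h) = acctno ;
      end ;
   end ;
   /* search: stop at the key itself or at the first empty slot */
   input @1 acctno 16. ;
   do h = mod (acctno, &h) by 1 until ( hk(h) = acctno | hk(h) = . ) ;
      if h = &h then h = 0 ;
   end ;
   if hk(h) = acctno then output ;
cards ;
0000000000000001 not in the list
0000000000000012 in the list
0000000000000045 not in the list, same slot as the keys
0000000000000034 in the list
;
run ;
```

FOUND should end up with two observations, 12 and 34. The small keys here
sidestep one thing worth keeping in mind: an 8-byte SAS numeric holds every
16-digit integer exactly on the mainframe, but on ASCII platforms exactness
ends at 2**53 (about 9.007E15), so there the array approach is only safe for
keys below that.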
Kind regards,
-------------------------------
Paul M. Dorfman
Jacksonville, FL
-------------------------------
>From: PD <sophe88@YAHOO.COM>
>Reply-To: PD <sophe88@YAHOO.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Cut a part out of a big flat fixed length text file
>Date: Wed, 25 Jun 2003 21:00:23 -0700
>
>I have a big fixed length text file with about 2000 vars. The primary
>key is a 16-character var. There are about 42 million obs in the
>file. It sits on an OS390 tape image. (can not browse it, but
>reading/sharing is fine). I have another SAS dataset that has about
>150k of this 16-character var.
>
>I need to preserve the layout of the text file, but only want the
>records for the 150k people. I have the layout for the text file, but
>do not want to write them into a SAS data set, afraid the layout may
>not be preserved.
>
>I used to run a JCL based Syncsort program that allows me to
>match-merge two text files using a common key without transferring
>them to SAS files. Recently the program is not functioning properly.
>So I am resorting to SAS again.
>
>Heard about the CNTLIN option in proc format. Don't know how relevant that
>is to my problem.
>
>Thank you.
>PD