LISTSERV at the University of Georgia
Date:         Thu, 26 Jun 2003 06:18:07 +0000
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: Cut a part out of a big flat fixed length text file
Comments: To: sophe88@YAHOO.COM
Content-Type: text/plain; format=flowed

PD,

All you need is a decent algorithm to store the 150K keys in a memory-resident table and then search the table for each of the 42M records read from the tape, discarding those whose keys are not found in the table. The format you have mentioned stores the keys in an AVL tree, so looking them up should be quick, and since a 150K table is rather minuscule to handle even for Proc FORMAT (well known for its voracious appetite for memory), it should not spawn out-of-memory horror stories, either.

Suppose your large file is attached to a fileref LARGE, and the 150K SAS data file called ACCTLIST contains the single 16-byte variable ACCTNO. Also suppose that in the large file, the account number occupies the leftmost 16 bytes. Then you might proceed as follows:

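data cntlin / view = cntlin ;
   retain fmtname 'search' type 'c' label '+' ;
   * one format entry per key, all labelled + ;
   do until ( eof ) ;
      set acctlist ( rename = (acctno=start) ) end = eof ;
      output ;
   end ;
   * catch-all OTHER entry labelled - for every key not in the list ;
   hlo = 'O' ; *this is letter O, not zero ;
   label = '-' ;
   output ;
   stop ;
run ;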

proc format cntlin = cntlin ; run ;

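data _null_ ;
   infile large ;
   * read only the key and hold the record with the trailing @ ;
   input @ 001 acctno $char16. @ ;
   * without a FILE statement ahead of it, PUT writes to the log, ;
   * so point FILE at the extract to keep the records ;
   if put (acctno, $search.) = '+' then put _infile_ ;
run ;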
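Before turning the steps above loose on 42M records, it may be worth a dry run on a toy list. The sketch below is only an illustration under stated assumptions: the three account numbers are made up, the CNTLIN data set is built as a plain data set rather than a view, and the look-up is exercised against two literal keys instead of the tape.

```sas
* hypothetical three-key list standing in for the 150K ACCTLIST ;
data acctlist ;
   input acctno $char16. ;
datalines ;
0000000000000001
0000000000000007
0000000000000042
;
run ;

* same CNTLIN layout as above, with the OTHER entry added after the last key ;
data cntlin ;
   retain fmtname 'search' type 'c' label '+' ;
   set acctlist ( rename = (acctno = start) ) end = eof ;
   output ;
   if eof then do ;
      hlo   = 'O' ;
      label = '-' ;
      output ;
   end ;
run ;

proc format cntlin = cntlin ;
run ;

* one key that is in the list and one that is not ;
data _null_ ;
   length acctno $ 16 ;
   do acctno = '0000000000000007', '0000000000000008' ;
      flag = put (acctno, $search.) ;
      put acctno= flag= ;
   end ;
run ;
```

If the format was built as intended, the log should show flag=+ for the key ending in 7 and flag=- for the key ending in 8.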

The format-based approach is likely to provide all the performance you need, because the process is I/O-bound, and any gain in the speed of the table look-up will be swamped by the overhead of reading the tape. However, if your key is actually a digit string (i.e. is composed solely of [decimal] digits), and the desire for better CPU performance is overwhelmingly powerful, another, one-step, solution could be

%let h = 300001 ;

data _null_ ;
   array hk ( 0 : &h ) _temporary_ ;
   * first iteration only - load every key into the hash table ;
   if _n_ = 1 then do until ( eof ) ;
      set acctlist end = eof ;
      do h = mod (acctno, &h) by 1 until ( hk(h) = acctno ) ;
         if h = &h then h = 0 ;
         if hk (h) = . then hk(h) = acctno ;
      end ;
   end ;
   infile large ;
   input @1 acctno 16. @ ;
   * probe from the home slot until the key or an empty slot turns up ;
   do h = mod (acctno, &h) by 1 until ( hk(h) = acctno | hk(h) = . ) ;
      if h = &h then h = 0 ;
   end ;
   if hk(h) = acctno then put _infile_ ;
run ;

Please note that the code is strictly off the top of a sick head, but there is a possibility it will work. HTH.
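For anyone who wants to watch the probing in the one-step hash solution at work, here is a minimal sketch on a deliberately tiny table. Everything in it is an assumption made for illustration: the table size of 7 (macro variable M), the four made-up numeric keys, and the PUT statements that report where each key lands. It uses the same loop construct as the step above, namely start at MOD(key, size), step by 1, wrap to slot 0 at the end of the table.

```sas
%let m = 7 ;

data _null_ ;
   array hk ( 0 : &m ) _temporary_ ;
   * load four made-up keys - 3 and 10 and 17 share home slot 3 ;
   do key = 3, 10, 17, 5 ;
      do h = mod (key, &m) by 1 until ( hk(h) = key ) ;
         if h = &m then h = 0 ;
         if hk(h) = . then hk(h) = key ;
      end ;
      put key= 'stored in slot ' h ;
   end ;
   * search for one stored key and one absent key ;
   do key = 17, 24 ;
      do h = mod (key, &m) by 1 until ( hk(h) = key | hk(h) = . ) ;
         if h = &m then h = 0 ;
      end ;
      found = ( hk(h) = key ) ;
      put key= found= ;
   end ;
run ;
```

Key 17 should end up two slots past its home slot because 3 and 10 got there first, and key 24 should come back with found=0 once the probe wraps around to an empty slot. With 150K keys in a 300001-slot table, the table is only about half full, so such collision chains should stay short.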

Kind regards,
-------------------------------
Paul M. Dorfman
Jacksonville, FL
-------------------------------

>From: PD <sophe88@YAHOO.COM>
>Reply-To: PD <sophe88@YAHOO.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Cut a part out of a big flat fixed length text file
>Date: Wed, 25 Jun 2003 21:00:23 -0700
>
>I have a big fixed length text file with about 2000 vars. The primary
>key is a 16-character var. There are about 42 millions obs in the
>file. It sits on a OS390 tape image. (can not browse it, but
>reading/sharing is fine). I have another SAS dataset that has about
>150k of this 16-charter var.
>
>I need to preseve the layout of the text file, but only want to
>records for the 150k people. I have the layout for the text file, but
>do not want to write them into a SAS data set, afraid the layout may
>not be preseved.
>
>I used to run a JCL based Syncsort program that allows me to
>match-merge two text files using a common key without transferring
>them to SAS files. Recently the program is not functioning properly.
>So I am resorting to SAS again.
>
>Heard about CNTln option in proc format. Don't know how relevant that
>is to my problem.
>
>Thank you.
>PD
