Date: Fri, 16 May 2003 11:22:57 -0400
Reply-To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject: Re: Merging a large and small dataset
I would focus on two of the words here. First, "will": I would say
that PROC SQL *may* do an unsorted merge, since one cannot directly
control the join method which SQL chooses. Second, "ideally", which is
an accurate and important qualification: an entirely in-memory merge
is the best case, not something to count on.
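
To see which method SQL actually picks, one can ask it to report its
plan in the log. What follows is only a sketch: the dataset names
(POLICY, PLANS, TREATIES) and variable names (PLANCODE, TREATYCODE,
PRODUCT_LINE, REINSURER) are invented for illustration, since the
original post does not give them. _METHOD is a PROC SQL option that
has long been in use but is sparsely documented, so check how it
behaves in your release.

```sas
/* Hypothetical names throughout; substitute your own datasets and keys. */
proc sql _method;   /* _METHOD writes the chosen join plan to the log */
   create table policy_plus as
   select p.*,
          pl.product_line,
          t.reinsurer
   from policy as p
        left join plans    as pl on p.plancode   = pl.plancode
        left join treaties as t  on p.treatycode = t.treatycode;
quit;
```

In the log, a code such as sqxjhsh indicates a hash join and sqxjm a
sort-merge join. Which one appears depends on the SAS release, the
table sizes, and the available memory. Note also that SQL does not
promise to return rows in their original order unless an ORDER BY is
given, so the requirement to preserve the existing order should be
verified rather than assumed.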
On Thu, 15 May 2003 11:43:19 -0500, George <nospam@NOSPAM.COM> wrote:
>Proc SQL will do an unsorted merge. Increase the sql buffersize if
>needed. Also you can create a view instead of a dataset. Ideally
>the whole merge will be performed in memory without the use of
>any temporary files.
>
>"Robert Pope" <eschpope99@netscape.net> wrote in message
>news:6cdc1a71.0305141036.660ac827@posting.google.com...
>> I have a 4,000,000+ record file with policyholder information. Then I
>> have two 5,000 record files: one associates plancodes with the
>> reporting product line, the other associates treaty codes with
>> reinsurance companies.
>>
>> I want to add the appropriate reporting product line and reinsurance
>> company name to each record in my policyholder file, while maintaining
>> the existing sort (non-)order.
>>
>> The obvious solution is to add an _N_ variable, followed by a
>> SORT/MERGE for product followed by a second SORT/MERGE for reinsurer,
>> followed by a SORT by _N_. But those 3 sorts add an excessive amount
>> of run time to the program (IIRC 1 hour per sort). Is there a way to
>> avoid having to SORT the main dataset, perhaps with something like
>> Excel's VLookup function?
>>
>> I would almost be tempted to hard-code the two small files into the
>> main data step, except they are continually updated.
>>
>> Thanks,
>> Rob Pope