Date: Fri, 3 Aug 2001 08:24:04 -0400
Reply-To: "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Subject: Re: Match-Merge a Small to Large Dataset
Content-Type: text/plain
Dan,
Paul Dorfman has already given you a lot of pertinent advice.
My comment is: do you really need more than a simple match-merge?
How often are you running this, and how often do the large and small data
sets change?
Is the large dataset already sorted on the key? If not, would sorting it be
a prohibitive step?
Remember that in addition to processing overhead, setup and maintenance are
important factors.
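
For reference, a plain match-merge of the kind I mean might look like the
sketch below. It borrows the names SMALLd, LARGEd, MERGEd and KEYv from your
post; the IN= flag names are made up for illustration, and it assumes both
data sets can be sorted in place by KEYv (skip the PROC SORT steps if they
are already in key order).

```sas
/* Sketch only: data set and variable names are taken from the original
   post; in_small and in_large are hypothetical flag names. */

/* Both inputs must be in KEYv order for a BY-merge. */
proc sort data=smalld;
  by keyv;
run;

proc sort data=larged;
  by keyv;
run;

data merged;
  /* One-to-many merge: KEYv is unique in SMALLd, repeated in LARGEd. */
  merge smalld (in=in_small)
        larged (in=in_large);
  by keyv;
  /* Keep only LARGEd records whose KEYv also occurs in SMALLd. */
  if in_small and in_large;
run;
```

The same DATA step would be repeated for each of the large data sets; only
the sort of the 2M-obs files should cost anything noticeable.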
fwiw,
Dennis Diskin
> -----Original Message-----
> From: Kitzmann, Daniel J. [SMTP:kitzmann.daniel@MAYO.EDU]
> Sent: Thursday, August 02, 2001 5:52 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Match-Merge a Small to Large Dataset
>
> Dear SAS-Lers:
>
> Just soliciting advice on the preferred approach to the following problem:
>
> I want to match-merge a SMALLd dataset (~10,000 obs) containing a unique
> KEYv variable with a series of LARGEd datasets (each ~2M obs), which
> contain multiple records with repeats on the same KEYvs. The desired
> MERGEd dataset will contain all records, including repeats on KEYv, from
> LARGEd that match on KEYv in SMALLd.
>
> I've been reading the SAS-L archive literature on Key-Searching and
> Hashing, chiefly those posts of Paul Dorfman, as well as the relevant
> published SUGI papers. I am pleased to report that I think I am, for an
> autodidact programmer anyway, gradually albeit laboredly acquiring a
> verstehen for what is going on and why. Before proceeding too far,
> however, I just want to confirm that I am looking down the apt alley here.
>
> Some additional facts: I'm working in the OS/390 batch environment. KEYv
> is indeed an integer, and even though the number of KEYv obs in SMALLd is
> merely around 10,000, KEYv's theoretical range spans seven digits. Am I
> correct in therefore supposing that the Coalescing List Hashing is the
> method to employ? Thank you kindly in advance.
>
> Cordially,
> Dan
> kitzmann.daniel@mayo.edu