LISTSERV at the University of Georgia
Date:         Fri, 3 Aug 2001 08:24:04 -0400
Reply-To:     "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Diskin, Dennis" <Dennis.Diskin@PHARMA.COM>
Subject:      Re: Match-Merge a Small to Large Dataset
Comments: To: "Kitzmann, Daniel J." <kitzmann.daniel@MAYO.EDU>
Content-Type: text/plain

Dan,

Paul Dorfman has already given you a lot of pertinent advice. My comment is: do you really need more than a simple match-merge? How often are you running this, and how often do the large and small data sets change? Is the large dataset already sorted on the key? If not, is sorting it a prohibitive step?

Remember that in addition to processing overhead, setup and maintenance are important factors.
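
For comparison, a simple match-merge of the kind mentioned above might look like the sketch below. The names LARGE, SMALL, MERGED and KEY are assumed from the description in the original question, not taken from real code, and the PROC SORT steps can be dropped for any dataset that is already in KEY order.

```sas
/* Sketch only: LARGE, SMALL, MERGED and KEY are assumed names. */

/* Both inputs must be in KEY order for a BY-group merge. */
proc sort data=small;
  by key;
run;

proc sort data=large;
  by key;
run;

data merged;
  /* IN= creates temporary flags showing which dataset contributed. */
  merge large (in=in_large)
        small (in=in_small);
  by key;
  /* Keep every LARGE record, repeats included, whose KEY is in SMALL. */
  if in_large and in_small;
run;
```

Because KEY is unique in SMALL, this is a one-to-many merge, so each repeated KEY in LARGE is written out once per LARGE record. The sort of LARGE is the expensive part, which is why it matters whether it is already in order.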

fwiw, Dennis Diskin

> -----Original Message-----
> From: Kitzmann, Daniel J. [SMTP:kitzmann.daniel@MAYO.EDU]
> Sent: Thursday, August 02, 2001 5:52 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Match-Merge a Small to Large Dataset
>
> Dear SAS-Lers:
>
> Just soliciting advice on the preferred approach to the following problem:
>
> I want to match-merge a SMALLd dataset (~10,000 obs) containing a unique
> KEYv variable with a series of LARGEd datasets (each ~2M obs), which contain
> multiple records with repeats on the same KEYvs. The desired MERGEd dataset
> will contain all records, including repeats on KEYv, from LARGEd that match
> on KEYv in SMALLd.
>
> I've been reading the SAS-L archive literature on Key-Searching and Hashing,
> chiefly those posts of Paul Dorfman, as well as the relevant published SUGI
> papers. I am pleased to report that I think I am, for an autodidact
> programmer anyway, gradually albeit laboredly acquiring a verstehen for that
> is going on and why. Before proceeding too far, however, I just want to
> confirm that I am looking down the apt alley here.
>
> Some additional facts: I'm working in the OS/390 batch environment. KEY is
> indeed an integer, and even though the number of KEYv obs in SMALLd is
> merely around 10,000, KEYv's theoretical range spans seven digits. Am I
> correct in therefore supposing that the Coalescing List Hashing is the
> method to employ? Thank you kindly in advance.
>
> Cordially,
> Dan
> kitzmann.daniel@mayo.edu

