LISTSERV at the University of Georgia
Date:         Thu, 11 Apr 2002 17:41:14 -0400
Reply-To:     "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject:      Re: Matching strings of unequal length
Comments: To: "Carriere, Ron" <rcarriere@MEDNET.UCLA.EDU>
Content-Type: text/plain; charset=iso-8859-1

Ron,

Here's a simple idea: store the short file names, each with a period appended on the right, in a hash table. Then read the larger file and search the table for matches using the colon modifier. Sample code follows (I chose 1003 as the table size because it is much greater than 350; strictly speaking, 1003 = 17 x 59 is not prime, so a true prime such as 1009 would be the more conventional choice). Matches are marked by 1, non-matches by a missing value.

data small ;
   input fn $char44. ;
   cards ;
LAPK.RDSUM.OSHPD.WWH
LAPK.DCMMRDRV.CNTLCARD.NPH
LAPK.DCMMRDRV.CNTLCARD.SMH
LAPK.DCMMRDRV.CNTLCARD.WWH
LAPK.RDMSTR.RDTXLRC.SORTIP
run ;

data large ;
   input fn $char44. ;
   cards ;
LAPK.RDSUM.OSHPD.WWH.G0002V00
LAPK.DCMMRDRV.CNTLCARD.NPH.G0008V00
LAPK.DCMMRDRV.CNTLCARD.SMH.SM917E.DATA
LAPK.DCMMRDRV.CNTLCARD.SMX.SM917E.DATA
LAPK.DCMMRDRV.CNTLCARD.WWH.T1018B.G0001V00
LAPK.RDMSTR.RDTXLRC.SORTIP.G0661V00
LAPK.RDMSTR.RDTXLRC.SORTIZ.G0661V00
run ;

%let h = 1003 ;

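data match (keep = fn match) ;
   array h (0:&h) $44. _temporary_ ;
   /* first pass only: load the short names, each with a trailing period, into the table */
   if _n_ = 1 then do until (s) ;
      set small end = s ;
      k = trim(fn) || '.' ;
      /* hash on the first 6 bytes, then probe linearly for a free or equal slot */
      do j = mod(input(k,pib6.),&h) by 1 until (h(j) = k) ;
         if j > &h then j = 0 ;
         if h(j) =: '' then h(j) = k ;
      end ;
   end ;
   set large ;
   /* probe from the same hash address until a match or an empty slot is found */
   do j = mod(input(fn,pib6.),&h) by 1 until ( h(j) =: '') ;
      if j > &h then j = 0 ;
      if h(j) =: substr(fn,1,length(h(j))) then do ;
         match = 1 ;
         leave ;
      end ;
   end ;
run ;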

proc print data= match ; run ;

I believe in V8.2 (which I do not have handy at the moment), one could use the SQL equivalent of the EQ: operator, EQT, to compare the names in a join like

small.FN EQT substr(large.FN, 1, length(small.FN))

Try it. If it works, it is somewhat simpler than coding a hash.
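
For what it is worth, a minimal and untested sketch of such a join might look like the one below. It assumes PROC SQL in V8.2 or later (where EQT should be available) and the SMALL and LARGE data sets created above. The output table name MATCH_SQL and the column name TABLE_FN are made up for illustration, and the period is appended to the short name for the same reason as in the hash code, so that SORTIP does not pick up a name such as SORTIPX.

```sas
proc sql ;
   create table match_sql as
   select l.fn
        , s.fn as table_fn      /* hypothetical column: missing when no table entry matches */
   from   large l
          left join small s
   /* short name plus '.' against the same-length prefix of the long name */
     on   trim(s.fn) || '.' eqt substr(l.fn, 1, length(s.fn) + 1)
   ;
quit ;
```

With only 350 rows on one side, the join should stay manageable, but I would not expect it to beat the hash on speed.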

Kind regards,
=====================
Paul M. Dorfman
Jacksonville, FL
=====================

> -----Original Message-----
> From: Carriere, Ron [mailto:rcarriere@MEDNET.UCLA.EDU]
> Sent: Thursday, April 11, 2002 3:26 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Matching strings of unequal length
>
> I have two files. The first is a table that looks like:
>
> LAPK.RDSUM.OSHPD.WWH DAILY
> LAPK.DCMMRDRV.CNTLCARD.NPH MONTHLY
> LAPK.DCMMRDRV.CNTLCARD.SMH MONTHLY
> LAPK.DCMMRDRV.CNTLCARD.WWH MONTHLY
> LAPK.RDMSTR.RDTXLRC.SORTIP WEEKLY
>
> The second file shows file names
>
> LAPK.RDSUM.OSHPD.WWH.G0002V00
> LAPK.DCMMRDRV.CNTLCARD.NPH.G0008V00
> LAPK.DCMMRDRV.CNTLCARD.SMH.SM917E.DATA
> LAPK.DCMMRDRV.CNTLCARD.WWH.T1018B.G0001V00
> LAPK.RDMSTR.RDTXLRC.SORTIP.G0661V00
>
> I would like to match up the file names in the second file with the table
> ignoring the extraneous data in file names, i.e. the generation identifiers
> and low level qualifiers (G0002v00/SM917E.DATA). The second file has
> approximately 10,000 entries the first 350. So in the example above the
> first file name matches up with the first entry and so on with the last
> file name matching up with the last table entry. If I could be certain
> that the only extraneous information in the second file were the generation
> numbers, then I could search for and strip them off and simply merge the
> two files. But this will not work for the third example in the second file
> and many other examples as well. Suggestions???
>
> Ron Carriere
> UCLA Medical Center


