LISTSERV at the University of Georgia
Date:         Sat, 28 Jun 2003 16:47:53 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Command to match based on "close"
Content-Type: text/plain; charset="iso-8859-1"

Different orders of tokens in strings create special problems. Sometimes the difference in structure has importance (David Bruce vs Bruce David) and sometimes not (Auden, W. H. vs W. H. Auden). SPEDIS() calculates the 'cost' of rearranging one string to match another.

Take a look at TIPS 00276 and 00356 under SAS Tips at http://sconsig.com. The implementation of SPEDIS() under 00276,

   max((1-(length(t1.F_name)*spedis(t1.F_name,t2.F_name)/200)),0.1) as s1,

adjusts the cost for the lengths of the strings. Increasing the constant value (say from 200 to 1800) increases the sensitivity of matching.
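
To show where that expression might sit, here is a minimal sketch of a PROC SQL step that scores every pair of names from two tables. The table names file1 and file2, the output table scored, the column aliases name1, name2 and raw_cost, and the sample rows are assumptions made for illustration; only the s1 expression comes from the tip quoted above, and this is not the code of TIP 00276 itself.

```sas
/* Hypothetical input tables; names and values are illustrative only. */
data file1;
   length F_name $40;
   input F_name $char40.;
   datalines;
W H Hambrecht
Spray Industries
;
run;

data file2;
   length F_name $40;
   input F_name $char40.;
   datalines;
Hambrecht W H
Say Industries
;
run;

proc sql;
   /* Cartesian join: every name in file1 is scored against every name in file2. */
   create table scored as
   select t1.F_name as name1,
          t2.F_name as name2,
          /* Raw SPEDIS cost; 0 means the strings are identical. */
          spedis(t1.F_name, t2.F_name) as raw_cost,
          /* Length-adjusted score with a floor of 0.1 (expression quoted in the post). */
          max((1-(length(t1.F_name)*spedis(t1.F_name,t2.F_name)/200)),0.1) as s1
   from file1 as t1, file2 as t2
   order by s1 desc;
quit;
```

Keeping raw_cost next to s1 makes it easier to see how the length adjustment reorders candidate pairs. Note that SPEDIS() is case-sensitive and asymmetric, so applying UPCASE() to both arguments and trying both argument orders may be worth testing on real data.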

Sig

-----Original Message-----
From: jsl [mailto:nospam@NOSPAM.COM]
Sent: Friday, June 27, 2003 3:48 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Command to match based on "close"

Thanks for the posts. SPEDIS does appear to be what I'm looking for; however, it also appears that it may not work as well as I was hoping, although using it with some manual checking procedures might still be efficient. Let me explain (in case anyone has anything else to offer): it sometimes gives relatively high scores to pairs that I really want to keep as matches (say, the example below), while pairs that should not be matched get lower (better) scores, for example "SAY INDUSTRIES" vs. "SPRAY INDUSTRIES".

Thanks again,
Jim

"Real SAS User" <sasuser@GUILDENSTERN.DYNDNS.ORG> wrote in message news:20030626220337.GU22764@ganymede...
> on Thu, Jun 26, 2003 at 05:39:21PM -0400, jsl (nospam@NOSPAM.COM) wrote:
> > Is there a SAS command/function that will allow one to compare two variables
> > based on how closely the characters match up? For example, say I have
> > Var in file 1: W H Hambrecht
> > Var in file 2: Hambrecht W H
> >
> > I'd like to be able to specify that these records match up by saying, for
> > example, that 90% or so of the characters are the same, albeit in a
> > different order.
> >
> > Any commands to allow for this?
>
> SPEDIS, among others.
>
> --
> Charming man. I wish I had a daughter so I could forbid her to marry one...
