Date: Sat, 28 Jun 2003 16:47:53 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Command to match based on "close"
Content-Type: text/plain; charset="iso-8859-1"
Different orders of tokens in strings create special problems. Sometimes the
difference in order matters (David Bruce vs Bruce David) and sometimes it does
not (Auden, W. H. vs W. H. Auden). SPEDIS() calculates the 'cost' of
rearranging one string to match another.
Take a look at TIPS 00276 and 00356 under SAS Tips at http://sconsig.com .
The implementation of SPEDIS() under 00276,

    max((1-(length(t1.F_name)*spedis(t1.F_name,t2.F_name)/200)),0.1) as s1,

adjusts the cost for lengths of strings. Increasing the constant value (say
from 200 to 1800) increases the sensitivity of matching.
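
To see what the constant does, here is a minimal sketch of the same arithmetic in plain Python rather than SAS. It does not compute SPEDIS itself: the function name adjusted_score, its parameters, and the cost value of 20 are all hypothetical, chosen only to illustrate the expression above. The cost is assumed to be whatever SPEDIS() returned for the pair.

```python
def adjusted_score(name, spedis_cost, constant=200, floor=0.1):
    """Length-adjusted match score, mirroring the SQL expression above.

    spedis_cost is assumed to be the value SAS SPEDIS() returned for the
    pair; this sketch does not compute it.
    """
    length = len(name.rstrip())  # SAS LENGTH() ignores trailing blanks
    return max(1 - length * spedis_cost / constant, floor)


# Hypothetical pair: a 9-character name with an assumed SPEDIS cost of 20.
low = adjusted_score("Hambrecht", 20, constant=200)
high = adjusted_score("Hambrecht", 20, constant=1800)
print(low, high)
```

With the same cost, the larger constant lifts the score from the 0.1 floor to about 0.9, so more candidate pairs would clear any fixed cutoff on s1.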
Sig
-----Original Message-----
From: jsl [mailto:nospam@NOSPAM.COM]
Sent: Friday, June 27, 2003 3:48 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Command to match based on "close"
Thanks for the posts. SPEDIS does appear to be what I'm looking for;
however, it also appears that it may not work as well as I was hoping,
although maybe using it with some manual checking procedures would still be
efficient. Let me explain (in case anyone has anything else to offer)... It
sometimes gives relatively high scores for those that I really want to keep
as matches (say, the example below) while others that should not be matched
get lower (better) scores: say "SAY INDUSTRIES" vs. "SPRAY INDUSTRIES".
Thanks again,
Jim
"Real SAS User" <sasuser@GUILDENSTERN.DYNDNS.ORG> wrote in message
news:20030626220337.GU22764@ganymede...
> on Thu, Jun 26, 2003 at 05:39:21PM -0400, jsl (nospam@NOSPAM.COM) wrote:
> > Is there a SAS command/function that will allow one to compare two
> > variables based on how closely the characters match up? For example, say I
> > have
> > Var in file 1: W H Hambrecht
> > Var in file 2: Hambrecht W H
> >
> > I'd like to be able to specify that these records match up by saying, for
> > example, that 90% or so of the characters are the same, albeit in a
> > different order.
> >
> > Any commands to allow for this?
>
> SPEDIS, among others.
>
> --
> Charming man. I wish I had a daughter so I could forbid her to marry one...