| Date: | Thu, 17 May 2001 10:03:33 -0400 |
| Reply-To: | Sigurd Hermansen <hermans1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <hermans1@WESTAT.COM> |
| Subject: | Re: PROC MATCH |
|
For what it's worth, you may certainly have my permission to post the little
program that I wrote to illustrate fuzzy linkage/matching using a highly
simplified scoring method. Some may find my example of how to use the SAS
SPEDIS() function (cribbed almost directly from SI documentation) useful as
they begin to experiment with fuzzy key linkage. I still feel compelled to
warn anyone who does experiment with it that it forms a Cartesian product of
a table and its own image. If applied to tables of more than a few thousand
rows, it will likely blow up.
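
To make the warning concrete, here is a minimal sketch of the same idea in Python rather than SAS. It is not the program referred to above: the names `edit_distance`, `fuzzy_self_join`, and `cartesian_rows` are hypothetical, the sample names are made up, and a plain Levenshtein distance stands in for SPEDIS(), which uses its own cost scheme. The point is only that a self-join examines every row against every other row, so the work grows with the square of the table size.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (an assumed stand-in, not SAS SPEDIS)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_self_join(names, max_distance=2):
    """Join the list with itself and keep close pairs, each pair once."""
    matches = []
    for (i, a), (j, b) in product(enumerate(names), repeat=2):
        if i < j:  # skip self-pairs and mirrored duplicates
            d = edit_distance(a, b)
            if d <= max_distance:
                matches.append((i, j, d))
    return matches


def cartesian_rows(n: int) -> int:
    """Rows produced by joining an n-row table with its own image."""
    return n * n


names = ["HERMANSEN", "HERMANSON", "HERMANN", "SMITH", "SMYTH"]
matches = fuzzy_self_join(names)
for i, j, d in matches:
    print(names[i], names[j], d)
print(cartesian_rows(5000))  # 25000000 comparisons for 5,000 rows
```

Five rows mean 25 row combinations; 5,000 rows mean 25 million, each needing a distance calculation.
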
A couple of years ago or so I posted a SAS macro program that generated
weights based on frequencies for elements of linkage keys and used
the weights to calculate similarity scores per record pair. That program
includes a section in which the user can specify blocking variables that SAS
SQL can use to form an index. Anyone considering a fuzzy or probabilistic
record linkage/matching project needs to understand blocking strategies and
how to use them.

Sig
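
As a rough illustration of blocking, again in Python and not taken from the macro mentioned above: records are first grouped on an exact-match blocking variable, and only records within the same group are paired for fuzzy comparison. The function name `candidate_pairs`, the `zip` field, and the sample records are all assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, block_key):
    """Return index pairs of records that share the same blocking value."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        # Only records inside the same block are ever compared.
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical sample data; "zip" serves as the blocking variable.
records = [
    {"name": "HERMANSEN", "zip": "20850"},
    {"name": "HERMANSON", "zip": "20850"},
    {"name": "SMITH", "zip": "30602"},
    {"name": "SMYTH", "zip": "30602"},
    {"name": "SMITH", "zip": "20850"},
]

pairs = candidate_pairs(records, block_key=lambda r: r["zip"])
all_pairs = list(combinations(range(len(records)), 2))
print(len(pairs), "of", len(all_pairs), "pairs need a fuzzy comparison")
```

The trade-off is that pairs falling in different blocks are never compared: the two SMITH records above carry different zip values, so this blocking choice would miss them. That is one reason the choice of blocking variables deserves care.
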
|