Date: Fri, 29 Aug 2003 17:25:52 +0200
Reply-To: Hans Reitsma <j.reitsma@AMC.UVA.NL>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Hans Reitsma <j.reitsma@AMC.UVA.NL>
Organization: Academic Medical Center, The Netherlands
Subject: optimalisation procedure in record linkage project?
Dear all,
After a probabilistic record linkage operation on two files
(A and B) we have a file of potential links, i.e. pairs of
records from files A and B whose total weights are above a
certain cut-off value. Some of these potential links are
intertangled (is that the right word?), meaning that one
record from file A can be linked to two different records in
file B. The weights can differ, but both are above the
cut-off value. We know that each record in file A (or B) can
be linked to at most one record in the other file. This
example is easy to solve: the pair with the highest weight
wins and becomes the link, and the other becomes a non-link.
However, more complex situations occur (see the sample data
below). In these situations, I want to obtain the solution
that maximises the total sum of the weights of all the pairs
that belong to that solution. This means that the pair with
the highest weight does not automatically win. Here are some
sample data to clarify the issue.
Cluster  id_a  id_b  weight  Desired result  Total weight of solution (links)
1        1     3     11      non-link
1        1     6     13      link            13
2        3     8     11      non-link
2        4     9     11      link
2        5     8     14      link
2        5     9     13      non-link        25
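
To make the desired result concrete, here is a minimal sketch of
the computation, written in Python rather than SAS purely for
illustration. Within each cluster it tries every subset of pairs
in which no id_a and no id_b occurs twice, and keeps the subset
with the largest summed weight. The names pairs, best_links and
solve are hypothetical, and the exhaustive search is an assumption
that only suits small clusters; for large clusters a proper
assignment method (for example the Hungarian algorithm) would be
the usual choice.

```python
from collections import defaultdict

# Candidate pairs from the sample data: (cluster, id_a, id_b, weight).
pairs = [
    (1, 1, 3, 11),
    (1, 1, 6, 13),
    (2, 3, 8, 11),
    (2, 4, 9, 11),
    (2, 5, 8, 14),
    (2, 5, 9, 13),
]


def best_links(cands):
    """Return (total_weight, links) for the one-to-one subset of cands
    with the largest summed weight. Exhaustive search: small clusters only."""
    best = (0, [])

    def search(i, used_a, used_b, total, chosen):
        nonlocal best
        if i == len(cands):
            if total > best[0]:
                best = (total, list(chosen))
            return
        id_a, id_b, w = cands[i]
        # Option 1: accept this pair if both records are still unlinked.
        if id_a not in used_a and id_b not in used_b:
            chosen.append((id_a, id_b))
            search(i + 1, used_a | {id_a}, used_b | {id_b}, total + w, chosen)
            chosen.pop()
        # Option 2: treat this pair as a non-link.
        search(i + 1, used_a, used_b, total, chosen)

    search(0, frozenset(), frozenset(), 0, [])
    return best


def solve(pairs):
    """Group the candidate pairs by cluster and solve each cluster separately."""
    by_cluster = defaultdict(list)
    for cluster, id_a, id_b, w in pairs:
        by_cluster[cluster].append((id_a, id_b, w))
    return {c: best_links(cands) for c, cands in by_cluster.items()}


result = solve(pairs)
for cluster, (total, links) in sorted(result.items()):
    print(cluster, total, links)
```

On the sample data this selects pair (1, 6) in cluster 1 and pairs
(4, 9) and (5, 8) in cluster 2, matching the totals of 13 and 25
in the table.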
Any help, suggestions?
Hans Reitsma, MD PhD
Dept. of Clinical Epidemiology & Biostatistics
PO Box 22700, 1100 DE, Amsterdam, The Netherlands
Phone: +31-20-5663273, Fax: +31-20-6912683
E-mail: j.reitsma@amc.uva.nl