Date: Wed, 6 Mar 2002 12:02:57 -0800
Reply-To: Karina Haavik <karina@SEAS.UPENN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Karina Haavik <karina@SEAS.UPENN.EDU>
Subject: Re: Unique Patient ID for merging Healthcare / Rx Data
We deal with a lot of dirty claim data. A fuzzy merge would probably
give a "correct" solution, but it is more than we are willing to take
on at this time.
firstname.lastname@example.org (Sigurd Wilson Hermansen) wrote in message
> This discussion has mixed together discussions of two issues. First,
> name plus birthdate do not guarantee a distinct identifier for each
> person in any set of records. Neither does name plus birthdate plus
> the person's sex.
We prefer to match by subscriber SSN or contract number plus a serial
family member number when possible. [Several of our products use
subscriber SSN as part of the member number, so it is one field that
hospitals actually enter correctly.] In data where a family member
number is not available or not reliable, we use subscriber SSN or
contract number (zero-filled to $9) plus YYYYMM of member DOB plus
member sex. This distinguishes pretty much everyone except same-sex
twins.
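For anyone who wants to see the fallback key spelled out, here is a rough Python sketch of it (we presumably do this in SAS; Python is used here only to show the logic). It assumes hyphens and blanks are the only separators to strip from the ID, and that sex is already a one-character code such as M/F. The function names are made up for illustration. The key is 9 + 6 + 1 = 16 characters:

```python
from datetime import date


def zero_fill(subscriber_id: str, width: int = 9) -> str:
    """Strip hyphens/blanks and left-pad with zeros to a fixed width (like $9)."""
    cleaned = subscriber_id.strip().replace("-", "").replace(" ", "")
    if not cleaned or len(cleaned) > width:
        raise ValueError(f"bad subscriber ID: {subscriber_id!r}")
    return cleaned.rjust(width, "0")


def fallback_key(subscriber_id: str, dob: date, sex: str) -> str:
    """Subscriber SSN/contract (9) + YYYYMM of member DOB (6) + sex (1) = 16 chars."""
    sex_code = sex.strip().upper()
    if len(sex_code) != 1:
        raise ValueError(f"sex must be a one-character code: {sex!r}")
    return zero_fill(subscriber_id) + dob.strftime("%Y%m") + sex_code


# Same subscriber, once keyed with hyphens and once stored as a number
# that lost its leading zero; zero-filling makes both agree.
print(fallback_key("012-34-5678", date(1971, 3, 15), "f"))  # 012345678197103F
print(fallback_key("12345678", date(1999, 11, 2), "M"))     # 012345678199911M
```

Zero-filling matters mostly for SSNs that were read as numbers somewhere upstream and lost a leading zero.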
Sigurd also said:
> Second, names have variations and those
> transcribing names and birthdates tend to make lots of errors. Records
> for the same person may not link because the "face values" of the
> names in the two records do not match exactly.
We then identify the same-sex twins and add the first 7 characters of
their first name to distinguish between them. This minimizes the
impact of dirty name data, since we only use name on same-sex twins.
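A minimal Python sketch of the twin step follows. It assumes a roster with one row per member, because otherwise repeat claims for the same person would look like a shared key. It also assumes any key shared by more than one member is treated as a same-sex twin pair. The upper-casing and blank-padding of the name are my own assumptions. They keep the suffix a fixed 7 characters, so the final ID is either 16 or 23 characters long:

```python
from collections import Counter

NAME_LEN = 7  # first 7 characters of the first name


def final_ids(members):
    """members: one dict per member with 'key' (16-char fallback key) and
    'first_name'. Returns IDs in input order: the bare key for most members,
    key + 7-char first-name suffix where the key is shared."""
    counts = Counter(m["key"] for m in members)
    ids = []
    for m in members:
        if counts[m["key"]] > 1:
            # Shared key (assumed same-sex twins): append the first name,
            # upper-cased and blank-padded so the ID is always 23 characters.
            suffix = m["first_name"].strip().upper()[:NAME_LEN].ljust(NAME_LEN)
            ids.append(m["key"] + suffix)
        else:
            ids.append(m["key"])
    return ids


roster = [
    {"key": "012345678197103F", "first_name": "Elizabeth"},
    {"key": "012345678197103F", "first_name": "Ann"},
    {"key": "098765432198512M", "first_name": "Bob"},
]
for member_id in final_ids(roster):
    print(repr(member_id))
```

Twins whose first names agree in the first 7 characters would still collide, and a misspelled twin's name only affects that one ID rather than everyone's.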
Since billing is not processed by name, the dirtiness of name fields is unlikely to be corrected at the source.
Our end result is a long character ID with 2 possible fixed lengths.
Not ideal, but it seems to work. We still lose matches on bad data
entry for SSN and DOB, but not as many as we would if we used name on
all records.
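Once both files carry the ID, the healthcare/Rx merge is an exact-key join. In SAS that would usually be a DATA step MERGE with a BY statement, or a PROC SQL join. The Python sketch below is only an illustration under assumed field names (`member_id`, `claim`, `rx` are hypothetical). It rejects IDs that are not one of the two expected lengths and keeps unmatched Rx records so they can be reviewed:

```python
from collections import defaultdict

VALID_LENGTHS = {16, 23}  # bare key, or key + 7-char twin name suffix


def merge_on_id(medical, pharmacy):
    """Exact-match join of medical and Rx records on 'member_id'.
    Returns (list of (medical, rx) pairs, list of unmatched Rx records)."""
    for rec in medical + pharmacy:
        if len(rec["member_id"]) not in VALID_LENGTHS:
            raise ValueError(f"unexpected ID length: {rec['member_id']!r}")
    by_id = defaultdict(list)
    for med in medical:
        by_id[med["member_id"]].append(med)
    matched, unmatched = [], []
    for rx in pharmacy:
        hits = by_id.get(rx["member_id"], [])
        if hits:
            matched.extend((med, rx) for med in hits)
        else:
            # No medical record with this ID: either a genuine Rx-only
            # member or a keying error in SSN/DOB on one side.
            unmatched.append(rx)
    return matched, unmatched


medical = [{"member_id": "012345678197103F", "claim": "M1"}]
pharmacy = [
    {"member_id": "012345678197103F", "rx": "R1"},
    {"member_id": "012345679197103F", "rx": "R2"},  # one SSN digit mistyped
]
matched, unmatched = merge_on_id(medical, pharmacy)
print(len(matched), [r["rx"] for r in unmatched])  # 1 ['R2']
```

The unmatched list is where the lost matches from bad SSN or DOB entry show up.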