LISTSERV at the University of Georgia
Date:   Wed, 6 Mar 2002 12:02:57 -0800
Reply-To:   Karina Haavik <karina@SEAS.UPENN.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Karina Haavik <karina@SEAS.UPENN.EDU>
Organization:   http://groups.google.com/
Subject:   Re: Unique Patient ID for merging Healthcare / Rx Data
Content-Type:   text/plain; charset=ISO-8859-1

We deal with a lot of dirty claim data. A fuzzy merge would probably give a "correct" solution, but it is more than we are willing to take on at this time.

hermans1@westat.com (Sigurd Wilson Hermansen) wrote in message
news:<5d9bf112.0203051502.3b4d3f42@posting.google.com>...
>
> This discussion has mixed together discussions of two issues. First,
> name plus birthdate do not guarantee a distinct identifier for each
> person in any set of records. Neither does name plus birthdate plus
> the person's sex.

We prefer to match by subscriber SSN or contract number plus a serial family member number when possible. [Several of our products use subscriber SSN as part of the member number, so it is one field which is actually entered correctly by hospitals.] In data in which a family member number is not available or not reliable, we use subscriber SSN or contract number (zero-filled to $9) plus YYYYMM of member DOB plus member sex. This distinguishes pretty much everyone except same-sex twins.
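For anyone who wants to see the fallback key spelled out, here is a rough sketch in Python rather than SAS. The function name `base_key`, the removal of hyphens, and the one-letter uppercase sex code are assumptions for illustration, not our production code.

```python
from datetime import date


def base_key(subscriber_id: str, dob: date, sex: str) -> str:
    """Build the fallback match key described above.

    Layout: subscriber SSN or contract number zero-filled to 9 characters,
    then YYYYMM of the member's date of birth, then the member's sex.
    """
    # Assumption: hyphens and surrounding blanks are noise in the ID field.
    cleaned = subscriber_id.strip().replace("-", "")
    # Assumption: sex is stored as a single letter; normalize to uppercase.
    sex_code = sex.strip().upper()[:1]
    return f"{cleaned.zfill(9)}{dob:%Y%m}{sex_code}"


key = base_key("12345", date(1961, 3, 6), "f")
print(key, len(key))
```

The day of birth is left out on purpose, so a mistyped day does not break the match.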

Sigurd also said:
>
> Second, names have variations and those
> transcribing names and birthdates tend to make lots of errors. Records
> for the same person may not link because the "face values" of the
> names in the two records do not match exactly.

We then identify the same-sex twins and add the first 7 characters of their first name to distinguish between them. This minimizes the impact of dirty name data, since we only use name on same-sex twins. Because billing is not processed by name, nobody cleans the name fields, and they are very dirty.
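A rough sketch of the twin step, again in Python rather than SAS. It assumes a membership roster with one row per person and an already-built key of subscriber ID plus YYYYMM plus sex; a key shared by more than one distinct name prefix is treated as a set of same-sex twins. The function name `assign_ids`, the roster layout, and the blank-padding of the name to 7 characters are illustrative assumptions.

```python
from collections import defaultdict


def name_part(first_name: str) -> str:
    """First 7 characters of the first name, uppercased and blank-padded to 7."""
    return first_name.strip().upper()[:7].ljust(7)


def assign_ids(roster):
    """roster: iterable of (base_key, first_name), assumed one row per person.

    Returns the final IDs in input order. Only keys shared by more than one
    distinct name prefix (assumed same-sex twins) get the name part appended.
    """
    roster = list(roster)
    names_by_key = defaultdict(set)
    for key, first in roster:
        names_by_key[key].add(name_part(first))

    ids = []
    for key, first in roster:
        if len(names_by_key[key]) > 1:
            ids.append(key + name_part(first))  # longer ID for twins
        else:
            ids.append(key)  # shorter ID for everyone else
    return ids


roster = [
    ("000012345196103F", "Anna"),
    ("000012345196103F", "Annabelle"),
    ("000099999197001M", "Bob"),
]
for member_id in assign_ids(roster):
    print(repr(member_id), len(member_id))
```

With these assumptions every ID comes out at one of two fixed lengths, and the name only enters the key where nothing else can separate two people.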

Our end result is a long character ID with two possible fixed lengths. Not ideal, but it seems to work. We still lose matches on bad data entry for SSN and DOB, but fewer than we would if we used name on all records.

Karina

