> From: Charles [mailto:cwentzel@PHC4.ORG]
> So, I'm performing matches within a dataset. Let's say I'm
> doing a proc sql with two tables a and b.
I think you might consider output the non-matches to separate tables
--consider the exclusive or operator,
in SAS that's xor(A,B)
DATA BOTH XOR;
if and(HaveA,HaveB) then output BOTH; %* ;-) ;
else if xor(HaveA,HaveB) then output XOR;
who knows what it is in SQL? --
and examine them, to see if you can't get a grip on what the most common
remember SS-3, and SS-2, the first five digits of SS,
are regional codes.
Given a group of SSn from a state
you'll find most of your variation in SS-4
which is why it is sometimes used as a password/authentication string.
Given those factoids
I'd look for transpositions in columns 1&2, 2&3,
6&7, 7&8, 8&9,
Ron Fehd the macro maven CDC Atlanta GA USA RJF2@cdc.gov
OpSys: Win_Pro Ver: 8.2
Repetition reduction enhances elegance!
Repetition reduction furthers finesse!
points for poofreading the and(HaveA,HaveB) phrase
> If the ssn's are equal, I can just say (in the where or join
> If I want to allow the ssns to differ by one digit, I can say:
> But what if I want to look for ssns that have two characters
> transposed? I
> started with:
> (a.ssn =
> (b.ssn,3)) or
> (b.ssn,4)) or
> (b.ssn,5)) or
> (b.ssn,6)) or
> (b.ssn,7)) or
> (b.ssn,8)) or
> (b.ssn,9)) or
> (a.ssn =substr(b.ssn,1,7)||substr(b.ssn,9,1)||substr(b.ssn,8,1)
> but this obviously only addresses transposed neighboring
> digits. I could
> continue with all (9c2)=36 possible combinations... but I
> feel like I'm
> missing something. Is there a simpler way?
> Plus, I'm quite curious to see how y'all respond to the idea of soft
> matches in general... Any good references out there?