Hi,
I would like to use the Text Miner application, but I don't have it available.
I have a dataset with two character string variables. Each variable can
contain anywhere from one to many (30-40) individual, distinct words. I would
like to check whether any of the words in VARIABLE1 can be found in VARIABLE2.
It would also be nice to see whether more than one of the VARIABLE1 words
is found in VARIABLE2.
An example set:
Obs VARIABLE1
1 CATARACT EXTRACTION WITH IOL-RIGHT
2 CATARACT EXTRACTION WITH IOL-LEFT
3 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
4 SPINE THORACO LUMBAR POSTERIOR FUSION SILO
5 KNEE ARTHROPLASTY TOTAL UNILATERAL
6 PHACOEMULSIFICATION W IOL
7 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
8 LEG-LIGATION & STRIPPING VARICOSE VEINS -BILATERAL
9 EYE-EXTRACTION CATARACT IOL
10 EYE-EXTRACTION CATARACT IOL
Obs VARIABLE2
1 Excision total, lens extracapsular phakoemulsification technique w
2 Excision total, lens extracapsular phakoemulsification technique w
3 Installation of external appliance, circulatory system NEC extraco
4 Fusion, spinal vertebrae open posterior approach [posterolateral a
5 Implantation of internal device, knee joint with combined sources
6 Excision total, lens extracapsular phakoemulsification technique w
7 Excision partial, veins of leg NEC without use of tissue open appr
8 Destruction, skin of leg using device NEC [electrocautery]
9 Excision total, lens extracapsular phakoemulsification technique w
10 Excision total, lens extracapsular phakoemulsification technique w
Observations 4, 5, 6, 7, and 8 have some common words between VARIABLE1
and VARIABLE2, although the two variables differ in letter case, some words
in VARIABLE2 are composite, and some words are spelled slightly differently:
6 PHACOEMULSIFICATION phakoemulsification
I imagine that the first variable needs to be split into separate words
and each word needs to be checked against each of the VARIABLE2 words,
maybe with soundex?
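
To make the idea concrete, here is a rough sketch of what I have in mind,
written in Python rather than SAS just to show the logic. The function names
(tokenize, soundex, common_words) and the min_len cutoff of 3 letters are my
own assumptions for illustration, and the soundex here is a plain
implementation of the standard American Soundex rules, not any particular
library's version.

```python
import re

# Standard American Soundex letter groups: BFPV=1, CGJKQSXZ=2, DT=3, L=4, MN=5, R=6.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def tokenize(text):
    """Split a string into upper-case words, treating any non-letter as a separator."""
    return re.findall(r"[A-Z]+", text.upper())


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    word = re.sub(r"[^A-Z]", "", word.upper())
    if not word:
        return ""
    result = word[0]
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]


def common_words(var1, var2, min_len=3):
    """List (word1, word2, kind) for VARIABLE1 words found in VARIABLE2.

    kind is "exact" for a case-insensitive match, "soundex" for a Soundex match.
    min_len is an assumed cutoff that drops very short tokens such as "W".
    """
    words2 = [w for w in tokenize(var2) if len(w) >= min_len]
    exact2 = set(words2)
    by_code = {}
    for w in words2:
        by_code.setdefault(soundex(w), w)  # keep the first word seen per code
    matches = []
    for w in tokenize(var1):
        if len(w) < min_len:
            continue
        if w in exact2:
            matches.append((w, w, "exact"))
        elif soundex(w) in by_code:
            matches.append((w, by_code[soundex(w)], "soundex"))
    return matches


if __name__ == "__main__":
    v1 = "PHACOEMULSIFICATION W IOL"
    v2 = "Excision total, lens extracapsular phakoemulsification technique w"
    hits = common_words(v1, v2)
    print(len(hits), hits)  # the count tells whether more than one word matched
```

The length of the returned list would tell me whether more than one
VARIABLE1 word was found. One thing I already see is that Soundex is coarse:
it keeps only the first letter and three digits, so EXTRACTION and
extracapsular get the same code (E236) and would be flagged as a match, which
means the candidates would still need to be reviewed.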
Any suggestions are more than welcome.
Sincerely,
Cornel Lencar