| Date: | Wed, 11 May 2005 14:24:13 -0700 |
| Reply-To: | Neerav Monga <neerav.monga@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Neerav Monga <neerav.monga@GMAIL.COM> |
| Organization: | http://groups.google.com |
| Subject: | Re: Multiple record data problem and survival analysis |
| In-Reply-To: | <4AD865E7.7A6C5745.F743F212@netscape.net> |
| Content-Type: | text/plain; charset="iso-8859-1" |
|---|
Frank: This data is from a cancer registry, so I'm not so concerned
about the issues you've mentioned. I would be if this were actual
hospital data, in which case everything you said would be of concern.
David: You're right about the non-cancer patients; I hadn't thought of
the information that could be lost. However, I think there is some
confusion about the cancer patients. I am trying to predict time to first
cancer diagnosis (i.e., I don't care what other dx's they have; even if
they have multiple cancers, I want only the first cancer).
In a nutshell, I have duplicate records due to multiple ICD code
diagnoses and I want only one record per subject. If they have cancer
(or more than one cancer), I want the record with the first cancer
diagnosis age. If they do not, I still want one record; maybe I can use
the latest age available so I don't lose any data, as David suggests.
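To make the rule concrete, here is a minimal sketch in Python (not SAS) of
the one-record-per-subject logic described above. The record layout
(subject_id, icd_code, age), the function name, the event flag, and the
"codes starting with C are cancer" test are all assumptions made for
illustration; the real registry fields and cancer definition may differ.

```python
def one_record_per_subject(records, is_cancer):
    """Collapse (subject_id, icd_code, age) rows to one row per subject.

    Returns {subject_id: (age, event)}:
      event = 1 -> subject has at least one cancer dx; age is the earliest cancer age
      event = 0 -> no cancer dx; age is the latest age on file (censoring age)
    """
    first_cancer = {}  # earliest age at a cancer diagnosis, per subject
    last_seen = {}     # latest age on any record, per subject
    for subject_id, icd_code, age in records:
        if is_cancer(icd_code):
            if subject_id not in first_cancer or age < first_cancer[subject_id]:
                first_cancer[subject_id] = age
        if subject_id not in last_seen or age > last_seen[subject_id]:
            last_seen[subject_id] = age

    result = {}
    for subject_id, latest_age in last_seen.items():
        if subject_id in first_cancer:
            result[subject_id] = (first_cancer[subject_id], 1)
        else:
            result[subject_id] = (latest_age, 0)
    return result


# Made-up rows for illustration only; the "C" prefix rule is an assumption.
rows = [
    (1, "C50", 61), (1, "I10", 55), (1, "C18", 67),  # two cancers: keep age 61
    (2, "I10", 48), (2, "E11", 53),                  # no cancer: keep latest age 53
]
collapsed = one_record_per_subject(rows, lambda code: code.startswith("C"))
```

The event flag is there so that subjects without cancer can be kept as
censored observations in a survival model rather than dropped.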
Thanks for the continued discussion.
Neerav