|Date: ||Wed, 11 May 2005 14:24:13 -0700|
|Reply-To: ||Neerav Monga <neerav.monga@GMAIL.COM>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||Neerav Monga <neerav.monga@GMAIL.COM>|
|Subject: ||Re: Multiple record data problem and survival analysis|
|Content-Type: ||text/plain; charset="iso-8859-1"|
Frank: This data is from a cancer registry, so I'm not so concerned
about the issues you've mentioned. I would be if this were actual
hospital data, in which case everything you said would be a concern.
David: You're right about the non-cancer patients; I hadn't thought of
the information that could be lost. However, I think there is some
confusion about the cancer patients. I am trying to predict time to
first cancer diagnosis (i.e., I don't care what other dx's they have;
even if they have multiple cancers, I want only the first cancer).
In a nutshell, I have duplicate records due to multiple ICD code
diagnoses and I want only one record per subject. If they have cancer
(or more than one cancer), I want the record with the age at first
cancer diagnosis. If they do not, I still want one record; maybe I can
use the latest age available so I don't lose any data, as David suggests.
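
To make the rule concrete, here is a minimal sketch of the
one-record-per-subject logic, written in Python rather than SAS. The
field names (subject_id, icd_code, age), the event flag, and the
is_cancer() helper are all hypothetical, and the helper assumes ICD-9
malignant neoplasm codes 140-208, which may not match the registry's
actual coding. Treat it as an illustration of the rule, not a tested
solution for the real data.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Dx(NamedTuple):
    """One diagnosis record (hypothetical layout)."""
    subject_id: str
    icd_code: str
    age: float


def is_cancer(icd_code: str) -> bool:
    """Assumption: ICD-9 malignant neoplasms are major codes 140-208."""
    try:
        major = int(icd_code.split(".")[0])
    except ValueError:
        return False  # non-numeric codes (e.g. E or V codes) are not cancer here
    return 140 <= major <= 208


def one_record_per_subject(rows: Iterable[Dx]) -> Dict[str, Tuple[float, int]]:
    """Collapse diagnosis rows to subject_id -> (age, event).

    event = 1: age is the age at first cancer diagnosis.
    event = 0: no cancer found; age is the latest age on file (censored).
    """
    best: Dict[str, Tuple[float, int]] = {}
    for r in rows:
        event = 1 if is_cancer(r.icd_code) else 0
        current = best.get(r.subject_id)
        if current is None:
            best[r.subject_id] = (r.age, event)
            continue
        cur_age, cur_event = current
        if event and not cur_event:
            # first cancer row seen: it replaces any non-cancer row
            best[r.subject_id] = (r.age, 1)
        elif event and cur_event:
            # several cancers: keep the earliest diagnosis age
            best[r.subject_id] = (min(cur_age, r.age), 1)
        elif not event and not cur_event:
            # no cancer so far: keep the latest age available
            best[r.subject_id] = (max(cur_age, r.age), 0)
        # a non-cancer row never overrides an existing cancer row
    return best


rows = [
    Dx("A", "250.0", 50), Dx("A", "162.9", 61),
    Dx("A", "174.9", 58), Dx("A", "401.9", 70),
    Dx("B", "401.9", 45), Dx("B", "250.0", 52),
]
print(one_record_per_subject(rows))  # {'A': (58, 1), 'B': (52, 0)}
```

The (age, event) pair is the usual time/censoring input for a survival
procedure: subject A contributes an event at age 58, while subject B is
censored at age 52.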
Thanks for the continued discussion.