Date: Tue, 10 May 2005 10:02:49 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Multiple record data problem and survival analysis
In-Reply-To: <1115665889.995643.65430@z14g2000cwz.googlegroups.com>
Content-type: text/plain; charset=US-ASCII
Neerav Monga <neerav.monga@GMAIL.COM> wrote:
> I am stuck on this problem and was hoping someone can help. Here is the
> problem:
>
> I have a varying number of records per patient, an age of diagnosis for
> the disease and a diagnosis code number (i.e. ICD codes that start with
> 153 or 154). I am interested in developing a survival analysis model
> with time to disease as my main outcome.
>
> The issues that make this confusing are: a) each subject has multiple
> ICD codes, however I only want those that have a particular disease
> (e.g. cancer) b) If a person has multiple cancers (i.e. ICD starts with
> 153/154) I want to select the record with the youngest age of onset of
> cancer (since I am predicting time to 1st cancer) c) some subjects do
> not have cancer at all, yet have multiple records and I want to select
> those with the youngest age (just to keep my coding rules consistent).

It seems to me that you have a harder problem than that.

[1] If a person can have multiple records with multiple ICD codes, then
you can have patients who have at least one record before a cancer
diagnosis. In that case, you would NOT want the earliest record for the
non-cancer patients. Can you go back and confer with others at your site
and work out better selection rules?

[2] If you want a survival analysis model, then you also have to consider
that the earlier records for each patient might have a lot of diagnostic
and modeling value. So perhaps you need the information in the other
records as well. Now you have to decide how to model that information
and how you want to analyze the data.
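
To make the selection rules concrete, here is a minimal sketch in Python (not SAS) that applies the rules as stated in the original post to the data snapshot. The function name `one_record_per_subject`, the `censor_at` switch, and the `event` flag are hypothetical names introduced for illustration only. The "latest" option is just one possible alternative for non-cancer patients, assuming their last record marks the end of cancer-free follow-up; it is not a rule anyone in this thread has settled on.

```python
# Rows copied from the data snapshot in the original post:
# (subject, agediagnosis, icdcode). ICD codes are kept as strings
# so that prefix matching works.
records = [
    (4979, 64, "1841"),
    (4979, 62, "1741"),
    (3673, 42, "1820"),
    (3673, 72, "1539"),
    (1989, 70, "1531"),
    (1989, 71, "1889"),
    (2989, 60, "1531"),
    (2989, 71, "1549"),
]

CANCER_PREFIXES = ("153", "154")


def is_cancer(icd):
    """True if the ICD code starts with 153 or 154."""
    return str(icd).startswith(CANCER_PREFIXES)


def one_record_per_subject(rows, censor_at="earliest"):
    """Return {subject: (age, icdcode, event)} with one entry per subject.

    event is 1 when a cancer record was found, 0 otherwise.
    censor_at is a hypothetical switch for non-cancer subjects:
    "earliest" follows the original post, "latest" is an assumed
    alternative that keeps the oldest recorded age instead.
    """
    if censor_at not in ("earliest", "latest"):
        raise ValueError("censor_at must be 'earliest' or 'latest'")

    # Group the records by subject.
    by_subject = {}
    for subject, age, icd in rows:
        by_subject.setdefault(subject, []).append((age, str(icd)))

    selected = {}
    for subject, recs in by_subject.items():
        cancers = [r for r in recs if is_cancer(r[1])]
        if cancers:
            # Rule (b): youngest age of onset among the cancer records.
            age, icd = min(cancers, key=lambda r: r[0])
            selected[subject] = (age, icd, 1)
        else:
            # Rule (c), or the assumed alternative.
            pick = min if censor_at == "earliest" else max
            age, icd = pick(recs, key=lambda r: r[0])
            selected[subject] = (age, icd, 0)
    return selected


if __name__ == "__main__":
    for subject, row in sorted(one_record_per_subject(records).items()):
        print(subject, *row)
```

Only subject 4979 has no 153/154 code in the snapshot, so it is the only subject whose selected record changes between the two `censor_at` settings.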
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> wrote on 05/09/2005 12:11:30 PM:
> Hi Everyone,
> Data Snapshot:
>
> subject  agediagnosis  icdcode
> 4979     64            1841
> 4979     62            1741
> 3673     42            1820
> 3673     72            1539
> 1989     70            1531
> 1989     71            1889
> 2989     60            1531
> 2989     71            1549
>
> I hope this is clear. My goal is to have one record per subject
> fitting the various criteria I've explained. Thanks a lot in advance
> for any suggestions.
>
> Cheers,
>
> Neerav