Folks,

Let me put in my $.02. Discretizing a continuous variable for
use as a predictor variable is a very common artifice in the
epidemiological literature. This is usually performed so that
the epidemiologist can make some statement about relative risks
for some outcome, and convey the RR in a simple manner to their
colleagues (or at least an approximation to the RR). Now, it
needs to be understood exactly what discretizing the continuous
predictor variable actually does: it fits a step function to the
data, that is, a nonlinear curve which is flat within each
interval and discontinuous at the break points. This is an ugly
model if I ever saw one. It says that the response is homogeneous
within the (artificially) chosen intervals, and that from the end
of one interval to the beginning of the next there is often a
significant difference in the response. Now, I ask whether it
is reasonable to believe that dietary habits (consumption of
fruits and vegetables, percent energy from fat) change dramatically
from age 34 to age 35, or from age 59 to age 60. I really suspect
not, but these are commonly employed models. I would have to
agree with Peter that the risk for all kinds of poor outcomes
related to low birth weight does not change dramatically from
1499 grams to 1500 grams. The risks are probably even greater if
the infant weighs 1100 grams than if the infant weighs 1499 grams.
And a child who weighs 1501 grams is probably at more risk for
poor outcomes than a child who weighs 2300 grams.
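
To make the step-function point concrete, here is a minimal sketch in Python (not SAS, and not taken from any actual analysis). The data are simulated; the smooth risk curve, the noise level, and the helper name fit_dichotomized are all assumptions invented for illustration. It shows that a model with a single low-birth-weight indicator predicts the same response at 1100 and 1499 grams, and the same response at 1501 and 2300 grams, with one jump at the cutoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a risk score that declines smoothly with birth weight (grams).
weight = rng.uniform(800, 4000, size=2000)
risk = 10.0 * np.exp(-weight / 1000.0) + rng.normal(0.0, 0.1, size=weight.size)

CUTPOINT = 1500.0  # the low-birth-weight cutoff discussed in this thread


def fit_dichotomized(x, y, cut):
    """Least-squares fit of y on a single indicator for x < cut.

    With only an intercept and one indicator, the fitted values are the
    two group means, so the fitted model is a two-level step function.
    """
    low_mean = y[x < cut].mean()
    high_mean = y[x >= cut].mean()
    return lambda new_x: np.where(np.asarray(new_x) < cut, low_mean, high_mean)


predict = fit_dichotomized(weight, risk, CUTPOINT)
p1100, p1499, p1501, p2300 = predict([1100.0, 1499.0, 1501.0, 2300.0])

# Identical predictions within each interval, one jump between intervals.
print("1100 g:", p1100, " 1499 g:", p1499)
print("1501 g:", p1501, " 2300 g:", p2300)
```

Under the simulated curve the true risk differs a good deal between 1100 and 1499 grams and hardly at all between 1499 and 1501 grams, which is the opposite of what the dichotomized fit reports.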

Now, I work with epidemiologists. I have fit many a regression
model in which age has been discretized into 3 or 4 intervals.
For simple presentation in epidemiological journals, these are
the accepted standards. I will not protest too loudly that
this should not be done, although I have tried to suggest
alternatives to my colleagues. I have absolutely no doubt that
the models which use discretized continuous variables are biased.
There are likely very few circumstances in which a discontinuous
response is reasonable. (I leave the door open for a few such
outcomes. However, they do not regularly present themselves.)

I have lately been working with an epidemiologist who has had
something of an epiphany regarding these issues. When he came
to me, he had collaborated with another statistician in the use
of flexible regression functions. In particular, for that
collaboration they had employed Generalized Additive Models (GAMs).
I am not a great fan of GAMs. When you are done fitting the
model, can you state the regression equation? I don't believe
that GAMs provide a simple expression. However, there are
other tools which allow for flexible regression modelling and
which yield functions with simple expressions. I had long thought
that restricted cubic splines could be a very useful tool for
modelling nonlinear (or suspected nonlinear) functions of
continuous variables. We are currently using spline methods.
With splines, the fitted model is an explicit equation: you can
plug in a value for some continuous predictor and directly get
an estimated response.
However, even though you may be able to return an estimate
directly, it may still be difficult to convey the shape of the
response without resorting to graphical methods. This is the
direction which I believe we ought to be headed with the
modelling of the relationship between responses and continuous
covariates: fit some sort of flexible regression and graphically
display the fitted response.
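
As a rough illustration of the "plug in a value" property, here is a minimal Python sketch of a restricted cubic spline fit by ordinary least squares. It is not the macro mentioned in this post. The basis is the usual truncated-power form that is constrained to be linear beyond the outer knots; the simulated data, the knot locations, the scaling by the squared knot range, and the names rcs_basis and predict_rcs are all assumptions chosen for illustration.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis: x plus (k - 2) nonlinear terms.

    Truncated-power construction, linear below the first knot and above
    the last knot. Nonlinear terms are divided by the squared knot range
    only to keep the columns on a scale comparable to x.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.asarray(knots, dtype=float)

    def pos3(u):
        return np.clip(u, 0.0, None) ** 3  # (u)_+^3

    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(t.size - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / d
                + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(term / scale)
    return np.column_stack(cols)


rng = np.random.default_rng(1)

# Hypothetical data: a smooth, nonlinear response to age.
age = rng.uniform(20, 80, size=500)
y = np.sin(age / 15.0) + 0.02 * age + rng.normal(0.0, 0.2, size=age.size)

KNOTS = [30.0, 45.0, 60.0, 75.0]  # assumed knot locations

design = np.column_stack([np.ones(age.size), rcs_basis(age, KNOTS)])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict_rcs(new_x):
    """Evaluate the fitted equation: intercept plus basis times coefficients."""
    b = rcs_basis(new_x, KNOTS)
    return beta[0] + b @ beta[1:]


# The regression equation is just these coefficients applied to the basis.
print("coefficients:", beta)
print("estimate at age 47:", predict_rcs(47.0)[0])
```

The coefficients together with the knot locations are the whole model, so the equation can be written down. Seeing its shape still calls for a plot of predict_rcs over a grid of ages.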

For polytomous response models, I have developed a macro which
will perform this work in (what I believe to be) a relatively
easy-to-use package. I don't know that it is ready for prime
time, but if there is interest in the use of the macro, I would
be willing to share it.
>Date: Thu, 25 Jan 2001 13:22:13 -0500
>Reply-To: Peter Flom <peter.flom@NDRI.ORG>
>From: Peter Flom <peter.flom@NDRI.ORG>
>Subject: Re: Proc GLM
>To: SAS-L@LISTSERV.UGA.EDU
>
>>>> "Dennis G. Fisher" <dfisher@CSULB.EDU> 01/25/01 01:08PM >>>
>wrote
>
>>>>I have to weigh in on this one. Usually I would agree that ruining a
>>>>perfectly good continuous variable by dichotomizing it is not a good
>>>>thing to do and I once gave such advice to a grad student. It turned out
>>>>that I was wrong. The variable was birthweight. This actually turned out
>>>>to be a dichotomous variable, which is something I did not know at the
>>>>time. Infants can be classified into low birth weight and non low
>>>>birthweight. Low birth weight is a proxy (or perhaps an indicator) that
>>>>there were problems with the pregnancy. So non-low birthweight infants
>>>>mean that the indicators of lbw problems were not present. It does not
>>>>mean that infants who are very heavy are somehow protected against
>>>>these problems. In the case of this grad student, the infants should
>>>>have been classified into low birth weight and non low birthweight.
>>>>Weight should not have been treated as a continuous variable. You
>>>>have to understand the meaning of the variable before giving an opinion
>>>>about the analysis. So I guess I agree with Dr. Kruse.
>
>Clearly, understanding the meaning of the variable before giving an opinion is vital, and I hesitate to argue with someone who knows so much more than I about statistics.
>
>However, it seems to me that even low birth weight is not a Yes/No variable.
>
>One classification I have seen is 1500 grams. But, dichotomizing at this point implies that a baby of 1499 grams is markedly different from one weighing 1501 grams. It seems to me that babies who weigh 1,000 grams would be at much more risk than those who weigh 1,500 grams, although I don't know the literature on the subject. I would suspect that, if one graphed "proportion of problem pregnancies" vs. "birth weight" the curve would asymptote at some point. So, one useful transformation of weight might be "weight below" the number at which the asymptote occurs.
>
>
>Does this make sense?
>
>
>Peter L. Flom, Ph.D.
>Principal Research Associate
>National Development and Research Institutes, Inc.
>2 World Trade Center
>16th floor
>New York, NY 10048
>
>(212) 845-4485
>(212) 845-4698 (fax)
>Peter.Flom@ndri.org
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------