Date: Thu, 9 Dec 2004 16:50:02 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Polytomous response logistic regression
Content-Type: text/plain; charset=us-ascii
--- Diana <diamfel@YAHOO.COM> wrote:
> Hi All,
> This is my first time here and I'm a novice, so please bear with me
> as my questions might seem very elementary to some.
Welcome, Diana. You have made your way to what many consider a
treasure trove of information. Hopefully, we can get you oriented
in the right direction.
> I have a dataset where subjects are categorized as=1
> injury), 2 (less medically-severe injury), or 7 (controls). This is
> response variable.
> I have several explanatory variables (all are categorical).
> I would like to run logistic models that compare A) 1 and 7; B) 2 and 7;
> and C) 1 and 2.
> Which logistic regression procedure should I use to analyze this
> Can I use proc logistic in this situation?
You do not indicate what version of SAS you are running. It makes
a difference. If you have version 8.2 or higher, then the LOGISTIC
procedure will perform any of the standard analyses for a
polytomous response model. I would think that you would want to
fit a generalized logits model. That requires specification of
the option LINK=GLOGIT on your model statement. Thus, you would
have code something like:
   proc logistic data=mydata;
      class X1 X2 ... Xk;
      model response = X1 X2 ... Xk / link=glogit;
   run;
> Is it necessary to recode my response variable in SAS, or will using
> 1, 2,
> 7 coding scheme be adequate?
SAS can handle the coding scheme that you already have in place
with no problem.
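As a rough sketch of how the three comparisons could be requested, assuming a release in which the response variable accepts the REF= option (SAS 9; this may not hold for 8.2), and with X1 and X2 standing in for your own predictors:

```sas
/* Controls (7) as the reference: gives logits for 1 vs 7 and 2 vs 7 */
proc logistic data=mydata;
   class X1 X2 / param=ref;
   model response(ref='7') = X1 X2 / link=glogit;
run;

/* Refit with 2 as the reference to read off 1 vs 2 directly */
proc logistic data=mydata;
   class X1 X2 / param=ref;
   model response(ref='2') = X1 X2 / link=glogit;
run;
```

Both fits are the same model with the same likelihood; only the parameterization changes, so the second run is just a convenient way to get the 1 versus 2 estimates and their standard errors.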
> After I run simple regression analyses (e.g., y=x) and determine
> explanatory variables are significant on an individual level, is it
> necessary to put them in my multiple regression models in any
> order (e.g., based on their significance levels in simple
> regression)? I
> don't want to use the SELECTION option.
No, it should not matter in what order you specify the predictor
variables as long as you do not have collinearity among the
predictors. If there is collinearity among the predictors, then
you could run into some issues with the order in which variables are
specified.
> Also, some of my explanatory variables have missing data levels. I
> want these missing values to be considered when running my logistic
> regression analyses. But I don't want to delete the entire record if
> there is a missing value, because other variables of interest for that
> record might be populated. If it is possible and appropriate, is there
> some way that the 'missing' categories can be ignored when running
> logistic models? What is the best way to handle this?
Now you are getting into an area that will require considerable
statistical expertise. You need to be very careful here. In
order to include all of the observations, you must impute values
for the missing data values. SAS has a procedure which allows you
to perform this imputation process. However, one cannot just
blindly impute data values. You need to understand something
of the reason why you have missing data. In particular, you
must be able to assume that the missing data elements are
missing at random. That is, if an element is missing, the reason
for it being missing may depend on values of observed variables,
but cannot depend on the value of the missing variable. We must
be able to assume that observations with missing values come from
the same population as observations with nonmissing values. Read
carefully the documentation for the SAS procedure MI (multiple
imputation).
If you believe that your data satisfy the assumptions necessary
for missing value imputation, then you must select an appropriate
imputation method. I am not convinced that the methods available
in the SAS MI procedure are really appropriate for categorical
variables. In the past, I have employed nonparametric methods
for imputing categorical variable values. I provided a brief
description of a nonparametric imputation method last week.
You should be able to find that in the archives. It is not much,
but at least will give you some idea what I am referring to.
Having satisfied yourself that missing value imputation is valid
and having implemented an imputation approach that is appropriate
for your data, then you can fit your statistical model employing
the procedure LOGISTIC (or other procedure as appropriate) for
each imputation set that you form. You really need to form
multiple imputation data sets, analyze the data generated for
each imputation set using your standard procedure, and then
construct your final analysis employing the procedure MIANALYZE
which takes into account the uncertainty of the imputation
process when constructing standard errors and confidence intervals
around any point estimate.
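The analysis and combination steps might look roughly like the following. This is only a sketch under several assumptions: the completed data sets have already been stacked into one data set (here called IMPUTED, a made-up name) indexed by a variable _Imputation_; X1 and X2 stand in for your predictors; and the release is recent enough that MIANALYZE accepts the PARMS(CLASSVAR=CLASSVAL)= option. Running MIANALYZE separately for each level of Response is one way to keep the two sets of logit parameters apart, and should be checked against the documentation for your release.

```sas
/* IMPUTED is a hypothetical stacked data set of completed data sets */
proc sort data=imputed;
   by _Imputation_;
run;

/* Fit the same generalized logits model to each completed data set */
ods output ParameterEstimates=parms;
proc logistic data=imputed;
   by _Imputation_;
   class X1 X2 / param=ref;
   model response(ref='7') = X1 X2 / link=glogit;
run;

/* Group the estimates by response contrast (1 vs 7, 2 vs 7) */
proc sort data=parms;
   by Response _Imputation_;
run;

/* Combine across imputations within each response contrast */
proc mianalyze parms(classvar=classval)=parms;
   by Response;
   class X1 X2;
   modeleffects Intercept X1 X2;
run;
```

The combined standard errors then reflect both the within-imputation and the between-imputation variability.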
> Also, is there some way to send certain portions of my results (e.g.,
> values, odds ratios, and 95% confidence intervals) to a data set (so I
> don't have to copy and paste from output)?
Yes, SAS allows you to write every statistic generated for the
LST file to a dataset. In fact, you will really want to do this
in order to pass your parameter estimates from each imputation
data set to the procedure MIANALYZE. The documentation of procs
MI and MIANALYZE should show you the mechanics of writing your
statistics out to data sets. Specifically, you will want to pay
attention to ODS statements:
   ODS TRACE ON;
   ODS OUTPUT table=dataset;
You will find instances of these statements in the MI and/or
MIANALYZE documentation. These statements are fully documented
in the BASE SAS documentation.
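For the LOGISTIC procedure, a minimal sketch might look like this. The output data set names EST and ORS are arbitrary choices, X1 and X2 are placeholders, and the ODS table names should be confirmed from the ODS TRACE output in your own log rather than taken on faith:

```sas
ods trace on;                        /* log lists each table name as it is produced */
ods output ParameterEstimates=est    /* estimates, standard errors, p-values */
           OddsRatios=ors;           /* odds ratios with 95% Wald limits */
proc logistic data=mydata;
   class X1 X2;
   model response = X1 X2 / link=glogit;
run;
ods trace off;
```

After the step runs, EST and ORS are ordinary SAS data sets that can be printed, merged, or exported like any other.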
You have a lot of work ahead of you if you wish to employ all
of your observed data in the data analysis. It won't be easy,
and you really should work with a statistician who can advise
you on whether your missingness can be assumed to be MAR, and
given that it is MAR, advise you on appropriate imputation methods.
Fred Hutchinson Cancer Research Center
Ph: (206) 667-2926
Fax: (206) 667-5977