Date: Mon, 27 Jul 2009 22:40:30 -0400
Reply-To: Frank Mwaniki <mwaniki@EPSILON-STATISTICS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Frank Mwaniki <mwaniki@EPSILON-STATISTICS.COM>
Subject: Re: Random Intercept in GLIMMIX?
In-Reply-To: <200907272243.n6RAlvqa018337@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
SAS-Lers,
Maybe I am wrong but does this not make your work externally invalid? From
what I know, the intercept represents the condition of the subject before
your analysis and/or experiment. One has two choices: a) ignore the intercept
due to some reason or b) let the model determine the intercept. Some
statistical procedures allow you to ignore the intercept because it is
inherently accounted for.
If you force the model to randomly choose an intercept, aren't you biasing
it?
I would go for repeated sampling (bootstrapping) rather than brute-forcing
it. Sometimes we tend to over-complicate issues that are very simple if we
adhere to sound statistical theory. I think Dale (below) tries to explain
this.
Of course I may be terribly wrong, in which case, I beg your pardon.
Regards,
Frank
> From: Ryan <Ryan.Andrew.Black@GMAIL.COM>
> Subject: Random Intercept in GLIMMIX?
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Friday, July 24, 2009, 8:01 PM
> So, I've been struggling with this
> issue for quite some time, and
> thought I'd pose the question to the group.
>
> At what point does including a random intercept distort the
> results if you have very few multiple observations for most
> people?
>
> Here's a simple example:
>
> Let's say I want to conduct an analysis on people who report
> having at least one disease out of 4 diseases. Suppose that
> 75% of these people have only one disease, while most of the
> rest have two (22%) and very few have three or more (3%).
> Let's say the dependent variable (DV) is binary with 0
> indicating a positive response (i.e., the disease interrupts my
> life) and 1 indicating a negative response (i.e., the disease
> does NOT interrupt my life).
>
> Here's what the dataset looks like:
>
> /*****************************/
> Person Disease DV
> 1 1 0
> 2 1 1
> 2 3 1
> 3 1 0
> 4 2 0
> 5 2 0
> 5 3 1
> 6 4 0
> .
> .
> /*****************************/
>
> So, I figured I'd run the analysis with and without the person
> random effects in the GLIMMIX procedure:
>
> (1)
>
> proc glimmix data=mydata;
> class disease;
> model dv = disease / s dist=binary link=logit ;
> random intercept / subject=person;
> run;
>
> (2)
>
> proc glimmix data=mydata;
> class disease;
> model dv = disease / s dist=binary link=logit ;
> run;
>
> ---
>
> What I found was that the model that includes the random
> intercept provided estimates that are not consistent with
> the raw data, while the estimates from the model without the
> random effects are much closer. I do not like the idea of
> running a model without the random statement when 25% of my
> data are multiple observations, particularly since I believe
> there is a decent amount of covariation between observations
> within the same person.
>
> How do you propose I handle this dilemma?
>
> Thanks!
>
> Ryan
>
Ryan,
You state that including the random person effect (random
intercept) distorts the results and results in estimates
which are not consistent with the data. How so? I really
don't know what to make of the question without clarification
of what you mean when you state that the random effects model
results in "estimates that are not consistent with the raw
data".
Unless the random effects model fails to converge or converges
to a very poor solution, the random effects model should
fit the data at least as well as the fixed effect model.
I did notice in later communication between you and Oliver
that the question of fitting the model employing NLMIXED did
come up. The question was asked whether NLMIXED would produce
the same estimates as the GLIMMIX procedure if adaptive
Gaussian quadrature was specified as the solution method for
GLIMMIX. The response to this question should be a qualified
"Yes, they will usually produce the same estimates, assuming
that you start the estimation process from the same initial
parameter estimates or assuming that there is only one
parameter set which yields a local optimum value of the
likelihood." If there are multiple local optima (which is
quite possible when you estimate a logit model for a binary
response subject to random effects), then the initial values
which are employed to start the likelihood maximization
process can lead to different solutions.
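As a rough sketch of the two fits being compared, assuming the data set mydata from Ryan's post with disease coded 1-4 and dv coded 0/1: the parameter names b0-b3 and s2u, the choice of 7 quadrature points, and the EVENT='1' response option (added so that both procedures model the probability that dv=1) are illustrative assumptions and not part of the original code.

```sas
/* NLMIXED expects all records for a subject to be contiguous */
proc sort data=mydata;
  by person;
run;

/* GLIMMIX with adaptive Gaussian quadrature instead of the default
   pseudo-likelihood method */
proc glimmix data=mydata method=quad(qpoints=7);
  class disease;
  model dv(event='1') = disease / s dist=binary link=logit;
  random intercept / subject=person;
run;

/* The same model written out in NLMIXED; disease 4 is the reference
   level, matching the GLIMMIX default of using the last CLASS level */
proc nlmixed data=mydata qpoints=7;
  parms b0=0 b1=0 b2=0 b3=0 s2u=1;   /* explicit starting values */
  eta = b0 + b1*(disease=1) + b2*(disease=2) + b3*(disease=3) + u;
  p   = 1 / (1 + exp(-eta));         /* inverse logit */
  model dv ~ binary(p);
  random u ~ normal(0, s2u) subject=person;
run;
```

If the two procedures report the same -2 log likelihood, they have most likely reached the same optimum; if they differ, the starting values are the first thing to check.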
The GLIMMIX procedure allows you to specify initial parameter
estimates for parameters that are part of the covariance
structure. However, the GLIMMIX procedure does not allow
you to specify initial parameter estimates for parameters
which are part of the fixed effect model except through
the INITGLM and INITITER options on the GLIMMIX invocation
statement. The NLMIXED procedure allows you to specify initial
parameter estimates not only for the parameters which are part
of the covariance structure, but also allows you to specify
initial parameter estimates for parameters which are employed
as fixed effects.
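A sketch of how starting values might be supplied in each procedure, under the same assumptions as Ryan's example (mydata sorted by person, disease coded 1-4, dv coded 0/1). The specific starting values, the INITITER= limit of 20, and the parameter names b0-b3 and s2u are hypothetical choices made only for illustration.

```sas
/* GLIMMIX: PARMS sets the starting value for the random-intercept
   variance; fixed-effect starting values come from an initial GLM fit
   (INITGLM), limited here to at most 20 iterations (INITITER=) */
proc glimmix data=mydata method=quad initglm inititer=20;
  class disease;
  model dv(event='1') = disease / s dist=binary link=logit;
  random intercept / subject=person;
  parms (0.5);
run;

/* NLMIXED: PARMS can set starting values for fixed effects and the
   variance alike; giving several values per parameter defines a grid,
   and optimization starts from the best grid point */
proc nlmixed data=mydata qpoints=7;
  parms b0=-1 to 1 by 1  b1=0 b2=0 b3=0  s2u=0.5 to 2 by 0.5;
  eta = b0 + b1*(disease=1) + b2*(disease=2) + b3*(disease=3) + u;
  p   = 1 / (1 + exp(-eta));
  model dv ~ binary(p);
  random u ~ normal(0, s2u) subject=person;
run;
```

Rerunning NLMIXED from a few different starting points and comparing the final -2 log likelihood values is one way to look for multiple local optima.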
In addition to wanting to know what you mean when you state
that the model with the random intercept distorts the
results, I also have to wonder what likelihood values are
reported for the fixed-effect only model vs the model which
includes random effects. Could the random effect model be
producing results which are superior but which deviate
markedly from the fixed effect model - so much so that the
random effect model seems to be incorrect?
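One possible way to put the two likelihoods side by side, sketched under the assumption that mydata is sorted by person and coded as in Ryan's example. The output data set names fit_random and fit_fixed are hypothetical. Note that the default GLIMMIX estimation method for a model with random effects is a pseudo-likelihood method, whose reported likelihood is not comparable across models, so METHOD=QUAD is requested here.

```sas
/* Model (1): random intercept, fit by maximum likelihood via
   quadrature. COVTEST ZEROG requests a likelihood ratio test of
   the hypothesis that there is no random intercept. */
proc glimmix data=mydata method=quad;
  class disease;
  model dv(event='1') = disease / s dist=binary link=logit;
  random intercept / subject=person;
  covtest 'No random intercept' zerog;
  ods output FitStatistics=fit_random;
run;

/* Model (2): fixed effects only, also fit by maximum likelihood */
proc glimmix data=mydata;
  class disease;
  model dv(event='1') = disease / s dist=binary link=logit;
  ods output FitStatistics=fit_fixed;
run;
```

The -2 log likelihood rows of fit_random and fit_fixed can then be compared directly.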
Dale