```
Date:         Mon, 27 Jul 2009 22:40:30 -0400
Reply-To:     Frank Mwaniki
Sender:       "SAS(r) Discussion"
From:         Frank Mwaniki
Subject:      Re: Random Intercept in GLIMMIX?
In-Reply-To:  <200907272243.n6RAlvqa018337@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

SAS-Lers,

Maybe I am wrong, but does this not make your work externally invalid? From what I know, the intercept represents the condition of the subject before your analysis and/or experiment. One has two choices: (a) ignore the intercept for some reason, or (b) let the model determine the intercept. Some statistical procedures allow you to ignore the intercept because it is inherently accounted for. If you force the model to randomly choose an intercept, aren't you biasing it? I would go for repeated sampling (bootstrapping) rather than brute-forcing it.

Sometimes we tend to over-complicate issues that are very simple if we adhere to sound statistical theory. I think Dale (below) tries to explain it. Of course I may be terribly wrong, in which case I beg your pardon.

Regards,
Frank

> From: Ryan
> Subject: Random Intercept in GLIMMIX?
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Friday, July 24, 2009, 8:01 PM
>
> So, I've been struggling with this issue for quite some time, and
> thought I'd pose the question to the group.
>
> At what point does including a random intercept distort the
> results if you have very few multiple observations for most
> people?
>
> Here's a simple example:
>
> Let's say I want to conduct an analysis on people who report
> having at least one disease out of 4 diseases. Suppose that
> 75% of these people have only one disease, while the rest
> mostly have two (22%), and very few have three or more (3%).
> Let's say the dependent variable (DV) is binary, with 0
> indicating a positive response (i.e., the disease interrupts my
> life) and 1 indicating a negative response (i.e., the disease
> does NOT interrupt my life).
>
> Here's what the dataset looks like:
>
> /*****************************/
> Person   Disease   DV
>   1         1       0
>   2         1       1
>   2         3       1
>   3         1       0
>   4         2       0
>   5         2       0
>   5         3       1
>   6         4       0
>   .
>   .
> /*****************************/
>
> So, I figured I'd run the analysis with and without the person
> random effects in the GLIMMIX procedure:
>
> (1)
>
> proc glimmix data=mydata;
>    class disease;
>    model dv = disease / s dist=binary link=logit;
>    random intercept / subject=person;
> run;
>
> (2)
>
> proc glimmix data=mydata;
>    class disease;
>    model dv = disease / s dist=binary link=logit;
> run;
>
> ---
>
> What I found was that the model that includes the random
> intercept provided estimates that are not consistent with
> the raw data, while the model without the random effects
> is much closer. I do not like the idea of running a model
> without the RANDOM statement when 25% of my data
> are multiple observations, particularly since I believe
> there is a decent amount of covariation between observations
> within the same person.
>
> How do you propose I handle this dilemma?
>
> Thanks!
>
> Ryan

Ryan,

You state that including the random person effect (random intercept) distorts the results and produces estimates which are not consistent with the data. How so? I really don't know what to make of the question without clarification of what you mean when you state that the random effects model results in "estimates that are not consistent with the raw data". Unless the random effects model fails to converge, or converges to a very poor solution, the random effects model should produce results which are at least as good as the fixed-effect model.

I did notice in later communication between you and Oliver that the question of fitting the model employing NLMIXED did come up. The question was asked whether NLMIXED would produce the same estimates as the GLIMMIX procedure if adaptive Gaussian quadrature was specified as the estimation method for GLIMMIX.
The response to this question should be a qualified "Yes": they will usually produce the same estimates, assuming that you start the estimation process from the same initial parameter estimates, or assuming that there is only one parameter set which yields a local optimum of the likelihood. If there are multiple local optima (which is quite possible when you estimate a logit model for a binary response subject to random effects), then the initial values which are employed to start the likelihood maximization can lead to different solutions.

The GLIMMIX procedure allows you to specify initial parameter estimates for parameters that are part of the covariance structure. However, GLIMMIX does not allow you to specify initial estimates for the fixed-effect parameters, except through the INITGLM and INITITER options on the PROC GLIMMIX statement. The NLMIXED procedure allows you to specify initial estimates not only for the parameters of the covariance structure, but also for the fixed-effect parameters.

In addition to wanting to know what you mean when you state that the random-intercept model distorts the results, I also have to wonder what likelihood values are reported for the fixed-effect-only model vs. the model which includes random effects. Could the random effects model be producing results which are superior, but which deviate so markedly from the fixed-effect model that the random effects model seems to be incorrect?

Dale
```
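To make the comparison Dale describes concrete, here is a sketch (mine, not from the thread) of fitting Ryan's random-intercept logit model by adaptive Gaussian quadrature in both procedures. Variable names (`mydata`, `dv`, `disease`, `person`) follow Ryan's example; the number of quadrature points, the starting values, and the hand-coded reference level for `disease` are assumptions.

```sas
/* Sketch only -- not from the thread. Quadrature points, starting
   values, and dummy coding (disease=1 as reference) are assumptions. */

/* GLIMMIX fit by adaptive Gaussian quadrature (true ML) rather than
   the default pseudo-likelihood linearization */
proc glimmix data=mydata method=quad(qpoints=15);
   class disease person;
   model dv = disease / s dist=binary link=logit;
   random intercept / subject=person;
run;

/* The same model in NLMIXED; the PARMS statement supplies starting
   values for both the fixed effects and the variance component,
   which GLIMMIX permits only for covariance parameters */
proc sort data=mydata; by person; run;   /* NLMIXED expects subjects grouped */

proc nlmixed data=mydata qpoints=15;
   parms b0=0 b2=0 b3=0 b4=0 s2u=1;     /* assumed starting values */
   eta = b0 + b2*(disease=2) + b3*(disease=3) + b4*(disease=4) + u;
   p   = 1/(1 + exp(-eta));
   model dv ~ binary(p);
   random u ~ normal(0, s2u) subject=person;
run;
```

Rerunning the NLMIXED step from several different PARMS starting points is one way to check for the multiple local optima Dale mentions: if different starts converge to different likelihood values, the single reported solution should not be trusted.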
