Date: Thu, 1 May 2008 15:52:44 -0400
Reply-To: Susan Durham <sdurham@BIOLOGY.USU.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Susan Durham <sdurham@BIOLOGY.USU.EDU>
Subject: Re: glimmix, overdispersion, & nested random factors
On Fri, 25 Apr 2008 15:14:42 -0400, Andrea Previtali wrote:
>I read about offset variables and noticed that they can be considered a
>measure of "exposure". Most of the time they reflect differences in the
>amount of time spent in collecting the count data. However, I also found an
>example where the number of vehicles that travel through an intersection is
>an offset variable for explaining the number of traffic accidents (Gelman &
>Hill. 2007. Data analysis using regression & multilevel/hierarchical models.
>Pages 110-112). This made me think that the number of adult females could be
>an offset variable. But when I tried it the models did not converge.
>So, instead I tried one of the suggestions by Sudip. I put TOTFEM in the
>CLASS statement and used the following for the random option:
>RANDOM INTERCEPT / SUBJECT= TOTFEM(PLOT);
>The models converged and the overdispersion decreased to less than 1.3 in all
>It is also interesting to note that many of the factors that were highly
>significant with the inappropriate model are now not significant and those
>that remained significant now have a more reasonable p-value.
>My only concern is that the number of subjects is 63 and the Max Obs per
>subject is only 3. Do I need to worry about this? In the same book cited
>above (also see comment on Gelman's blog:
>), he says that this is acceptable.
>Thanks again for all your feedback!
In some model scenarios, the statement
RANDOM INTERCEPT / SUBJECT= TOTFEM(PLOT);
is equivalent to blocking on TOTFEM(PLOT). This approach seems
inappropriate for your study; PLOT "blocks" on repeated measures through
time, but TOTFEM is just another observed variable--not a design variable.
Instead, I'd spend more time on the OFFSET approach suggested by Warren.
The OFFSET variable should be log(TOTFEM); perhaps you tried that, but if
not, maybe log(TOTFEM) would have better convergence luck.
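
A minimal sketch of what the offset setup might look like follows. The
data set name COUNTS and the response name COUNT are placeholders (neither
appears in this thread); TOTFEM, PLOT, DN, and FN are the variables already
discussed. The random intercept on PLOT is only an assumption about how the
repeated measures might be handled, not a recommendation.

```sas
/* Hypothetical names: data set COUNTS, response COUNT. */
data counts2;
  set counts;
  /* The offset enters on the link (log) scale, so take the log here. */
  /* LOGFEM stays missing when TOTFEM is 0; check how many rows that  */
  /* affects before fitting.                                          */
  if totfem > 0 then logfem = log(totfem);
run;

proc glimmix data=counts2;
  class plot;
  /* OFFSET= adds LOGFEM to the linear predictor with coefficient 1, */
  /* so the model describes counts per adult female.                 */
  model count = dn fn / dist=poisson link=log offset=logfem solution;
  random intercept / subject=plot;   /* assumed repeated-measures term */
run;
```

Passing TOTFEM itself to OFFSET= would put the exposure on the wrong scale,
which is one thing to rule out if the earlier attempt failed to converge.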
We haven't addressed yet in this thread that you are fitting a multiple
regression model with DN and FN as continuous explanatory variables, with a
linear relationship to the response on the link scale (here, by default,
log). I presume DN and FN vary through time on each PLOT, and that PLOT is
the replicating factor.
Given the regression nature of the model, have you considered using a random
coefficient model to deal with the repeated measures? A starting place for
this model is Chapter 8 in Littell et al. (2006) SAS for Mixed Models, 2nd
ed, SAS Press.
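
As a rough starting point only, a random coefficient model in GLIMMIX could
be sketched as below. COUNTS2 and COUNT are placeholder names, and LOGFEM is
assumed to hold log(TOTFEM) computed in an earlier DATA step; none of these
names come from the thread. This is not the Littell et al. code, just the
general shape of such a model.

```sas
/* Hypothetical names: data set COUNTS2, response COUNT, offset LOGFEM. */
proc glimmix data=counts2;
  class plot;
  model count = dn fn / dist=poisson link=log offset=logfem solution;
  /* Each PLOT gets its own intercept and its own DN and FN slopes;  */
  /* TYPE=UN lets those random coefficients covary.                  */
  random intercept dn fn / subject=plot type=un;
run;
```

If the unstructured covariance is more than the data can support, a simpler
structure such as TYPE=VC, or dropping one of the random slopes, would be a
reasonable thing to try.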
Are you comfortable with the linearity (or curvilinearity) assumption? If
not, you might consider fitting a low-rank smoother to the repeated measures
data. Schabenberger includes an example here:
for a 2-D spatial problem which might make a pretty good starting template
for your two continuous explanatory variables.
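
For orientation, a low-rank (radial) smoother over two continuous variables
can be requested in GLIMMIX roughly as sketched here. This is not
Schabenberger's example. COUNTS2 and COUNT are placeholder names, LOGFEM is
assumed to hold log(TOTFEM) from an earlier DATA step, and the bucket size of
10 is an arbitrary choice that controls how many knots are generated.

```sas
/* Hypothetical names: data set COUNTS2, response COUNT, offset LOGFEM. */
proc glimmix data=counts2;
  model count = dn fn / dist=poisson link=log offset=logfem solution;
  /* TYPE=RSMOOTH fits a radial smoother in DN and FN jointly;       */
  /* knots are placed by a k-d tree built on the observed values.    */
  random dn fn / type=rsmooth knotmethod=kdtree(bucket=10);
run;
```

A smaller bucket gives more knots and a more flexible surface, so it is
worth trying a few values and looking at the fitted surface each time.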
With either approach, I'd still try to fit in log(TOTFEM) as an offset.
I just returned from the Applied Statistics in Agriculture conference at
KSU. The current "word" is that you should not use the chi-square/df metric
as a measure of overdispersion *if* you have a RANDOM statement in the model
(this word from Schabenberger via a colleague)--it just doesn't work. So
along with the dilemmas of model selection, we also have dilemmas of
determining overdispersion in mixed models.
I'm pretty sure I've not made your problem any simpler!
Utah State University