```
Date: Thu, 1 May 2008 15:52:44 -0400
Reply-To: Susan Durham
Sender: "SAS(r) Discussion"
From: Susan Durham
Subject: Re: glimmix, overdispersion, & nested random factors
Comments: To: Andrea Previtali

On Fri, 25 Apr 2008 15:14:42 -0400, Andrea Previtali wrote:

>I read about offset variables and noticed that they can be considered as a
>measure of "exposure". Most of the time they reflect differences in the
>amount of time spent in collecting the count data. However, I also found an
>example where the number of vehicles that travel through an intersection is
>an offset variable for explaining the number of traffic accidents (Gelman &
>Hill. 2007. Data analysis using regression & multilevel/hierarchical models.
>Pages 110-112). This made me think that the number of adult females could be
>an offset variable. But when I tried it the models did not converge.
>So, instead I tried one of the suggestions by Sudip. I put TOTFEM in the
>CLASS statement and used the following for the random option:
>
>RANDOM INTERCEPT / SUBJECT= TOTFEM(PLOT);
>
>The models converged and the overdispersion decreased to less than 1.3 in all
>of them!
>
>It is also interesting to note that many of the factors that were highly
>significant with the inappropriate model are now not significant and those
>that remained significant now have a more reasonable p-value.
>
>My only concern is that the number of subjects is 63 and the Max Obs per
>subject is only 3. Do I need to worry about this? In the same book cited
>above (also see comment on Gelman's blog:
>http://www.stat.columbia.edu/~cook/movabletype/archives/2006/04/how_large_a_sam.html
>), he says that this is acceptable.
>
>Thanks again for all your feedback!
>Andrea

In some model scenarios, the statement

RANDOM INTERCEPT / SUBJECT= TOTFEM(PLOT);

is equivalent to blocking on TOTFEM(PLOT).
This approach seems inappropriate for your study; PLOT "blocks" on
repeated measures through time, but TOTFEM is just another observed
variable--not a design variable.

Instead, I'd spend more time on the OFFSET approach suggested by Warren.
The OFFSET variable should be log(TOTFEM); perhaps you tried that, but if
not, maybe log(TOTFEM) would have better convergence luck.

We haven't addressed yet in this thread that you are fitting a multiple
regression model with DN and FN as continuous explanatory variables, with
a linear relationship to the response on the link scale (here, by default,
log). I presume DN and FN vary through time on each PLOT, and that PLOT is
the replicating factor.

Given the regression nature of the model, have you considered using a
random coefficient model to deal with the repeated measures? A starting
place for this model is Chapter 8 in Littell et al. (2006) SAS for Mixed
Models, 2nd ed, SAS Press.

Are you comfortable with the linearity (or a curvilinear) assumption? If
not, you might consider fitting a low-rank smoother to the repeated
measures data. Schabenberger includes an example here:

http://www2.sas.com/proceedings/sugi30/196-30.pdf

for a 2-D spatial problem which might make a pretty good starting template
for your two continuous explanatory variables.

With either approach, I'd still try to fit in log(TOTFEM) as an offset.

I just returned from the Applied Statistics in Agriculture conference at
KSU. The current "word" is that you should not use the chi-square/df
metric as a measure of overdispersion *if* you have a RANDOM statement in
the model (this word from Schabenberger via a colleague)--it just doesn't
work. So along with the dilemmas of model selection, we also have dilemmas
of determining overdispersion in mixed models.

I'm pretty sure I've not made your problem any simpler!

Cheers,
Susan

Susan Durham
Ecology Center
Utah State University
```
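
Writing out the linear predictor under the log link shows why the offset enters as log(TOTFEM) rather than TOTFEM. This is only a sketch under stated assumptions: the thread does not name the response, so $Y$ stands in for the count; the coefficients $\beta_0, \beta_1, \beta_2$ and the random plot intercept $u_{\mathrm{plot}}$ are illustrative symbols, not the poster's fitted model.

```latex
\begin{align*}
\log \mathrm{E}[Y \mid u_{\mathrm{plot}}]
  &= \log(\mathrm{TOTFEM}) + \beta_0 + \beta_1\,\mathrm{DN}
     + \beta_2\,\mathrm{FN} + u_{\mathrm{plot}} \\[4pt]
\frac{\mathrm{E}[Y \mid u_{\mathrm{plot}}]}{\mathrm{TOTFEM}}
  &= \exp\!\left(\beta_0 + \beta_1\,\mathrm{DN}
     + \beta_2\,\mathrm{FN} + u_{\mathrm{plot}}\right)
\end{align*}
```

Because the offset's coefficient is fixed at 1, the remaining terms describe the expected count per adult female. Entering TOTFEM untransformed would instead multiply the mean by exp(TOTFEM), which is not an exposure adjustment.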
