Date: Wed, 12 Nov 2008 18:37:41 -0800
Reply-To: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Subject: Re: Insufficient Resources to Perform Adaptive Quadrature...
Content-Type: text/plain; charset=ISO-8859-1
On Nov 12, 4:31 pm, stringplaye...@yahoo.com (Dale McLerran) wrote:
> I see that Robin has already responded about the basic nature
> of the problem here. Let me just add the following observations
> about Ryan's specific problem.
> Ryan has two levels of random effects: city (N~=40) and person
> within city (N~=2500 or N~=62/city). The random effects
> design matrix requires a column for each city random effect
> (intercept, month, and disease*x1 as specified in the model).
> In addition, the design matrix includes more than 60 columns
> of person within city random effects. I don't know how the
> GLIMMIX procedure is constructed - whether the design matrix
> for the person random effects is reduced as much as possible
> so that the number of columns in the person effects portion
> of the design is represented as the number of persons in
> the city that has the most individuals (something over 62)
> or whether the design is represented with 2500 columns and
> thus 2500 random effects. I would hope that it is the former.
> Regardless, the quadrature solution requires integrating over
> all of the random effects which means at a minimum there would
> be 3 + (62+) nested integrals. The computation problem
> becomes immense here. Following Robin's post, there would
> be at least 3**(65) likelihood evaluations per iteration.
> While the GLIMMIX procedure does in theory support estimation
> of nested random effect models when a quadrature solution has
> been requested, in practice this is limited to models with
> a suitably small design matrix. The data employed here do
> not have a suitably small design matrix for the random effects.
> Thus, I see no hope of fitting this model using quadrature.
> Finally, let me post another thought or two about the model
> which is fit here. There appear to be four disease classes,
> but disease was not named on the CLASS statement. That seems
> wrong to me. I would also echo Robin's concerns regarding
> month parameterization (class vs continuous). I also have
> questions regarding whether month and disease*x1 should be
> named on the random statement.
> I might take exception to Robin's suggestion to ignore the
> person random effects and focus only on the city random
> effects. If anything, I would remove the city random effects
> and specify only the person random effects. There is
> probably more variability in the person random effects than
> in the city random effects. You want to account for the main
> source of random variation. Thus, it would seem to me that
> you would want to model the person random effects. When the
> city random effects have been excluded, the person random
> effects will include city variability. But the reverse is
> not true. When persons are excluded, then their random
> effects are averaged over and will almost certainly average
> to something quite close to zero within a city - especially
> when there are so many persons/city.
> Thus, I would suggest dropping the city random effects and
> estimate the model with just person random effects. The
> nesting problem that causes expansion of the random effect
> design matrix disappears when city is dropped from the
> model. You should be able to fit the person random effect
> model with more quadrature points (although if month and
> disease*x1 are kept as random effects and especially if
> month and disease have been added to the CLASS statement,
> then the model could once again expand beyond what can
> reasonably be handled with a large number of quadrature points).
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra@NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
> --- On Wed, 11/12/08, Robin R High <rh...@UNMC.EDU> wrote:
> > From: Robin R High <rh...@UNMC.EDU>
> > Subject: Re: Insufficient Resources to Perform Adaptive Quadrature...
> > To: SA...@LISTSERV.UGA.EDU
> > Date: Wednesday, November 12, 2008, 8:06 AM
> > Ryan,
> > Terms or images to express the "infinite" have
> > long been a part of our
> > history:
> > From Genesis 15, God speaks to Abram "Look toward
> > heaven and count the
> > stars, if you are able to count them." Then he said to
> > him, "So shall your
> > descendants be."
> > ...in the early 80s it was "Cosmos" and Carl
> > Sagan's famous phrase
> > "billions and billions..."
> > .. and to match wits, when "infinity" looks at
> > us, we react with the
> > desire for "more POWER, EH, EH!" (Tim the
> > Toolman)
> > .. and now the new SAS 9.2 generation can speak of
> > "infinity" through
> > "random effects with quadrature", though I
> > imagine that Abram or the
> > "Cosmos" audience would produce a blank stare if
> > such terminology had been
> > in vogue.
> > So, after that digression, please read the section on
> > quadrature/laplace
> > options in
> > from which the following section is extracted:
> > ***begin quote
> > proc glimmix method=quad(qpoints=5);
> > class A B id;
> > model y = / dist=negbin;
> > random A / subject=id;
> > run;
> > Computing the marginal log likelihood requires the
> > numerical evaluation of
> > a four-dimensional integral. As
> > part of this evaluation, 5**4 = 625 conditional log
> > likelihoods need to be
> > computed per observation on each
> > pass through the data. Now suppose that an additional
> > random effect with b
> > = 2 levels is added and the
> > previous RANDOM statement becomes the following:
> > random A A*B / subject=id;
> > For each observation, this now requires evaluation of
> > 5**(4+8) =
> > 244,140,625 [commas added for clarity] conditional log
> > likelihoods on each
> > pass through the data.
> > ***end quote
> > .. whatever "each pass" actually means, remember
> > that GLIMMIX needs to
> > iterate and reiterate to find a solution, which I suppose
> > like the
> > national debt or "bailout" figures we've been
> > hearing gets rather large
> > fairly quickly. I suppose this is one reason that I once
> > recall hearing
> > Oliver S. himself say that he was reluctant to include quad
> > as an option
> > with GLIMMIX (though it is nice to now be able to produce
> > the same results
> > with GLIMMIX that one would produce with NLMIXED).
> > So, in my opinion, don't hesitate to work with other
> > estimation routines,
> > esp. when you have a reasonably large sample size (which
> > you do) and the
> > random effects aren't too "big".
> > And finally, just a few thoughts about your code, one can
> > always start
> > with a simpler model and work up in complexity, if
> > possible. And I'm
> > curious why is month treated as a linear covariate, esp.
> > one treated as
> > having a random slope within each city? Perhaps quarterly
> > for a seasonal
> > categorical effect might be a suggestion. And the following
> > code would
> > likely capture the essence of the problem, though with
> > binary data it may
> > be computationally difficult to estimate random subject
> > effects, even
> > though conceptually it would be important.
> > proc glimmix data=mydata;
> >    class Person_ID City_ID;  * include month or a quarterly effect here?? ;
> >    model y = Disease x1 month disease*x1 / s dist=binary;
> >    random intercept x1 / subject=City_ID;  * would a type=<another option> work better? ;
> > run;
> > .. as always, something new to learn or features to
> > consider, esp. now with
> > 9.2 available...
> > Robin High
> > UNMC
> > Note: not sure why the original message became garbled when
> > I look at it
> > on the SAS-L archives, so thought I would re-submit
> > hopefully with this
> > version of the note now in an ungarbled format.
Your response (along with Robin's and Kevin's) has been so helpful. I
REALLY hope you have the time to reply to this follow-up post. I
promise to be as brief as humanly possible.
As I'm sure you've noticed, this model is quite similar to another
model I had posted a while back. This model is essentially an
extension of that model, except that it's accounting for time. So,
here's my primary research question:
Are the odds of contracting disease A significantly different than the
odds of contracting disease B in a specified time period (e.g. three
months), after controlling for X1? (X1 is at the city level.) And do
the odds change over time? Yes, I indicated previously that there are
four diseases, but this number can be reduced if necessary. I am MOST
interested in Disease A versus Disease B, so although it would be
interesting to include two other similar diseases, it is not
imperative. (I had forgotten to include disease in the model statement
here--I didn't forget when I actually ran the analysis.)
This is a strange design (for me) because time is not a repeated
measures variable--different people are measured at different time
points. I included time for two reasons: (1) kind of as an indicator
variable for x1 and (2) to monitor the odds ratios across time.
There are two approaches that I've seen in the literature to deal with
this type of research question (GLMM using MLE - adaptive quadrature
or GLMM using Bayesian estimation). It's becoming apparent that the
first option may be just too computationally intensive. Although,
there might be a way to combat this unruly design matrix?! Since I'm
not really interested in an odds ratio based on an entire year and a
half of data, why not cut the data up into quarters and run 4 GLIMMIXs
per year? Unfortunately this approach will not allow me to test whether
there is a significant difference across time, but I'd at least be able
to see the changes (with the 95% confidence bands). Do you agree so far?
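If it helps to make the quarterly idea concrete, here is a rough sketch of what I have in mind (purely illustrative: it assumes a numeric variable named quarter exists on mydata, which I'd have to derive from month, and it drops month as a fixed covariate since it barely varies within a quarter):

%macro fit_by_quarter;
   %do q = 1 %to 4;
      proc glimmix data=mydata(where=(quarter=&q)) method=quad;
         class Disease Person_ID City_ID;
         model y = Disease x1 disease*x1 / s dist=binary;
         random intercept / subject=Person_ID(City_ID);
      run;
   %end;
%mend fit_by_quarter;
%fit_by_quarter;

Each quarter's run would give its own odds ratio estimate and confidence limits, which I could then plot across quarters.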
So, let's assume for the moment that since I'm only looking at 3
months at a time there will be substantially fewer subjects and
zipcodes, and ultimately a manageable design matrix for Method=Quad.
With the information you have thus far, what would be the ideal model?
Would it look like this?
proc glimmix data=mydata method=quad;
   class Disease Person_ID City_ID;
   model y = Disease x1 month disease*x1 / s dist=binary;
   random intercept x1 month disease*x1 / subject=City_ID;
   random intercept / subject=Person_ID(City_ID);
run;

I had never thought disease*x1(City_ID) needed to be considered a
random effect. It's also surprising that you'd suggest having the
interaction without the main effect of disease in the random
statement. I'll keep thinking about this for a while.
If this ideal model had trouble converging, what would you consider as
a bare bones model that would still be considered acceptable? I'd
really like to keep the random intercept for city, as I'm in the process
of creating BLUP statements to detect city outliers *within* each quarter.
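To be concrete, the stripped-down model I'm imagining (just my guess, not something you proposed) would keep only the two random intercepts and drop all of the random slopes:

proc glimmix data=mydata method=quad;
   class Disease Person_ID City_ID;
   model y = Disease x1 month disease*x1 / s dist=binary;
   random intercept / subject=City_ID;
   random intercept / subject=Person_ID(City_ID);
run;

Would that still be acceptable, or does the city/person nesting still blow up the design matrix even with intercepts only?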
I've asked a lot. I completely understand if you're just too busy to
respond. You and Robin have taught me much about mixed and glimmix.
I'm very grateful.
This is my last hope at running this model using MLE. If cutting the
data won't do it, then I might have to abandon the MLE ship.