Date: Wed, 9 Jan 2008 11:18:14 -0600
Reply-To: Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Subject: Re: AIC mystery in MIXED
Ryan,
I haven't run your data, but here's my take. There's no mystery. AIC
is a statistic that combines the number of parameters (as a penalty)
with your log-likelihood. When you change the response data (by
transforming it) you change the likelihood, so the AIC values from the
two fits are no longer on the same scale.
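To spell that out, here is a short sketch in standard notation (the symbols are mine, not from Ryan's post: k is the number of estimated parameters, L-hat the maximized likelihood, and z_i = ln y_i the transformed response). It assumes full maximum likelihood; MIXED's default REML likelihood brings in further terms that depend on the design matrix.

```latex
\mathrm{AIC} = -2\ln\hat{L} + 2k

% Change of variables z_i = \ln y_i: the density of y_i picks up a Jacobian factor 1/y_i
f_Y(y_i) = f_Z(\ln y_i)\,\frac{1}{y_i}

% Summing over observations and multiplying by -2:
-2\ln\hat{L}_Y = -2\ln\hat{L}_Z + 2\sum_{i} \ln y_i
```

So the log-scale fit reports a value that differs from its original-scale equivalent by 2 times the sum of ln(y_i). For the nine lengths posted (roughly 70 to 94) that offset is about 80, which is easily enough to make the log-log model look better regardless of how well it fits.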
Now, it seems unlikely that the normal-error and constant-variance
assumptions would both hold for the untransformed and the transformed
data alike, so I would suggest either going back to the original data
to see if the transform is useful, or looking at the model residuals
to see what they tell you about each fit. There may also be a physical
basis for the transform.
Bottom line: for AIC to work you must be comparing apples to apples.
It works for nested and non-nested models alike, but not for models
fit to different response data (here, length versus log(length)).
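If you still want an AIC comparison, one common workaround is to put the log-scale AIC back on the original scale by adding the Jacobian term for the log transform, 2*sum(log(length)). The following is only an untested sketch of that idea, not something from Ryan's post. It assumes the TEST2 data set from the original message, switches to METHOD=ML (my choice, because REML likelihoods are not comparable when the fixed-effects design differs), and relies on the FitStatistics ODS table having columns Descr and Value. The data set names fit_lin, fit_pow, jac and compare and the variables sum_lny and aic_y are made up for illustration.

```sas
/* Sketch: compare linear and log-log fits on the scale of LENGTH. */

/* Fit both models by ML and capture the fit statistics via ODS. */
ods output FitStatistics=fit_lin;
proc mixed data=test2 method=ml;
  model length = density;
run;

ods output FitStatistics=fit_pow;
proc mixed data=test2 method=ml;
  model lnlength = lndensity;
run;

/* Sum of log(length), needed for the Jacobian term. */
proc means data=test2 noprint;
  var lnlength;
  output out=jac sum=sum_lny;
run;

/* Keep the AIC rows and adjust the log-scale AIC. */
data compare;
  length model $8;
  if _n_ = 1 then set jac(keep=sum_lny);
  set fit_lin(in=in_lin) fit_pow(in=in_pow);
  if upcase(Descr) =: 'AIC (';              /* row label starts with "AIC (" */
  if in_lin then aic_y = Value;             /* already on the scale of LENGTH */
  else aic_y = Value + 2*sum_lny;           /* add 2*sum(log(length)) */
  if in_lin then model = 'linear';
  else model = 'power';
  keep model Value aic_y;
run;

proc print data=compare;
run;
```

The aic_y column is the one to compare across the two rows. Even then, the comparison is only as good as the normality and constant-variance assumptions on each scale, so the residual checks still matter.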
Warren Schlechte
-----Original Message-----
From: Ryan Utz [mailto:rutz@AL.UMCES.EDU]
Sent: Wednesday, January 09, 2008 9:55 AM
Subject: AIC mystery in MIXED
Hi all,
I'm having issues using/interpreting AIC scores in proc MIXED. I'm
trying to compare simple linear relationships with power function
relationships (both models have been shown to be consistently valid in
related datasets).
When I go to interpret AIC (or AICc, etc) scores, however, power
relationships always emerge as the better model, even when it clearly
isn't the case. As an example, I provided my actual data for an
extremely simple model at the bottom of this email (I'm testing much
more complex models, but the example below illustrates the problem).
To test the power relationship, I've log-transformed both X and Y.
Running the code below shows that MIXED suggests the power
relationship is better (it has a lower AIC score), but if you run a
simple linear regression, clearly the non-transformed data (thus a
linear relationship) is superior. This is true even when both models
have the exact same number of parameters.
Is there something I'm doing wrong here, either in execution or
interpretation? I'd like to use AIC scores to help choose a model, but
because of this issue I'm very hesitant.
Thanks ahead of time for any advice,
Ryan Utz
University of Maryland Center for Environmental Science
data test;
input density length; cards;
0.099266504 82.8125
0.048193642 85.05405405
0.114893617 84.34210526
0.257685811 70.515625
0.044660194 86.92857143
0.244736842 76.37647059
0.020619946 89.5
0.058555133 93.6
0.125817923 84.08888889
data test2; set test;
lndensity = log(density);
lnlength= log (length); run;
title Linear Relationship;
proc mixed data=test2;
model length=density; run;
title Power Relationship;
proc mixed data=test2;
model lnlength=lndensity; run;
/*Simple regression for comparison*/
Title Linear relationship-simple regression;
proc glm data=test2;
model length=density; run;
Title Power relationship-simple regression;
proc glm data=test2;
model lnlength=lndensity; run;