Date: Fri, 3 Mar 2000 19:15:54 +0200
Reply-To: Matti Haapanen <matti.haapanen_remove_this_part_when_replying_@METLA.FI>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Matti Haapanen <matti.haapanen_remove_this_part_when_replying_@METLA.FI>
Organization: A poorly-installed Metla test site
Subject: Incorrect LSMEANS from PROC GLM!?
Hi all,
Can anyone explain the strange behaviour of PROC GLM (and of PROC
MIXED) when it estimates least-squares means in certain situations?
I have noticed that the LSMEANs for a factor can differ depending on
which other factors are included in the model. Odd estimates may
occur when the right-hand side of the MODEL statement contains two
explanatory factors that share exactly the same observations in some
of their levels.
To keep things simple, I have made up a small data set that
demonstrates the problem. The input data contain two factors, REPL
(replication) and TRT (treatment), and the response VAR1. A third
factor, TRT_TYPE (type of treatment), is derived from TRT in the
data step.
The problem is that the least-squares means for REPL change
dramatically depending on whether TRT_TYPE is included in the model
or not.
I repeated both analyses in SPSS. Surprise! Unlike SAS, SPSS was
NOT sensitive to the presence of TRT_TYPE in the model: the LSMEANs
from the 'full' and 'reduced' models were identical (see below).
It therefore seems that SAS and SPSS sometimes apply different
coefficient matrices when they calculate the marginal means for
REPL. Why?
The marginal LSMEANs that PROC GLM produces for the second model do not
seem very sensible. I'd be grateful if someone could give a reference on
this problem or explain how one can EASILY get the 'correct' LSM estimates
from SAS. (Yes, I know it's possible to write an ESTIMATE statement,
but I consider that pretty laborious and prone to errors.)
regards,
Matti Haapanen
Helsinki, Finland
data sasdata;
input repl trt var1;
if trt=1 then trt_type=1; else trt_type=2;
* The first level of TRT and TRT_TYPE is common ;
datalines;
1 1 5
2 1 6
3 1 5
4 1 6
5 1 10
1 2 34
2 2 32
3 2 30
4 2 25
5 2 29
1 3 40
2 3 59
3 3 40
4 3 43
5 3 41
1 4 28
2 4 33
3 4 34
4 4 42
5 4 48
;;;;
run;
title 'REDUCED MODEL EXCLUDING TRT_TYPE';
proc glm data=sasdata;
class repl trt_type trt;
model var1 = repl trt / ss3;
lsmeans repl / stderr;
run;
title 'FULL MODEL INCLUDING TRT_TYPE';
proc glm data=sasdata;
class repl trt_type trt;
model var1 = repl trt_type trt(trt_type);
lsmeans repl / stderr;
run;
The least-squares means for REPL from the reduced model are:
REPL          VAR1       Std Err       Pr > |T|
            LSMEAN        LSMEAN    H0:LSMEAN=0
1       26.7500000     3.1221654         0.0001
2       32.5000000     3.1221654         0.0001
3       27.2500000     3.1221654         0.0001
4       29.0000000     3.1221654         0.0001
5       32.0000000     3.1221654         0.0001
...whereas from the full model they are:
REPL          VAR1       Std Err       Pr > |T|
            LSMEAN        LSMEAN    H0:LSMEAN=0
1       19.0500000     3.2245585         0.0001
2       24.8000000     3.2245585         0.0001
3       19.5500000     3.2245585         0.0001
4       21.3000000     3.2245585         0.0001
5       24.3000000     3.2245585         0.0001
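A note on where the two sets of numbers may come from. The data are balanced, so the reduced-model LSMEAN for REPL 1 is simply the average of its four cells, (5 + 34 + 40 + 28)/4 = 26.75, i.e. each treatment gets weight 1/4. The full-model value 19.05 is what the additive fit gives if TRT_TYPE 1 and TRT_TYPE 2 each get weight 1/2, so that treatment 1 gets 1/2 and treatments 2 to 4 get 1/6 each. The sketch below writes both weightings as ESTIMATE statements for REPL 1. It is untested here and assumes the nested parameters are ordered TRT 1, 2, 3, 4 (worth checking against the parameter listing from the SOLUTION option); the labels are made up for illustration.

```sas
proc glm data=sasdata;
  class repl trt_type trt;
  model var1 = repl trt_type trt(trt_type) / solution;

  /* Weight 1/4 on each of the four treatments:            */
  /* TRT_TYPE gets 1/4 and 3/4, each TRT(TRT_TYPE) gets 1/4 */
  estimate 'REPL 1, equal TRT weights'
           intercept 4
           repl 4 0 0 0 0
           trt_type 1 3
           trt(trt_type) 1 1 1 1 / divisor=4;

  /* Weight 1/2 on each TRT_TYPE level:                     */
  /* TRT 1 gets 1/2, TRT 2, 3 and 4 get 1/6 each            */
  estimate 'REPL 1, equal TRT_TYPE weights'
           intercept 6
           repl 6 0 0 0 0
           trt_type 3 3
           trt(trt_type) 3 1 1 1 / divisor=6;
run;
quit;
```

By hand, using the additive fitted values for REPL 1 (3.65, 27.25, 41.85, 34.25), the first weighting gives 26.75 and the second gives 19.05, which match the reduced-model and full-model listings respectively.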
The SPSS output (same model as in the second case, but the estimated
marginal means differ from the SAS LSMEANs):
Estimated Marginal Means
REPL
Dependent Variable: VAR1
| REPL | Mean      | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound |
| ---- | --------- | ---------- | ------------------ | ------------------ |
| 1.00 | 26.750(a) | 3.122      | 19.947             | 33.553             |
| 2.00 | 32.500(a) | 3.122      | 25.697             | 39.303             |
| 3.00 | 27.250(a) | 3.122      | 20.447             | 34.053             |
| 4.00 | 29.000(a) | 3.122      | 22.197             | 35.803             |
| 5.00 | 32.000(a) | 3.122      | 25.197             | 38.803             |
a. Based on modified population marginal mean.
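The footnote suggests that SPSS averages over the treatment cells that actually occur in the data. On the SAS side, the LSMEANS statement of PROC GLM has an OM (OBSMARGINS) option that builds the coefficients from the observed margins of the input data rather than giving every level of each factor equal weight. With these balanced data that should put weight 1/4 on each treatment and so bring the full-model means back in line with the reduced model. This is a sketch under that assumption and has not been run here.

```sas
title 'FULL MODEL, LSMEANS WEIGHTED BY OBSERVED MARGINS';
proc glm data=sasdata;
  class repl trt_type trt;
  model var1 = repl trt_type trt(trt_type);
  /* OM: weight TRT_TYPE and TRT(TRT_TYPE) by their observed */
  /* frequencies instead of equally across levels            */
  lsmeans repl / om stderr;
run;
quit;
```

Note that with unbalanced data the OM weights follow the observed frequencies, so they would no longer equal a plain 1/4 per treatment.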
----