Date: Wed, 25 May 2005 15:15:16 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: student t test after multiple imputation
In-Reply-To: <1117052583.858288.13020@o13g2000cwo.googlegroups.com>
sunwenyu@GMAIL.COM wrote:
> I have a question about how to combine Student's t test results after
> multiple imputations. Can I get just one P value by combining multiple
> t test results? I couldn't find an example from SAS documentation,
> though SAS did provide samples on how to combine results from
> regression analysis or mixed model analysis by using the MIANALYZE
> procedure.
First of all, what are you trying to test? I'm assuming you
have a data set with some holes in it. Are you trying to
take a variable Y and test H0: mu=0? Are you trying to test
H0: mu=mu0 where mu0 is a non-zero constant? Are you trying
to create a confidence interval for mu?
Second, are your Y values normally distributed? And independent?
And identically distributed? Oh, good. Because, even though the
basic t test is relatively robust to departures from normality,
things can go haywire just by tossing in a few outliers, or adding
a contaminating distribution, or including serial correlation, or...
If your underlying assumptions are not met, then you should look
at a different test.
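One quick way to check those assumptions is PROC UNIVARIATE with the
NORMAL and PLOT options. NORMAL prints goodness-of-fit tests for
normality, and PLOT prints a box plot and a normal probability plot,
where outliers tend to stand out. This is a minimal sketch that uses
a small hypothetical data set, HAVE (50 simulated normal values with
two planted outliers), in place of your own data. The names HAVE and
NORMTESTS are illustrative only. Run the check on the observed values
before you impute anything.

```sas
/* Hypothetical illustration data: 50 normal values, with 40 added
   to records 10 and 20 to act as outliers.
   Replace HAVE and Y with your own data set and variable.         */
data have;
  do i = 1 to 50;
    y = 42 + 10*rannor(12345);
    if i in (10, 20) then y = y + 40;
    output;
  end;
  drop i;
run;

/* Keep the normality-test table as a data set (name is arbitrary) */
ods output TestsForNormality=normtests;

/* NORMAL: goodness-of-fit tests for normality on the non-missing values
   PLOT:   stem-and-leaf plot, box plot and normal probability plot     */
proc univariate data=have normal plot;
  var y;
run;
```

If the tests reject, or the plots show a few wild points, the t test
results after imputation deserve the same suspicion as before.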
For a simple t-test, PROC MI and PROC MIANALYZE make things easy. In
particular, when you have more than one variable you want to test, you
can do everything in PROC MI *without* ever jumping through something
like PROC UNIVARIATE and then passing the results back into PROC
MIANALYZE. Take a look at this example (SAS 9.1.2):
/* X is normally distributed. Y = X, plus up to 40 added to records
   10, 20 and 30, which gives Y a few outliers. */
data temp1(drop=seed);
seed = 38596303;
do n = 1 to 36;
x = 42 + 10*rannor(seed);
y = x + ( mod(n,10)=0 )*40*ranuni(seed);
output;
if n in (8,16,24,32) then do; y=.; output; end;    /* repeat record with Y missing */
if n in (4,12,20,28,36) then do; x=.; output; end; /* repeat record with X missing */
end;
run;
proc print data=temp1; run; /* take a look at the data if you want */
/* PROC MI will do the t-tests for you, and even let you choose
the mu0 that you want for each of your variables. */
proc mi data=temp1 out=mi1 nimpute=5 seed=58703654 mu0=40 40 ;
var x y;
run;
If you run this code, you'll see that PROC MI does the imputation for
you. The default method (MCMC under a multivariate normal model) is
what you want for normally distributed data when you're not trying to
impute Y as a linear regression on X (or on several IVs). You'll get a
single-chain MCMC. (Note for people following the AR(1) thread from
yesterday and today: the MCMC uses a burn-in period of 200 iterations.)
Next you get the variance within and between imputations, which tells
you how much of the uncertainty in each estimate comes from filling in
your gaps. Finally, you get the tests you wanted. Note that the mean of
Y is only a little higher than the mean of X, and the variance of Y is
only a bit larger than the variance of X. But the p-values are quite
distinct.
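How do the imputations get combined into a single p-value? The standard
combination is Rubin's rules, applied to the per-imputation estimates.
In the notation used here, m is the number of imputations (5 above),
Q_i-hat is the mean of a variable in imputation i, and U_i is its squared
standard error. The within, between and total variances in the
variance-information output correspond to U-bar, B and T. The single
test is t, referred to a t distribution with nu_m degrees of freedom.
This is the textbook large-sample form. When EDF= supplies the
complete-data degrees of freedom, PROC MIANALYZE replaces nu_m with a
small-sample adjustment (Barnard and Rubin).

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2 \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B, \qquad
t = \frac{\bar{Q}-\mu_0}{\sqrt{T}}, \qquad
\nu_m = (m-1)\left[1+\frac{\bar{U}}{\bigl(1+m^{-1}\bigr)B}\right]^2
\end{aligned}
```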
If you *want* to run things through PROC MIANALYZE, you still can.
Let PROC MI build your imputations. Then run the output data set
through PROC MEANS, PROC SUMMARY, or PROC UNIVARIATE to get the means
and standard errors BY _IMPUTATION_. Feed that data set straight into
PROC MIANALYZE using the DATA= option and EDF= (set EDF= to the number
of records minus 1, the complete-data degrees of freedom).
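Here is a minimal sketch of that route. It assumes the MI1 data set
from the PROC MI step above (5 imputed copies of TEMP1, stacked and
sorted by _IMPUTATION_). The output name MISTATS and the standard-error
names SE_X and SE_Y are my own choices. With DATA= input, PROC MIANALYZE
takes the estimates from the MODELEFFECTS statement and their standard
errors from the STDERR statement. TEMP1 has 45 records (36 plus 9
repeats with a missing value), so EDF=44.

```sas
/* Step 1: complete-data mean and standard error of X and Y,
   computed separately within each imputation.
   MI1 is assumed to come from the PROC MI step above.       */
proc means data=mi1 noprint;
  by _imputation_;
  var x y;
  output out=mistats mean=x y stderr=se_x se_y;
run;

/* Step 2: combine the 5 sets of estimates with Rubin's rules.
   EDF=44 is the complete-data degrees of freedom (45 records - 1).
   THETA0= gives the null values, as MU0= did in PROC MI.         */
proc mianalyze data=mistats edf=44 theta0=40 40;
  modeleffects x y;
  stderr se_x se_y;
run;
```

The combined tests should come out close to the ones PROC MI printed.
They are not necessarily identical, because the two procedures can
handle the degrees of freedom differently.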
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician