Date: Wed, 11 May 2005 11:32:37 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Dependent sample difference in mean test
> I have two dependent samples with different numbers of observations.
> I need to know whether the means of the two samples are statistically
> different from each other.
> My sample_1 has approximately 800,000 observations. Sample_2 has
> approximately 130,000 observations.
> I have run a regression on sample_1 to generate coefficients. I then
> "fit" the coefficients from sample_1 to the characteristics of
> sample_2 observations. This gives me a predicted value for sample_2 based on
> sample_1 coefficients. I then calculate a residual by subtracting
> sample_2 observation actual value from the predicted value (predicted
> from the sample_1 coefficients applied to the sample_2
> characteristics). Then I take the mean of the residuals from sample_2.
> I repeat the process in the opposite, i.e., I run a regression on
> sample_2, get coefficients, then fit the coefficients from sample_2
> to the sample_1 characteristics. This generates a predicted value,
> which I subtract from each sample_1 actual - this generates the
> sample_1 residuals. I then take the mean sample_1 residual.
> I expect the sample_1 and sample_2 residuals to be of opposite sign.
> I need to test the difference in the mean residuals. I have two
> dependent samples (of residuals) and I have very different sample
> sizes (of residuals).
> I can make the assumption that they are perfectly negatively
> correlated and proceed with a t-test. Then assume that they are perfectly
> uncorrelated and proceed with a t-test. This will give me a range of
> t-stats for my test.
> But, I was hoping someone could help me with a stronger (or more
> direct) test. I'm afraid the range won't give strong enough results.
> So, this is a statistical theory question instead of a direct SAS
> question.
Hey, stat questions are allowed here too.
But first... Why are you doing this? This doesn't make much sense
to me, and your resulting data are NOT directly comparable.
You cannot do either t-test. Period. You want to assume that you have
something in between perfectly correlated and uncorrelated, so your
t-statistic would be bracketed. It doesn't work that way.
Even worse, both of the t-statistics you have in mind assume that
the observations are independent. In a paired t-test, one assumes
that the *differences* are independent. In a two-sample test, one
assumes that all n1+n2 observations are independent of one another.
You have created residuals which are (by construction) all dependent.
You have no independent observations here, and you shouldn't be
considering a basic t-test.
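
One way to see the dependence is a small simulation. The sketch below is only an illustration under assumed toy conditions, not the poster's actual analysis: the names (fit_line, residual_pair_correlation), the true line y = 1 + 2x, the unit error variance, and the tiny fitting sample of 5 points are all made up for the example. Two new observations get independent errors, but both residuals subtract the same estimated coefficients, so they end up correlated. In this particular setup the correlation works out to roughly 1/6.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def correlation(us, vs):
    """Pearson correlation of two equal-length lists."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5


def residual_pair_correlation(n_fit=5, reps=20000, seed=1):
    """Correlation, across replications, between the residuals of two
    new observations scored with coefficients fitted on another sample."""
    rng = random.Random(seed)
    # Hypothetical fitting design: x values centered on zero.
    xs_fit = [i - (n_fit - 1) / 2 for i in range(n_fit)]
    x_new = 0.0  # both new observations sit at x = 0
    res1, res2 = [], []
    for _ in range(reps):
        # Fit on the "other" sample (assumed true line: y = 1 + 2x).
        ys_fit = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs_fit]
        a, b = fit_line(xs_fit, ys_fit)
        predicted = a + b * x_new
        # Two new observations with independent errors.
        y1 = 1.0 + 2.0 * x_new + rng.gauss(0, 1)
        y2 = 1.0 + 2.0 * x_new + rng.gauss(0, 1)
        # Both residuals share the same estimation error in (a, b).
        res1.append(y1 - predicted)
        res2.append(y2 - predicted)
    return correlation(res1, res2)


if __name__ == "__main__":
    print(f"correlation between residuals: {residual_pair_correlation():.3f}")
```

The fitting sample is deliberately tiny so the shared estimation error is easy to see. With a much larger fitting sample the correlation between any two residuals shrinks, but it is still induced by the same mechanism.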
So, step back. Write to SAS-L (not to me personally) and explain
why you are doing this, and what you hope to achieve. The big picture
would be helpful. Perhaps someone here can point you toward a more
appropriate approach.
BTW, with sample sizes like you have, your statistical tests will
be really flaky, since the size of n will drive virtually anything
to appear significant. Why do you have such large samples, and where
do they come from, and what do they represent?
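
To put a rough number on the sample-size point, here is a minimal sketch. It sets the dependence problem aside entirely and plugs numbers into the ordinary equal-variance two-sample t formula. The function name two_sample_t, the difference of 0.01 standard deviations, and the small comparison sizes of 80 and 13 are assumptions chosen for illustration. Only the 800,000 and 130,000 come from the original question.

```python
import math


def two_sample_t(mean_diff, sd, n1, n2):
    """Equal-variance two-sample t statistic for a given difference in means."""
    se = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean_diff / se


# The same tiny difference (0.01 standard deviations) at two sample sizes.
small = two_sample_t(0.01, 1.0, 80, 13)
large = two_sample_t(0.01, 1.0, 800_000, 130_000)

print(f"n = 80 and 13:           t = {small:.2f}")
print(f"n = 800,000 and 130,000: t = {large:.2f}")
```

With the large samples the statistic comes out a little above 3.3, which would be called significant at the usual levels even though a difference of one hundredth of a standard deviation may not matter in practice.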
David Cassell, CSC
Senior computing specialist