Date: Wed, 11 May 2005 08:52:28 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Subject: Dependent sample difference in mean test
Content-Type: text/plain; charset="iso-8859-1"
I have two dependent samples with different numbers of observations. I
need to know whether the means of the two samples are statistically
different from each other.
My sample_1 has approximately 800,000 observations. Sample_2 has
approximately 130,000 observations.
I have run a regression on sample_1 to generate coefficients. I then
"fit" the coefficients from sample_1 to the characteristics of sample_2
observations. This gives me a predicted value for sample_2 based on
sample_1 coefficients. I then calculate a residual by subtracting each
sample_2 observation's actual value from the predicted value (predicted
from the sample_1 coefficients applied to the sample_2
characteristics). Then I take the mean of the residuals from sample_2.
I repeat the process in the opposite direction, i.e., I run a
regression on sample_2, get coefficients, then fit the coefficients
from sample_2
to the sample_1 characteristics. This generates a predicted value,
which I subtract from each sample_1 actual - this generates the
sample_1 residuals. I then take the mean sample_1 residual.
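For concreteness, the two-way procedure described above can be sketched
roughly as follows. This is an illustration only, in Python rather than
SAS, and it makes several assumptions that are not in the post: a single
predictor, made-up toy data, the hypothetical helper names fit_ols and
cross_residuals, and one sign convention (actual minus predicted) used
in both directions.

```python
from statistics import mean


def fit_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cross_residuals(fit_x, fit_y, apply_x, apply_y):
    """Fit on one sample, predict the other sample, and return
    actual minus predicted for each observation of the other sample.
    (Sign convention is an assumption; flip it if yours differs.)"""
    intercept, slope = fit_ols(fit_x, fit_y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(apply_x, apply_y)]


# Toy data (hypothetical): sample_1 follows y = 2x, sample_2 follows y = 2x + 1.
x1, y1 = [1, 2, 3, 4], [2, 4, 6, 8]
x2, y2 = [1, 2, 3], [3, 5, 7]

# sample_1 coefficients applied to sample_2 characteristics.
res_2 = cross_residuals(x1, y1, x2, y2)
# sample_2 coefficients applied to sample_1 characteristics.
res_1 = cross_residuals(x2, y2, x1, y1)

mean_res_2 = mean(res_2)
mean_res_1 = mean(res_1)
```

With a consistent sign convention, a level shift between the two
samples shows up as mean residuals of opposite sign, which is the
pattern the post expects.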
I expect the sample_1 and sample_2 residuals to be of opposite sign. I
need to test the difference in the mean residuals. I have two
dependent samples (of residuals) and I have very different sample sizes.
I can make the assumption that they are perfectly negatively correlated
and proceed with a t-test. Then assume that they are perfectly
uncorrelated and proceed with a t-test. This will give me a range of
t-stats for my test.
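The bounding idea above can be sketched as follows. This is only one
possible reading, not a recommended test: it assumes the correlation
rho applies to the two mean residuals, so that
Var(m1 - m2) = se1^2 + se2^2 - 2*rho*se1*se2, and it evaluates the
t-statistic at rho = -1 and rho = 0. Python is used for illustration
only; the function name t_stat_range and the toy residuals are
hypothetical.

```python
import math
from statistics import mean, variance


def t_stat_range(res_1, res_2):
    """Return (t at rho = -1, t at rho = 0) for the difference in means,
    where rho is the assumed correlation between the two sample means."""
    m1, m2 = mean(res_1), mean(res_2)
    # Standard error of each mean from its own sample variance and size.
    se1 = math.sqrt(variance(res_1) / len(res_1))
    se2 = math.sqrt(variance(res_2) / len(res_2))
    diff = m1 - m2

    def t_for(rho):
        var_diff = se1 ** 2 + se2 ** 2 - 2.0 * rho * se1 * se2
        return diff / math.sqrt(var_diff)

    return t_for(-1.0), t_for(0.0)


# Toy residuals (hypothetical), chosen so each standard error equals 1.
t_neg, t_zero = t_stat_range([0.0, 2.0], [-3.0, -1.0])
```

Under this reading, rho = -1 gives the largest variance of the
difference, (se1 + se2)^2, and so the smaller t-statistic in absolute
value; rho = 0 gives the larger one.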
But, I was hoping someone could help me with a stronger (or more
direct) test. I'm afraid the range won't give strong enough results.
So, this is a statistical theory question instead of a direct SAS