Date: Wed, 11 May 2005 11:32:37 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Dependent sample difference in mean test
In-Reply-To: <1115826748.025038.172760@g44g2000cwa.googlegroups.com>
Content-Type: text/plain; charset=US-ASCII

gblockhart@YAHOO.COM wrote:
> I have two dependent samples with different numbers of observations.
> I need to know whether the means of the two samples are statistically
> different from each other.
>
> My sample_1 has approximately 800,000 observations. Sample_2 has
> approximately 130,000 observations.
>
> I have run a regression on sample_1 to generate coefficients. I then
> "fit" the coefficients from sample_1 to the characteristics of
> sample_2 observations. This gives me a predicted value for sample_2
> based on sample_1 coefficients. I then calculate a residual by
> subtracting each sample_2 observation actual value from the predicted
> value (predicted from the sample_1 coefficients applied to the
> sample_2 characteristics).
>
> Then I take the mean of the residuals from sample_2.
>
> I repeat the process in the opposite direction, i.e., I run a
> regression on sample_2, get coefficients, then fit the coefficients
> from sample_2 to the sample_1 characteristics. This generates a
> predicted value, which I subtract from each sample_1 actual - this
> generates the sample_1 residuals. I then take the mean sample_1
> residual.
>
> I expect the sample_1 and sample_2 residuals to be of opposite sign.
> I need to test the difference in the mean residuals. I have two
> dependent samples (of residuals) and I have very different sample
> sizes (of residuals).
>
> I can make the assumption that they are perfectly negatively
> correlated and proceed with a t-test. Then assume that they are
> perfectly uncorrelated and proceed with a t-test. This will give me a
> range of t-stats for my test.
>
> But, I was hoping someone could help me with a stronger (or more
> direct) test. I'm afraid the range won't give strong enough results.
>
> So, this is a statistical theory question instead of a direct SAS
> question.

Hey, stat questions are allowed here too.

But first... Why are you doing this? This doesn't make much sense
to me, and your resulting data are NOT directly comparable.

You cannot do either t-test. Period. You want to assume that you have
something in between perfectly correlated and uncorrelated, so your
t-statistic would be bracketed. It doesn't work that way.

Even worse, both of the t-statistics you have in mind assume that
the observations are independent. In a paired t-test, one assumes
that the *differences* are independent. In a two-sample test, one
assumes that all n1+n2 observations are independent of one another.
You have created residuals which are (by construction) all
interrelated.
You have no independent observations here, and you shouldn't be
considering a basic t-test.
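
One way to see the "interrelated by construction" point is a small
simulation. The sketch below is only an illustration under made-up
assumptions: a one-predictor regression, a five-point design for
sample_1, two hypothetical sample_2 observations, and standard normal
errors. None of these values come from the original data. Because both
residuals are computed from the same fitted coefficients, they share
the same estimation error and come out correlated across repeated
draws.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def pearson(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((p - ubar) * (q - vbar) for p, q in zip(u, v))
    suu = sum((p - ubar) ** 2 for p in u)
    svv = sum((q - vbar) ** 2 for q in v)
    return suv / (suu * svv) ** 0.5


def simulate(reps=20000, seed=1):
    """Correlation between two cross-fit residuals over repeated samples."""
    rng = random.Random(seed)
    x1 = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical sample_1 predictor values
    xa, xb = 3.0, 4.0               # two hypothetical sample_2 observations
    res_a, res_b = [], []
    for _ in range(reps):
        # Same true line in both samples (assumed), independent errors.
        y1 = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in x1]
        a, b = fit_line(x1, y1)     # coefficients from sample_1 only
        ya = 1.0 + 2.0 * xa + rng.gauss(0, 1)
        yb = 1.0 + 2.0 * xb + rng.gauss(0, 1)
        # Residual = predicted (sample_1 coefficients) minus actual.
        res_a.append((a + b * xa) - ya)
        res_b.append((a + b * xb) - yb)
    return pearson(res_a, res_b)


corr = simulate()
print(f"correlation between two cross-fit residuals: {corr:.3f}")
```

With truly independent residuals this correlation would hover near
zero. Here it does not, even though the underlying errors were
generated independently. How strong the dependence is in any real
problem depends on the design and the sample sizes; the sketch makes no
claim about the poster's data.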

So, step back. Write to SAS-L (not to me personally) and explain
why you are doing this, and what you hope to achieve. The big picture
would be helpful. Perhaps someone here can point you toward a more
productive approach.

BTW, with sample sizes like you have, your statistical tests will
be really flaky, since the size of n will drive virtually anything
to appear significant. Why do you have such large samples, and where
do they come from, and what do they represent?
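
To put rough numbers on the sample-size remark, here is a minimal
sketch. It assumes a simple one-sample z-test on independent
observations (which, per the above, is not the situation in the
question) and a hypothetical effect of 0.01 standard deviations. The
effect size and the function name are made up for illustration only.

```python
import math


def two_sided_p(effect_sd_units, n):
    """Two-sided p-value of a one-sample z-test when the observed mean
    sits `effect_sd_units` standard deviations from zero, with n
    independent observations."""
    z = effect_sd_units * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


tiny_effect = 0.01  # hypothetical: the mean is 1% of one standard deviation
for n in (100, 10_000, 130_000, 800_000):
    z = tiny_effect * math.sqrt(n)
    print(f"n = {n:>7,}  z = {z:6.2f}  p = {two_sided_p(tiny_effect, n):.3g}")
```

The effect is held fixed and only n changes, yet the p-value goes from
unremarkable to vanishingly small. That is the sense in which n alone
can make a practically negligible difference look significant.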
HTH,
David

David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
