Date: Tue, 4 Aug 2009 09:08:33 -0500
Reply-To: Robin R High <rhigh@UNMC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Robin R High <rhigh@UNMC.EDU>
Subject: Re: Question on Comparing Two Averages
In-Reply-To: <20090803144245.48204wl8x7w4w2q8@mail.ucla.edu>
Content-Type: text/plain; charset="US-ASCII"
Jason,
It turns out that the difference in the marginal proportions (20% and 22%
for the respective years) equals the difference between the off-diagonal
percentages in the table structure I described yesterday. To run that
test, though, you need to know the off-diagonal counts. What is
potentially very different is that analyzing the original data with
matching (if you had it) gives a more powerful test of the difference
than analyzing the marginal totals, much as a paired t-test gives a
smaller p-value when the covariance is positive. So it is quite possible
(very likely?) that a 2% difference would be detected as significant with
matched data and a sample of 5000, whereas the result is less clear when
comparing the percent "yes" for each question across years. You should
always ask, though, whether a 2% difference is really meaningful.
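
As a rough illustration of the matched-pairs point, here is a minimal Python
sketch of the McNemar z statistic, which uses only the off-diagonal counts.
The counts b and c, the two scenarios, and the function name mcnemar_z are
hypothetical, chosen only so that both scenarios give the same 2% marginal
difference with 5000 matched respondents; none of them come from the actual
survey.

```python
import math

def mcnemar_z(b, c):
    """McNemar z statistic for matched yes/no pairs.

    b = "yes" in 2008 and "no" in 2009; c = "no" in 2008 and "yes" in 2009.
    Only these off-diagonal (discordant) counts enter the test.
    """
    return (c - b) / math.sqrt(b + c)

n = 5000  # matched respondents (hypothetical)

# Both scenarios have c - b = 100, i.e. the same 2% marginal difference,
# but they differ in how many respondents changed their answer.
for b, c in [(150, 250), (700, 800)]:
    diff = (c - b) / n
    print(f"b={b}, c={c}: marginal diff = {diff:.1%}, z = {mcnemar_z(b, c):.2f}")
```

With few discordant pairs (answers strongly correlated across years) the same
2% difference gives a much larger z than with many discordant pairs, which is
why the off-diagonal counts are needed before the test can be run.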
Alan Agresti, in "An Introduction to Categorical Data Analysis" (2nd ed.),
Section 8.1, has a nice illustration of paired categorical data that shows
both approaches. Also, Paul Allison, in Chapter 3 of his book on "Fixed
Effects", describes them as the "population-averaged" vs.
"subject-specific" estimates.
Robin High
UNMC
J M <jasonm@UCLA.EDU>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
08/03/2009 04:43 PM
Please respond to: jasonm@UCLA.EDU
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Re: Question on Comparing Two Averages
>
> HOWEVER, with n = 4,500, a tiny difference will be statistically
> significant. Will it be of any importance?
So, if I have 20% respond "yes" in 2008 and then 22% respond "yes" in
2009 I can say that due to the large sample the difference would most
likely be statistically significant if we had the correct data to
actually test this claim?
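
For a sense of scale, here is a minimal Python sketch of the pooled
two-sample z-test on these two percentages. It treats the 2008 and 2009
groups as independent samples, which is only an approximation since the same
people were tested twice. The "yes" counts (1,000 and 990) are inferred from
the stated percentages rather than reported, and the function name
two_proportion_z is made up for illustration.

```python
import math

def two_proportion_z(yes1, n1, yes2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    Returns (z, two-sided p-value). Assumes independent samples.
    """
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided normal tail probability via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts implied by the stated percentages (an assumption, not reported data):
# 20% of 5,000 = 1,000 "yes" in 2008; 22% of 4,500 = 990 "yes" in 2009.
z, p = two_proportion_z(1000, 5000, 990, 4500)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions z comes out at about 2.4, with a two-sided p-value
below 0.05. That says nothing about whether a 2% change matters in practice.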
Quoting Peter Flom <peterflomconsulting@mindspring.com>:
> J M <jasonm@UCLA.EDU> wrote
>> I'm wondering what people's thoughts are on this:
>> A group of people were tested in 2008 and asked to reply yes or no to
>> a series of questions (n=5,000).
>> They were tested again in 2009 and asked to reply yes or no to the
>> same series of questions (n=4,500).
>> There is no raw data. The response rates for each of the questions in
>> 2008 and 2009 are unknown. We just know how many people answered yes to
>> each of the questions in 2008 and 2009.
>> Although this would be a very crude analysis, would the results of a
>> dependent t-test comparing the two averages on each of the questions
>> to see if the difference is significantly different from zero between
>> 2008 and 2009 mean anything?
>
> How would you do a dependent t-test? To do that you would need to
> know who got how many answers right, and you say you don't know that.
> You'll have to do the much less powerful independent sample t-test.
>
> HOWEVER, with n = 4,500, a tiny difference will be statistically
> significant. Will it be of any importance?
>
> You don't say how many questions were asked, or what the
> distributions are, so it's hard to even give an example that would
> be useful.
>
> You also don't say why you are doing this. You don't say what the
> questions are, or why you are comparing them.
>
> PLUS, the bit about "response rates are unknown" makes it sound like
> you did some kind of survey. If you don't know the response rates,
> then nothing you do will be of much value (sorry).
>
> Peter
>
>
> Peter L. Flom, PhD
> Statistical Consultant
> www DOT peterflomconsulting DOT com
> http://www.associatedcontent.com/user/582880/peter_flom.html
>