```
Date: Tue, 4 Aug 2009 09:08:33 -0500
Reply-To: Robin R High
Sender: "SAS(r) Discussion"
From: Robin R High
Subject: Re: Question on Comparing Two Averages
Comments: To: jasonm@UCLA.EDU
In-Reply-To: <20090803144245.48204wl8x7w4w2q8@mail.ucla.edu>
Content-Type: text/plain; charset="US-ASCII"

Jason,

It turns out that the difference in the marginal proportions (20% and 22%
for the respective years) is the same as the difference between the
percents in the off-diagonal cells of the table structure I described
yesterday -- but to make that test you need to know the off-diagonal
counts.

What is potentially very different is that treating the original data as
matched (if you had it) gives a more powerful test of the difference (much
like a paired t-test gives a smaller p-value with positive covariance) than
treating the marginal totals as independent. So it is quite possible (very
likely?) to detect a significant difference of 2% with matched data and a
sample of 5000, whereas it is not as clear when comparing the percent "yes"
for each question across years ... though one always needs to ask whether a
2% difference is really meaningful.

Alan Agresti, in his "Intro to Categorical Data Analysis," 2nd ed.,
Chapter 8.1, has a nice illustration of paired categorical data that shows
both approaches. Also, Paul Allison, in Chapter 3 of his book on "Fixed
Effects," describes them as the "population averaged" vs. "subject
specific" estimates.

Robin High
UNMC


J M
Sent by: "SAS(r) Discussion"
08/03/2009 04:43 PM
Please respond to jasonm@UCLA.EDU
To: SAS-L@LISTSERV.UGA.EDU
cc:
Subject: Re: Question on Comparing Two Averages

>
> HOWEVER, with n = 4,500, a tiny difference will be statistically
> significant. Will it be of any importance?

So, if I have 20% respond "yes" in 2008 and then 22% respond "yes" in
2009, I can say that due to the large sample the difference would most
likely be statistically significant if we had the correct data to
actually test this claim?

Quoting Peter Flom :

> J M wrote
>> I'm wondering what people's thoughts are on this:
>> A group of people were tested in 2008 and asked to reply yes or no to
>> a series of questions (n=5,000).
>> They were tested again in 2009 and asked to reply yes or no to the
>> same series of questions (n=4,500).
>> There is no raw data. The response rates for each of the questions in
>> 2008 and 2009 is unknown. We just know how many people answered yes to
>> each of the questions in 2008 and 2009.
>> Although this would be a very crude analysis, would the results of a
>> dependent t-test comparing the two averages on each of the questions
>> to see if the difference is significantly different from zero between
>> 2008 and 2009 mean anything?
>
> How would you do a dependent t-test? To do that you would need to
> know who got how many answers right, and you say you don't know that.
> You'll have to do the much less powerful independent sample t-test.
>
> HOWEVER, with n = 4,500, a tiny difference will be statistically
> significant. Will it be of any importance?
>
> You don't say how many questions were asked, or what the
> distributions are, so it's hard to even give an example
> that would be useful.
>
> You also don't say why you are doing this. You don't say what the
> questions are, or why you are comparing them.
>
> PLUS, the bit about "response rates are unknown" makes it sound like
> you did some kind of survey. If you don't know the
> response rates, then nothing you do will be of much value (sorry).
>
> Peter
>
> Peter L. Flom, PhD
> Statistical Consultant
> www DOT peterflomconsulting DOT com
> http://www.associatedcontent.com/user/582880/peter_flom.html
```
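
To make the contrast in Robin's reply concrete, here is a rough Python sketch (not SAS, and not anything posted in the thread) that runs both approaches on the 20% vs. 22% example. The marginal counts follow from the thread (1,000 of 5,000 and 990 of 4,500). The off-diagonal counts `b = 290` and `c = 200` are purely hypothetical, chosen only so that (b - c) / 4500 equals the 2% difference; the real discordant counts are exactly what is missing from Jason's data. The function names are also made up for this illustration.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-test that treats the two years as independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    return z, two_sided_p(z)


def mcnemar_z(b, c):
    """McNemar statistic (no continuity correction) from the discordant counts.

    b = number who changed from "no" to "yes", c = number who changed from
    "yes" to "no". Only these off-diagonal counts enter the test.
    """
    z = (b - c) / math.sqrt(b + c)
    return z, two_sided_p(z)


# Marginal totals only: 20% of 5,000 in 2008 vs. 22% of 4,500 in 2009.
z_ind, p_ind = two_proportion_z(1000, 5000, 990, 4500)

# HYPOTHETICAL matched data: 4,500 people answered in both years, with
# 290 switching no -> yes and 200 switching yes -> no (a net change of 2%).
z_pair, p_pair = mcnemar_z(290, 200)

print(f"independent samples: z = {z_ind:.3f}, p = {p_ind:.5f}")
print(f"matched (McNemar):   z = {z_pair:.3f}, p = {p_pair:.5f}")
```

With these made-up discordant counts the matched test gives the larger statistic for the same 2% difference, but that depends entirely on how many people actually switched answers: the more discordant pairs there are for a given net change, the smaller the paired statistic becomes.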
