Date: Tue, 6 May 2008 14:38:27 -0500
Reply-To: Suhong Tong <sophidt@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Suhong Tong <sophidt@HOTMAIL.COM>
Subject: Re: Basic stat question in large scale categorical data analysis
Content-Type: text/plain; charset="Windows-1252"
Peter, thank you for your response and for the questions, which lead to further discussion.
This is a service survey without survey sampling or design: we ask standard questions and collect data as people call, so I think it is more of an observational study. During certain time periods (I do have the month and year defined), the service provider offers free gifts or runs media campaigns targeting specific populations, such as different age groups or ethnicities, and we want to see whether these activities make a difference in the outcomes we are looking at. The demographics are just the part of our analysis that describes the population.
As usual, we always do descriptive statistics before going on to regression or modeling, and what I posted here is the first step of the analysis. Assuming I did it right, my next question is: if I want to compare period 1 vs 2 and 2 vs 3, is PROC FREQ with a WHERE clause to select the sub-population (i.e. WHERE PERIOD in (1, 2)) a valid method, or do I need to use logistic regression to test the difference between time periods? I have more than 20 such variables to test, and I really want a method that is valid yet simple, so I can use a macro.
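A rough sketch of the macro I have in mind, in case it helps the discussion. The dataset LARGE and the variables PERIOD and ETHNGRP are from my earlier post; the macro name PAIRCHI and its parameters are made up for illustration, and the sketch makes no adjustment for running many tests:

```sas
/* Hypothetical macro: chi-square test of one variable between two periods. */
/* LARGE, PERIOD and ETHNGRP come from the earlier post; the macro name     */
/* and its parameters are illustrative only.                                */
%macro pairchi(var=, p1=, p2=);
  title "&var: period &p1 vs period &p2";
  proc freq data=large;
    where PERIOD in (&p1, &p2);                 /* keep only the two periods compared */
    tables PERIOD*&var / chisq nopercent nocol; /* row percents plus chi-square       */
  run;
  title;                                        /* clear the title afterwards         */
%mend pairchi;

%pairchi(var=ETHNGRP, p1=1, p2=2);
%pairchi(var=ETHNGRP, p1=2, p2=3);
```

The same two calls would be repeated for each of the other variables.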
Any suggestions? Your opinion is greatly appreciated.
> Date: Tue, 6 May 2008 14:12:35 -0400
> From: email@example.com
> To: sophidt@HOTMAIL.COM; SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Basic stat question in large scale categorical data analysis
> Sophia Tong <sophidt@HOTMAIL.COM> wrote
> > I am trying to pull descriptive statistics from a large observational study.
> >Almost all of the variables are categorical by nature or recoded as
> >categorical. I am looking at the demographics for this population in 3
> >different time periods, so the table would be PERIOD*VAR in RxC format (not a
> >2x2 table). The sample size of the sub-population in each time period is greater
> >than 10,000, so almost every variable I tested is significantly different
> >overall. As I saw more and more Prob <.0001, my excitement faded away. Am I
> >using the right method? What I did is a simple PROC FREQ with a request for the
> >Chi-Square test.
> > PROC FREQ data=large;
> > tables PERIOD*ETHNGRP/Chisq;
> > run;
> > Any comments or suggestions?
> Whether the method you are using is correct depends on what you are trying to find out. The method you are using tests whether the demographic variables are different in the different time periods. Is that what you want? Do you want to look at trends, instead? Perhaps you want to see if PERIOD 'explains' any of the variables..... then you would want regression. But why are you looking at time as a period? Do you have the actual year or month or whatever?
> But, assuming this is the method you want, you are seeing how silly the p-value is. What you want to report on is effect size. Here you have ETHNGRP, presumably some ethnic grouping. You might want to look at how different they are. The p-value tells you that, if they were really the same in the population, then you are unlikely to get values like these. But that is rarely a relevant question.
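> One possible way to look at effect size with the same procedure, as a sketch only: the CHISQ option also prints Cramer's V, and the row percentages show how the ETHNGRP distribution shifts across periods. The output dataset name CHISQ_STATS is made up, and the exact label text in the Statistic column is an assumption, so check it against your own output:
>
> ```sas
> /* Same table as in the original post, keeping row percentages only. */
> proc freq data=large;
>   tables PERIOD*ETHNGRP / chisq nopercent nocol;
>   ods output ChiSq=chisq_stats;      /* capture the chi-square statistics table */
> run;
>
> /* Print Cramer's V; the WHERE match on the label is an assumption. */
> proc print data=chisq_stats;
>   where Statistic contains 'Cramer';
> run;
> ```
>
> Cramer's V runs from 0 to 1, so a value near 0 alongside Prob <.0001 would suggest a statistically detectable but small difference.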
> Of course, if you have a survey, then you should probably be using SURVEYFREQ.....
> Peter L. Flom, PhD
> Statistical Consultant
> www DOT peterflom DOT com