| Date: | Tue, 15 Aug 2000 17:12:07 -0400 |
| Reply-To: | Chris Smith <cpsmith@AGFINANCE.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Chris Smith <cpsmith@AGFINANCE.COM> |
| Subject: | Re: Chi-square question |
On Fri, 11 Aug 2000 10:49:16 +0200, Victor Bos <vic@TIK.NU> wrote:
>Hi group,
>
>At last: after 9+ years as a SAS developer, I am running into a statistical
>question.
>I am comparing two sets of data, containing customer information, and I want
>to see if there is a significant difference between certain elements in the
>two datasets.
>I remember from my college years that a chi-square test can be used for
>this, so I ran proc freq with the CHISQ option to do the job. Now my
>problem:
>
>How do I interpret the results from proc freq?
>
>The documentation on proc freq is very limited on this. Could one of you
>statisticians please explain how to achieve my goal with proc freq/chisq?
>Or would another procedure or statistical test be better? For completeness,
>I have attached my proc freq output. I would like to decide whether there
>is a significant difference between my two test sets (TC=0 and TC=1) for
>each value of BC.
>
>Can anyone help me out here?
>thanks in advance,
>
>Victor Bos
>Talkline Nederland BV,
>the Netherlands.
>
> TABLE OF BC BY TC
>
> BC        TC
>
> Frequency|
> Col Pct  |       0|       1|  Total
> ---------+--------+--------+
> -        |      1 |      0 |      1
>          |   0.00 |   0.00 |
> ---------+--------+--------+
> 1        |   7934 |    237 |   8171
>          |  10.08 |   9.13 |
> ---------+--------+--------+
> 2        |   8558 |    243 |   8801
>          |  10.87 |   9.36 |
> ---------+--------+--------+
> 3        |   8124 |    202 |   8326
>          |  10.32 |   7.78 |
> ---------+--------+--------+
> 4        |  27383 |    850 |  28233
>          |  34.80 |  32.73 |
> ---------+--------+--------+
> 5        |   8089 |    206 |   8295
>          |  10.28 |   7.93 |
> ---------+--------+--------+
> 6        |   8890 |    411 |   9301
>          |  11.30 |  15.83 |
> ---------+--------+--------+
> 7        |   9716 |    448 |  10164
>          |  12.35 |  17.25 |
> ---------+--------+--------+
> Total       78695     2597    81292
>
>
> STATISTICS FOR TABLE OF BC BY TC
>
> Statistic                      DF       Value     Prob
> ------------------------------------------------------
> Chi-Square                      7     133.665    0.001
> Likelihood Ratio Chi-Square     7     126.890    0.001
> Mantel-Haenszel Chi-Square      1      71.981    0.001
> Phi Coefficient                         0.041
> Contingency Coefficient                 0.041
> Cramer's V                              0.041
>
> Sample Size = 81292
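First, you need to declare the missing values. Right now the single "-"
observation is counted as an eighth level of BC, which is why the
chi-square has 7 DF. To interpret the crosstabulation you should also
request the expected cell counts; comparing them with the observed counts
shows where the differences are. Remember, though, that the test is of the
whole pattern of observed versus expected counts: a significant chi-square
says the table as a whole departs from independence, not that any one value
of BC differs between TC=0 and TC=1.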
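A rough sketch of what the expected counts tell you, written in Python
(standard library only) rather than SAS. It computes each cell's expected
count under independence and its contribution to the Pearson chi-square,
which is roughly what PROC FREQ's EXPECTED and CELLCHI2 options on the
TABLES statement add to the printed table. The counts are copied from the
output above; treating "-" as the missing-value code and dropping it is an
assumption, and chisq_table is a hypothetical helper, not a SAS or library
routine.

```python
# Counts copied from the PROC FREQ output: BC value -> (TC=0, TC=1).
# Assumption: "-" is the missing-value code, so it is dropped below.
counts = {
    "-": (1, 0),
    "1": (7934, 237),
    "2": (8558, 243),
    "3": (8124, 202),
    "4": (27383, 850),
    "5": (8089, 206),
    "6": (8890, 411),
    "7": (9716, 448),
}


def chisq_table(table):
    """Pearson chi-square for an r x c table given as {row_label: (col counts)}.

    Returns (chi2, df, cells), where cells maps (row, col) to
    (observed, expected, contribution to chi2).
    """
    ncols = len(next(iter(table.values())))
    row_tot = {r: sum(obs) for r, obs in table.items()}
    col_tot = [sum(obs[j] for obs in table.values()) for j in range(ncols)]
    n = sum(row_tot.values())

    chi2 = 0.0
    cells = {}
    for r, obs in table.items():
        for j, o in enumerate(obs):
            e = row_tot[r] * col_tot[j] / n  # expected count under independence
            contrib = (o - e) ** 2 / e       # this cell's share of chi2
            cells[(r, j)] = (o, e, contrib)
            chi2 += contrib
    df = (len(table) - 1) * (ncols - 1)
    return chi2, df, cells


# Treat "-" as missing: leave it out of the table.
valid = {bc: obs for bc, obs in counts.items() if bc != "-"}
chi2, df, cells = chisq_table(valid)
print(f"chi-square = {chi2:.3f} on {df} df")
for (bc, tc), (o, e, c) in sorted(cells.items()):
    print(f"BC={bc} TC={tc}: observed {o:6d}  expected {e:9.1f}  cell chi2 {c:6.2f}")
```

With the "-" row left out the table has 6 DF. The largest cell
contributions should come from TC=1 at BC=6 and BC=7, where the column
percentages (15.83 and 17.25) run well above those for TC=0 (11.30 and
12.35).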
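Also, you might consider taking a sample of this data. With samples this
large the chi-square test has so much power that it is almost always highly
significant, even when the differences are too small to matter. A sample of
a thousand or two should be sufficient. Or consider a different procedure
that is not affected by large samples, such as a measure of association:
the Cramer's V of 0.041 in your output does not grow with sample size and
indicates a very weak association.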
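To see why size matters, here is a second Python sketch (again an
illustration, not SAS). It rescales every cell by the same factor, so the
proportions stay fixed while n changes; that is not the same as drawing a
random sample, but it isolates the effect of n. The Pearson chi-square grows
in direct proportion to n, while Cramer's V, sqrt(chi2 / (n * (min(r, c) - 1))),
does not. Dropping the "-" row, the helper names, and the hard-coded 12.592
(the tabulated 5% critical value for 6 DF) are assumptions of this sketch.

```python
import math

# Counts from the PROC FREQ output as (TC=0, TC=1) for BC = 1..7.
# Assumption: the "-" row is treated as missing and left out.
table = [
    (7934, 237), (8558, 243), (8124, 202), (27383, 850),
    (8089, 206), (8890, 411), (9716, 448),
]

# Tabulated 5% critical value of chi-square with (7 - 1) * (2 - 1) = 6 DF.
CRIT_6DF_05 = 12.592


def pearson_chi2(rows):
    """Pearson chi-square for a table given as a list of equal-length rows."""
    n = sum(sum(r) for r in rows)
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    chi2 = 0.0
    for r, rt in zip(rows, row_tot):
        for o, ct in zip(r, col_tot):
            e = rt * ct / n  # expected count under independence
            chi2 += (o - e) ** 2 / e
    return chi2


def cramers_v(chi2, n, nrows, ncols):
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(nrows, ncols) - 1)))


n0 = sum(sum(r) for r in table)
results = {}
for target_n in (n0, 2000):
    # Same proportions, different n (cells become fractional; fine for a sketch).
    scale = target_n / n0
    scaled = [tuple(c * scale for c in r) for r in table]
    chi2 = pearson_chi2(scaled)
    v = cramers_v(chi2, target_n, len(scaled), len(scaled[0]))
    results[target_n] = (chi2, v)
    verdict = "significant" if chi2 > CRIT_6DF_05 else "not significant"
    print(f"n={target_n:6d}  chi2={chi2:8.2f}  V={v:.3f}  ({verdict} at 5%)")
```

For the full table this should give a chi-square of about 133.6 and a V of
about 0.041; at n = 2000 the chi-square drops to about 3.3, below the
critical value, while V is unchanged. A real random subsample would add
sampling noise on top of this.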
Hope this helps you to blind them with science.