Date: Tue, 9 Dec 2003 15:15:10 -0500
Reply-To: "DePuy, Venita" <depuy001@DCRI.DUKE.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "DePuy, Venita" <depuy001@DCRI.DUKE.EDU>
Subject: Re: Wilcoxon normal- and t-approxiamtion
Content-Type: text/plain
Hi Louise:
The difference between the z and t approximations in the Npar1way output is
the continuity correction (included in z, not in t).
That's basically because the large-sample approximation uses the normal
distribution (which is continuous) for a test statistic whose distribution
is discrete; the continuity correction adjusts for that mismatch. The
correction shrinks the numerator by 0.5 toward zero, which makes the p-value
slightly larger, i.e. more conservative.
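
To see the size of that adjustment, here is a minimal Python sketch (an
illustration only, not what NPAR1WAY runs internally). It assumes no ties,
m X-values and n Y-values, and W taken as the rank sum of the Y values. The
function names and the numbers 37, 5 and 5 are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(w, m, n, continuity=True):
    """Standardize W, the rank sum of the n Y values (m X values, no ties)."""
    mean_w = n * (m + n + 1) / 2           # E(W) under the null hypothesis
    sd_w = sqrt(m * n * (m + n + 1) / 12)  # SD(W) under the null, no tie correction
    numerator = w - mean_w
    if continuity and numerator != 0:
        # Continuity correction: shrink the numerator by 0.5 toward zero.
        numerator -= 0.5 if numerator > 0 else -0.5
    return numerator / sd_w


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z_plain = rank_sum_z(37, m=5, n=5, continuity=False)
z_corrected = rank_sum_z(37, m=5, n=5, continuity=True)
print(z_plain, two_sided_p(z_plain))
print(z_corrected, two_sided_p(z_corrected))
```

The corrected z is always a bit closer to zero than the uncorrected one, so
its p-value is a bit larger.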
Personal experience - the difference between the z and t p-values is usually < .01.
Also, many packages don't offer the continuity correction.
One way around it is to choose the 'exact' option, which is VERY
computationally intensive but doesn't use any approximations. Only for very
small sample sizes though!
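
To give a feel for why the exact approach is only practical for small
samples, here is a rough brute-force sketch in Python. It is an illustration
of the idea, not a description of the algorithm SAS uses. It assumes no ties,
so the ranks are 1..N, and relies on the fact that under the null hypothesis
every choice of n ranks for the Y sample is equally likely. The function name
and example numbers are made up.

```python
from itertools import combinations
from math import comb


def exact_upper_tail(w_obs, m, n):
    """P(W >= w_obs) under the null, with no ties, by full enumeration."""
    total = comb(m + n, n)  # number of equally likely rank assignments
    hits = sum(1 for ranks in combinations(range(1, m + n + 1), n)
               if sum(ranks) >= w_obs)
    return hits / total


print(exact_upper_tail(15, m=3, n=3))  # 3 + 3 observations: 20 assignments
print(comb(20, 10), comb(40, 20))      # how fast the enumeration grows
```

The second print shows how quickly the number of rank assignments grows with
the sample sizes, which is what makes naive enumeration so expensive.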
Wilcoxon assumptions:
There are three primary assumptions of the Wilcoxon-Mann-Whitney test:
1) Each sample is randomly selected from the specific population, and the
observations within each sample are independent and identically distributed.
2) The two samples come from continuous distributions and are independent of
each other. (If the samples are paired rather than independent, consider the
Wilcoxon signed rank test).
3) The populations may differ in their location (mean or median), but not in
their distributional shape or spread. (If this assumption is questionable,
consider the Lepage or Kolmogorov-Smirnov tests).
Also, a key point - the spreads (variances) of the two populations NEED to
be the same or really similar; if they're not, the t test may be a better
option. (D. Zimmerman has several papers published along these lines).
Model details -
For exact calculations:
To compute the test statistic W, the combined sample of N = m + n values
(m X-values and n Y-values) is ordered from least to greatest. Let S1 be the
rank of the lowest Y value, Y1, and let Sn denote the rank of the highest Y
value, Yn. Tied observations each receive the average of the ranks they
span; for example, if the third and fourth observations have the same value,
they both receive the rank of 3.5. W is the sum of the ranks assigned to the
Y values, W = S1 + ... + Sn. That number is compared to a critical value,
usually from tables in Hollander & Wolfe or other texts if doing it by hand.
SAS uses an algorithm to generate those numbers and get p values.
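
The ranking step can be sketched in a few lines of Python (a hand-rolled
illustration, not SAS code). The helper names midranks and rank_sum and the
two small samples are made up; the data are chosen so that the third and
fourth ordered observations tie, as in the example above.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def rank_sum(x, y):
    """W = sum of the ranks of the Y values in the pooled, ordered sample."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[len(x):])


x = [1.0, 3.0, 5.0]       # made-up X sample (m = 3)
y = [2.0, 3.0, 4.0, 6.0]  # made-up Y sample (n = 4); 3.0 ties with an X value
print(midranks(x + y))
print(rank_sum(x, y))
```

Both observations equal to 3.0 get rank 3.5, and W adds up the ranks of the
four Y values only.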
Large Sample Approximations:
W* = [W - n(m+n+1)/2] / sqrt[ m n (m+n+1) / 12 ]
and then W* is approximately standard normal under the null hypothesis;
compare to z(alpha) or z(alpha/2) depending on hypotheses to be tested.
(This formula does not include corrections for ties or continuity
corrections).
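
As a sanity check on the large-sample form, the sketch below assumes the
usual textbook standardization, W* = [W - n(m+n+1)/2] / sqrt[mn(m+n+1)/12],
with no ties. It confirms by enumeration that n(m+n+1)/2 and mn(m+n+1)/12
are the null mean and variance of W, then computes W* for one value. The
function names, the sample sizes 4 and 5, and the value W = 33 are made up
for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def null_moments_by_enumeration(m, n):
    """Mean and variance of W over all equally likely sets of n ranks (no ties)."""
    sums = [sum(c) for c in combinations(range(1, m + n + 1), n)]
    return mean(sums), pvariance(sums)


def w_star(w, m, n):
    """Standardized rank sum, without tie or continuity corrections."""
    expected = n * (m + n + 1) / 2
    variance = m * n * (m + n + 1) / 12
    return (w - expected) / sqrt(variance)


m, n = 4, 5  # made-up sample sizes
print(null_moments_by_enumeration(m, n))              # moments by brute force
print(n * (m + n + 1) / 2, m * n * (m + n + 1) / 12)  # moments by formula
print(w_star(33, m, n))
```

The first two printed lines should agree, which is all the formula is doing:
centering W at its null mean and scaling by its null standard deviation.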
Hope this helps; let me know if there are details I haven't provided. (Just
finished writing an article on this topic).
-Venita
> ----------
> From: louise[SMTP:louise@SELSKABET.ORG]
> Reply To: louise
> Sent: Tuesday, December 09, 2003 2:53 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Wilcoxon normal- and t-approxiamtion
>
> Hi,
> Using Wilcoxon's test I have trouble choosing between the normal approximation
> and the t-approximation as the correct one. What exactly do the approximations
> represent? Please give an answer full of assumptions and
> model/algorithm details.
> Sincerely Louise
>