Date: Fri, 24 Jul 1998 13:04:35 -0600
Reply-To: Mark S Dehaan <MSD@INEL.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Mark S Dehaan <MSD@INEL.GOV>
Subject: Re: Will the real MEDIAN please stand up?
Content-type: text/plain; charset=us-ascii
Michael,
I agree with what you understood a median to be from "every statistics book
I had read". I don't have Gravetter and Wallnau on my bookshelf, but from
what you say about it, I would strongly disagree with its definition of
median. It sounds like they are taking "binned" data and trying, post
facto, to make it continuous again. This is not recommended; something
about making a silk purse out of a sow's ear comes to mind. You cannot
give the statistic a much higher resolution (# significant digits) than the
data it comes from. So I would suggest sticking with your first
definition, which is the one SAS outputs.
BTW, you state
>If, on the other hand, I define the median as the value that minimizes
>the summed absolute deviation from the scores,
This is definitely not the definition of the median (although for a
perfectly symmetrical distribution it gives the same value).
Imagine the data had one hugely remote outlier. You would not want your
median to be affected by how far out that point lies, yet your summed
absolute deviation would be greatly affected.
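A small Python sketch of the outlier scenario, using the ten scores from Michael's message with the top score swapped for increasingly remote values. The outlier values and helper names are illustrative assumptions, not anything from SAS. It shows the median staying at 4 while the total absolute deviation from it grows with the outlier's distance.

```python
from statistics import median

def total_abs_dev(scores, center):
    """Sum of absolute deviations of the scores from a given center."""
    return sum(abs(x - center) for x in scores)

def outlier_demo(base, outliers):
    """For each outlier value, replace the largest score in the base data
    with it and report (outlier, median, total absolute deviation)."""
    results = []
    for outlier in outliers:
        data = sorted(base)[:-1] + [outlier]
        m = median(data)
        results.append((outlier, m, total_abs_dev(data, m)))
    return results

scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
for row in outlier_demo(scores, [5, 50, 5000]):
    print(row)  # median is 4.0 in every row; the deviation sum grows
```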
HTH,
Mark DeHaan
Michael A Erickson <"erickson+"@ANDREW.CMU.EDU> on 07/24/98 08:46:45 AM
Please respond to Michael A Erickson <"erickson+"@ANDREW.CMU.EDU>
To: SAS-L@UGA.CC.UGA.EDU
cc: (bcc: Mark S Dehaan/MSD/LMITCO/INEEL/US)
Subject: Will the real MEDIAN please stand up?
Up until yesterday, every statistics book I had read (5 or
so--I'm really good at reading the early chapters) had said that
the median of a set of numbers could be calculated by ordering
them, and reporting the middle one or the midpoint of the middle
two.
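The textbook rule can be sketched in Python as follows. The function name is hypothetical; Python's standard `statistics.median` applies the same rule.

```python
def textbook_median(scores):
    """Sort the scores, then take the middle one (odd n)
    or the midpoint of the middle two (even n)."""
    s = sorted(scores)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(textbook_median([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))  # 4.0
print(textbook_median([3, 1, 2]))                         # 2
```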
Yesterday, a friend of mine showed me a book he had taught from:
Gravetter and Wallnau (1992), Statistics for the Behavioral Sciences: A
First Course for Students of Psychology and Education, 3rd ed. St. Paul,
MN: West.
It says that when several scores of the same value fall in the
middle of a distribution, you have to interpolate to obtain the true
median. The logic is that scores aren't exact values; they stand for
intervals (e.g., a score of 4 doesn't mean exactly 4; it means some
value between 3.5 and 4.5 if scores are recorded to the nearest unit).
So, according to this book, if you have the following set of scores:
1, 2, 2, 3, 4, 4, 4, 4, 4, 5
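the median is 3.70. Four scores lie below 3.5, so one more of the five
4s is needed to reach the halfway count of 5: 3.5 + 1/5 = 3.70. This
makes sense if one envisions the median as dividing the area of the
histogram in half.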
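A minimal Python sketch of the interpolation, assuming the standard grouped-data formula median = L + (N/2 - cf)/f * w, where L is the lower real limit of the interval holding the median, cf the count of scores below that interval, f its frequency, and w its width. This has not been checked against the book's exact wording, but it reproduces the 3.70 quoted. Python's `statistics.median_grouped` performs the same kind of interpolation; the function name here is hypothetical.

```python
from collections import Counter

def interpolated_median(scores, width=1.0):
    """Grouped-data median: treat each score x as the interval
    [x - width/2, x + width/2] and find the point that splits the
    histogram area in half, interpolating linearly within an interval."""
    n = len(scores)
    if n == 0:
        raise ValueError("median of empty data")
    half = n / 2
    below = 0  # count of scores in intervals entirely below the current one
    for value, freq in sorted(Counter(scores).items()):
        if below + freq >= half:
            lower_limit = value - width / 2
            return lower_limit + (half - below) / freq * width
        below += freq

print(round(interpolated_median([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]), 2))  # 3.7
```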
If, on the other hand, I define the median as the value that minimizes
the summed absolute deviation from the scores, I return to wanting the
median to be 4. I know that I'm switching back to scores instead of
intervals, but I couldn't figure out how to minimize summed absolute
deviations from *intervals*--although I bet this would turn up 3.7.
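Both criteria can be checked numerically with a brute-force grid search in Python. For the interval version, this sketch assumes each score x is spread uniformly over [x - 0.5, x + 0.5] and sums the expected absolute deviation; the helper names and the grid (0 to 6, step 0.01) are illustrative choices. Minimizing over the point scores lands on 4. Minimizing over the intervals lands on 3.7, which is the median of that piecewise-uniform distribution, so the bet above appears to hold under this assumption.

```python
def sum_abs_dev_scores(scores, m):
    """Summed absolute deviation of point scores from a candidate center m."""
    return sum(abs(x - m) for x in scores)

def sum_abs_dev_intervals(scores, m, width=1.0):
    """Expected absolute deviation from m when each score x is spread
    uniformly over [x - width/2, x + width/2], summed over all scores."""
    total = 0.0
    for x in scores:
        a, b = x - width / 2, x + width / 2
        if m <= a:
            total += x - m      # whole interval lies above m
        elif m >= b:
            total += m - x      # whole interval lies below m
        else:                   # m splits the interval
            total += ((m - a) ** 2 + (b - m) ** 2) / (2 * width)
    return total

def argmin_on_grid(f, lo, hi, step=0.01):
    """Brute-force minimizer of f over an evenly spaced grid on [lo, hi]."""
    n = round((hi - lo) / step)
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=f)

scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
print(round(argmin_on_grid(lambda m: sum_abs_dev_scores(scores, m), 0, 6), 2))     # 4.0
print(round(argmin_on_grid(lambda m: sum_abs_dev_intervals(scores, m), 0, 6), 2))  # 3.7
```

A grid search is crude but keeps the sketch free of optimization libraries; both objectives are convex in m, so a finer grid or a bisection on the slope would give the same answers.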
In any case, SAS computes the 4.0 median throughout, in proc univariate,
proc npar1way, and proc fastclus. Is this what statisticians assume
when they formulate, e.g., non-parametric tests? Or is SAS just
hoping that it won't make a difference in the long run?
Is SAS using the *real* median or should it be computing the Gravetter
and Wallnau median?
\MaE
Michael A. Erickson
erickson@cmu.edu