LISTSERV at the University of Georgia
Date:         Fri, 24 Jul 1998 13:04:35 -0600
Reply-To:     Mark S Dehaan <MSD@INEL.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Mark S Dehaan <MSD@INEL.GOV>
Subject:      Re: Will the real MEDIAN please stand up?
Comments: To: Michael A Erickson <erickson+@ANDREW.CMU.EDU>
Comments: cc: SAS-L <SAS-L@inel.gov>
Content-type: text/plain; charset=us-ascii

Michael,

I agree with what you understood a median to be from "every statistics book I had read". I don't have Gravetter and Wallnau on my bookshelf, but from what you say about it, I would strongly disagree with its definition of the median. It sounds like they are taking "binned" data and trying, post facto, to make it continuous again. This is not recommended - something about making a silk purse out of a sow's ear comes to mind. You cannot make the parameter have a much higher resolution (number of significant digits) than the data it's coming from. So I would suggest sticking with your first definition, which is also what SAS reports.

BTW, you state

>If, on the other hand, I define the median as the value that minimizes
>the summed absolute deviation from the scores,

This definitely is not the definition of the median (although for a perfectly symmetrical distribution it is the same). Imagine if the data had one hugely remote outlier. You would not want your median to be affected by the amount of this point's noncentrality, yet your summed absolute deviation would be greatly affected.

HTH, Mark DeHaan

Michael A Erickson <"erickson+"@ANDREW.CMU.EDU> on 07/24/98 08:46:45 AM

Please respond to Michael A Erickson <"erickson+"@ANDREW.CMU.EDU>

To: SAS-L@UGA.CC.UGA.EDU
cc: (bcc: Mark S Dehaan/MSD/LMITCO/INEEL/US)
Subject: Will the real MEDIAN please stand up?

Up until yesterday, every statistics book I had read (5 or so--I'm really good at reading the early chapters) had said that the median of a set of numbers could be calculated by ordering them, and reporting the middle one or the midpoint of the middle two.
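For concreteness, the textbook rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `ordinary_median` and the sample lists are made up for this sketch, and it is not a description of how SAS computes its result internally.

```python
def ordinary_median(scores):
    """Order the scores; return the middle one, or the midpoint of the middle two."""
    xs = sorted(scores)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return float(xs[mid])
    # Even count: average the two middle values.
    return (xs[mid - 1] + xs[mid]) / 2


print(ordinary_median([7, 1, 5]))     # odd count: the middle value, 5
print(ordinary_median([7, 1, 5, 3]))  # even count: midpoint of 3 and 5
```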

Yesterday, a friend of mine showed me a book he taught from: Gravetter and Wallnau (1992), Statistics for the Behavioral Sciences: A first course for students of psychology and education, 3rd ed. St. Paul, MN: West.

It says that when there are several scores of the same value in the middle of a distribution you have to interpolate to obtain the true median. The logic behind this is that scores aren't actual scores; they are intervals (i.e., 4 doesn't indicate 4; it indicates some value between 3.5 and 4.5 if your degree of accuracy is units).

So, according to this book, if you have the following set of scores:

1, 2, 2, 3, 4, 4, 4, 4, 4, 5

the median is 3.70. This makes sense if one envisions the median as dividing the area of the histogram in half.
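The 3.70 figure can be reproduced with the usual grouped-data interpolation, assuming that is what the book intends: each score of 4 is spread evenly over the interval 3.5 to 4.5, and the median is the point where half of the 10 scores lie below. Four scores fall below 3.5, so one more of the five 4s is needed, giving 3.5 + (5 - 4)/5 = 3.7. The sketch below is a hedged illustration; `interpolated_median` and its `width` parameter are names invented here, not taken from the book or from SAS.

```python
def interpolated_median(scores, width=1.0):
    """Interpolated median, treating each score as an interval of the given width.

    Assumption: this is the standard grouped-data formula
        L + (n/2 - below) / tied * width
    where L is the lower real limit of the median score's interval.
    """
    xs = sorted(scores)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    m = xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2
    if m not in xs:
        # The two middle scores differ, so there is no tie to interpolate within.
        return float(m)
    below = sum(1 for x in xs if x < m)   # scores strictly below the median score
    tied = sum(1 for x in xs if x == m)   # scores equal to the median score
    lower = m - width / 2                 # lower real limit of the interval
    return lower + (n / 2 - below) / tied * width


scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
print(round(interpolated_median(scores), 2))  # 3.7
```

Python's standard library offers `statistics.median_grouped`, which performs the same kind of interpolation and could be used in place of the hand-written function.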

If, on the other hand, I define the median as the value that minimizes the summed absolute deviation from the scores, I return to wanting the median to be 4. I know that I'm switching back to scores instead of intervals, but I couldn't figure out how to minimize summed absolute deviations from *intervals*--although I bet this would turn up 3.7.
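The claim that minimizing the summed absolute deviation leads back to 4 can be checked numerically for this data set. The sketch below is a brute-force search over a grid of candidate values from 1.00 to 5.00 and, as noted above, it treats the scores as points rather than intervals. The helper name `summed_abs_dev` and the 0.01 grid step are arbitrary choices made for this illustration.

```python
scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]


def summed_abs_dev(center, data):
    """Sum of |x - center| over all scores."""
    return sum(abs(x - center) for x in data)


# Candidate centers 1.00, 1.01, ..., 5.00 (grid step chosen for illustration).
candidates = [i / 100 for i in range(100, 501)]
best = min(candidates, key=lambda c: summed_abs_dev(c, scores))

print(best, summed_abs_dev(best, scores))  # minimizer and its summed deviation
print(summed_abs_dev(3.7, scores))         # summed deviation at the interpolated value
```

On this grid the minimum falls at 4.0 with a summed absolute deviation of 9, while the value at 3.7 is about 9.6, so with point scores the interpolated value does not minimize the criterion.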

In any case, SAS in proc univariate, proc npar1way, and proc fastclus computes the 4.0 median throughout. Is this what statisticians assume when they're formulating e.g. non-parametric tests? Or is SAS just hoping that it won't make a difference in the long run?

Is SAS using the *real* median or should it be computing the Gravetter and Wallnau median?

\MaE

Michael A. Erickson erickson@cmu.edu

