Date: Mon, 27 Jul 1998 10:40:14 -0600
Reply-To: Mark S Dehaan <MSD@INEL.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Mark S Dehaan <MSD@INEL.GOV>
Subject: Re: Will the real MEDIAN please stand up?
Content-type: text/plain; charset=us-ascii
Michael,
You are right that this will give you a median. I, like John W., had never
thought of the median like this, but as you said, it kinda makes it easier
to visualize the mean and median in these terms. I guess, then, that when
one does a regression that minimizes the absolute deviation (instead of the
squared deviation), one is fitting a line through a sort of median, and this is
why it is a more robust regression technique. Thanks for the interesting
insight!
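
A minimal sketch of the mean/median contrast, in Python. The function
names are made up for illustration, and the data are the two small
distributions from the quoted message below. It only compares the two
location estimates and checks that nudging each one raises its own
loss; it is not a regression fit.

```python
from statistics import mean, median

def squared_loss(m, xs):
    """Sum of squared deviations of the scores xs from m."""
    return sum((m - x) ** 2 for x in xs)

def absolute_loss(m, xs):
    """Sum of absolute deviations of the scores xs from m."""
    return sum(abs(m - x) for x in xs)

for xs in ([1, 2, 3], [1, 2, 100]):
    mu, med = mean(xs), median(xs)
    # The mean should be a minimum of the squared loss and the median
    # a minimum of the absolute loss, so a small nudge in either
    # direction should not lower the corresponding loss.
    mean_is_min = all(squared_loss(mu, xs) <= squared_loss(mu + d, xs)
                      for d in (-0.5, 0.5))
    median_is_min = all(absolute_loss(med, xs) <= absolute_loss(med + d, xs)
                        for d in (-0.5, 0.5))
    print(xs, "mean:", mu, "median:", med, mean_is_min, median_is_min)
```

The outlier drags the mean a long way but leaves the median where it
was, which is the robustness being described.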
Regards,
Mark DeHaan
"Michael A. Erickson" <miericks@maxwell.psy.cmu.edu> on 07/25/98 12:55:39 AM
Please respond to erickson@cmu.edu
To: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>, Mark S Dehaan/MSD/LMITCO/INEEL/US
cc:
Subject: Re: Will the real MEDIAN please stand up?
Mark S Dehaan <MSD@INEL.GOV> wrote:
> BTW, you state
> >If, on the other hand, I define the median as the value that minimizes
> >the summed absolute deviation from the scores,
> This definitely is not the definition of the median (although for a
> perfectly symmetrical distribution it is the same).
> Imagine if the data had one hugely remote outlier. You would not want
> your median to be affected by the amount of this point's noncentrality,
> yet your summed absolute deviation would be greatly affected.
Hmm. I think you're right that the summed absolute deviation would be
greatly affected, but I'm not sure that the value that minimizes it
changes as outliers increase. For example, take the distribution 1,
2, 3 and the distribution 1, 2, 100. I think this second distribution
satisfies your hugely remote outlier criterion.
I want to choose the number, m, in both cases that minimizes
sum |m - x_i|
I tried this on a spreadsheet -- asking what if m were 1,...,10, 99,
100. If m=1 in the first distribution, the sum is (0+1+2)=3, if m=2,
it's (1+0+1)=2, if m=3 it's (2+1+0)=3, and from there on the line has
a slope of 3. If m=1 in the second distribution, the sum is
(0+1+99)=100, if m=2 it's (1+0+98)=99, if m=3 it's (2+1+97)=100, and
now the line has a slope of 1 until you get to 100. Past 100, the
slope jumps to 3.
So, even though the sum itself is much different for the two
distributions, the minima are at the same point (m=2). Now, I know
that does not constitute a proof, but it seemed a clear example of
what I thought you meant.
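
The spreadsheet check above can be sketched in a few lines of Python.
This is only an illustration: the function names and the integer grid
1..101 are assumptions, not part of the original spreadsheet.

```python
def summed_abs_deviation(m, xs):
    """Return sum |m - x_i| over the scores xs."""
    return sum(abs(m - x) for x in xs)

def argmin_on_grid(xs, grid):
    """Return the grid value m with the smallest summed absolute deviation."""
    return min(grid, key=lambda m: summed_abs_deviation(m, xs))

grid = range(1, 102)  # candidate values m = 1, 2, ..., 101

for xs in ([1, 2, 3], [1, 2, 100]):
    # Tabulate the sum at a few of the candidate values tried above.
    sums = {m: summed_abs_deviation(m, xs) for m in (1, 2, 3, 4, 99, 100, 101)}
    print(xs, sums, "minimum at m =", argmin_on_grid(xs, grid))
```

Differences between neighbouring grid values give the slopes mentioned
above: 1 between m=3 and m=100 for the second distribution, and 3 past
the largest score in either one.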
I hope this kind of discussion is appropriate for this group. I
haven't taken the time to read the group long enough to learn what is
appropriate and what isn't (except for humorous programming tips
involving Biblical figures).
\MaE
Michael A. Erickson
erickson@cmu.edu