LISTSERV at the University of Georgia
Date:         Mon, 27 Jul 1998 10:40:14 -0600
Reply-To:     Mark S Dehaan <MSD@INEL.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Mark S Dehaan <MSD@INEL.GOV>
Subject:      Re: Will the real MEDIAN please stand up?
Comments: To: erickson@cmu.edu
Comments: cc: SAS-L <SAS-L@inel.gov>
Content-type: text/plain; charset=us-ascii

Michael,

You are right that this will give you a median. I, like John W., had never thought of the median like this, but as you said, it kinda makes it easier to visualize the mean and median in these terms. I guess, then, that when one does a regression that minimizes the absolute deviation (instead of the squared deviation), one is fitting a line through a sort of median, and this is why it is a more robust regression technique.

Thanks for the interesting insight!

Regards,
Mark DeHaan

"Michael A. Erickson" <miericks@maxwell.psy.cmu.edu> on 07/25/98 12:55:39 AM

Please respond to erickson@cmu.edu

To: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>, Mark S Dehaan/MSD/LMITCO/INEEL/US
cc:
Subject: Re: Will the real MEDIAN please stand up?

Mark S Dehaan <MSD@INEL.GOV> wrote:
> BTW, you state
> >If, on the other hand, I define the median as the value that minimizes
> >the summed absolute deviation from the scores,
> This definitely is not the definition of the median (although for a perfectly
> symmetrical distribution it is the same).
> Imagine if the data had one hugely remote outlier. You would not want your
> median to be affected by the amount of
> this point's noncentrality, yet your summed absolute deviation would be
> greatly affected.

Hmm. I think you're right that the summed absolute deviation would be greatly affected, but I'm not sure that the value that minimizes it changes as outliers increase. For example, take the distribution 1, 2, 3 and the distribution 1, 2, 100. I think this second distribution satisfies your hugely remote outlier criterion.

I want to choose the number, m, in both cases that minimizes

sum |m - x_i|

I tried this on a spreadsheet -- asking what if m were 1,...,10, 99, 100. If m=1 in the first distribution, the sum is (0+1+2)=3, if m=2, it's (1+0+1)=2, if m=3 it's (2+1+0)=3, and from there on the line has a slope of 3. If m=1 in the second distribution, the sum is (0+1+99)=100, if m=2 it's (1+0+98)=99, if m=3 it's (2+1+97)=100, and now the line has a slope of 1 until you get to 100. Past 100, the slope jumps to 3.

So, even though the sum itself is much different for the two distributions, the minima are at the same point (m=2). Now, I know that does not constitute a proof, but it seemed a clear example of what I thought you meant.
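For anyone who would rather check this without a spreadsheet, here is a minimal Python sketch of the same what-if exercise. The function names (summed_abs_deviation, best_m) are made up for illustration, and the candidate values for m are just the ones tried above; it is a check of the two examples, not a proof.

```python
def summed_abs_deviation(m, xs):
    """Return sum |m - x_i| over the scores xs."""
    return sum(abs(m - x) for x in xs)


def best_m(xs, candidates):
    """Return the candidate m with the smallest summed absolute deviation."""
    return min(candidates, key=lambda m: summed_abs_deviation(m, xs))


# Same candidate values as in the spreadsheet: 1,...,10, 99, 100.
candidates = list(range(1, 11)) + [99, 100]

for xs in ([1, 2, 3], [1, 2, 100]):
    sums = {m: summed_abs_deviation(m, xs) for m in candidates}
    print(xs, sums, "minimum at m =", best_m(xs, candidates))
```

The sums differ a great deal between the two lists, but the minimizing candidate is m=2 in both cases.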

I hope this kind of discussion is appropriate for this group. I haven't taken the time to read the group long enough to learn what is appropriate and what isn't (except for humorous programming tips involving Biblical figures).

\MaE

Michael A. Erickson erickson@cmu.edu

