```
Date: Mon, 27 Jul 1998 10:40:14 -0600
Reply-To: Mark S Dehaan
Sender: "SAS(r) Discussion"
From: Mark S Dehaan
Subject: Re: Will the real MEDIAN please stand up?
Comments: To: erickson@cmu.edu
Comments: cc: SAS-L
Content-type: text/plain; charset=us-ascii

Michael,

You are right that this will give you a median. I, like John W., had
never thought of the median like this, but as you said, it kinda makes
it easier to visualize the mean and median in these terms. I guess,
then, that when one does a regression that minimizes the absolute
deviation (instead of the squared deviation), one is fitting a line
thru a sort of median, and this is why it is a more robust regression
technique.

Thanks for the interesting insight!

Regards,
Mark DeHaan

"Michael A. Erickson" on 07/25/98 12:55:39 AM

Please respond to erickson@cmu.edu

To: "SAS(r) Discussion", Mark S Dehaan/MSD/LMITCO/INEEL/US
cc:
Subject: Re: Will the real MEDIAN please stand up?

Mark S Dehaan wrote:
> BTW, you state
> >If, on the other hand, I define the median as the value that minimizes
> >the summed absolute deviation from the scores,
> This definitely is not the definition of the median (although for a
> perfectly symmetrical distribution it is the same).
> Imagine if the data had one hugely remote outlier. You would not want your
> median to be affected by the amount of
> this point's noncentrality, yet your summed absolute deviation would be
> greatly affected.

Hmm. I think you're right that the summed absolute deviation would be
greatly affected, but I'm not sure that the value that minimizes it
changes as outliers increase. For example, take the distribution
1, 2, 3 and the distribution 1, 2, 100. I think this second
distribution satisfies your hugely remote outlier criterion. I want to
choose the number, m, in both cases that minimizes

    sum |m - x_i|

I tried this on a spreadsheet -- asking what if m were 1,...,10, 99, 100.
If m=1 in the first distribution, the sum is (0+1+2)=3; if m=2, it's
(1+0+1)=2; if m=3, it's (2+1+0)=3, and from there on the line has a
slope of 3.

If m=1 in the second distribution, the sum is (0+1+99)=100; if m=2,
it's (1+0+98)=99; if m=3, it's (2+1+97)=100, and now the line has a
slope of 1 until you get to 100. Past 100, the slope jumps to 3.

So, even though the sum itself is much different for the two
distributions, the minima are at the same point (m=2).

Now, I know that does not constitute a proof, but it seemed a clear
example of what I thought you meant.

I hope this kind of discussion is appropriate for this group. I
haven't taken the time to read the group long enough to learn what is
appropriate and what isn't (except for humorous programming tips
involving Biblical figures).

\MaE
Michael A. Erickson
erickson@cmu.edu
```
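
The spreadsheet check described in the message can be reproduced with a few lines of Python. This is only a sketch: the original used a spreadsheet, and the helper names `summed_abs_deviation` and `best_candidate` are made up for this illustration. It evaluates sum |m - x_i| over the same trial values of m (1,...,10, 99, 100) for both distributions and compares the minimizer with the ordinary median.

```python
from statistics import median


def summed_abs_deviation(m, xs):
    """Return sum(|m - x|) over all scores x in xs."""
    return sum(abs(m - x) for x in xs)


def best_candidate(xs, candidates):
    """Return the trial value m with the smallest summed absolute deviation."""
    return min(candidates, key=lambda m: summed_abs_deviation(m, xs))


# Same grid of trial values as in the message: 1,...,10, 99, 100.
candidates = list(range(1, 11)) + [99, 100]

for xs in ([1, 2, 3], [1, 2, 100]):
    # Sums at m = 1, 2, 3, the three values worked out by hand in the message.
    sums = {m: summed_abs_deviation(m, xs) for m in (1, 2, 3)}
    print(xs, sums, "argmin:", best_candidate(xs, candidates),
          "median:", median(xs))
```

Like the spreadsheet, this only checks the listed trial values, so it illustrates the claim rather than proving it.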
