LISTSERV at the University of Georgia
Date:   Thu, 13 Jan 2000 13:45:45 -0800
Reply-To:   David Cassell <cassell@MERCURY.COR.EPA.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   David Cassell <cassell@MERCURY.COR.EPA.GOV>
Organization:   OAO Corp.
Subject:   Re: outliers
Content-Type:   text/plain; charset=us-ascii

ray wrote:
> Paige Miller wrote:
> > No. In fact, there is no generally agreed upon method for identifying
> > outliers, nor is there any way to decide whether or not they should be
> > dropped without using subject matter knowledge.
>
> Dear Paige: Thanks for the answer and I absolutely agree with it. In my
> case, I am simply trying to replicate a previous result which dropped obs
> for certain variable values that were greater than 3 stds from the mean.

[I re-arranged and trimmed so it was easier to read - blame me if anything is amiss.]

Ray, if all you want to do is check whether values are within 3 sd of the sample mean, you can do that with a PROC MEANS and a DATA step:

PROC MEANS DATA=yourdata NOPRINT;
  VAR yourvar;
  OUTPUT OUT=OUTVAR MEAN=SAMPMEAN STD=SAMPSTD;
RUN;

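DATA NEW;
  RETAIN SAMPMEAN SAMPSTD;
  /* Load the sample mean and sd once, on the first iteration */
  IF _N_=1 THEN SET OUTVAR(KEEP = SAMPMEAN SAMPSTD);
  /* Read the observations to be screened */
  SET yourdata;
  /* Drop observations more than 3 sd from the sample mean */
  IF ABS(yourvar - SAMPMEAN) > 3*SAMPSTD THEN DELETE;
RUN;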

I think that's what you asked for. But that's fairly naive, and may have any number of drawbacks [as you agreed above]. If you decide to evaluate the performance of said previous result, you may want to look at some papers on outlier detection, like:

  Rosner 1975 Technometrics #17
  Tietjen & Moore 1972 Technometrics #14
  Walsh 1950 Annals of Math. Stat. #21
  Walsh 1958 Annals of the Inst. of Stat. Math #10

Those are the ones I found taking a fast look in my reference lists, but this is hardly comprehensive. The bottom line: if you have a mixture of distributions, nothing may do the job well.
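As one concrete illustration of why the 3-sd rule is naive: a gross outlier inflates the sample sd along with the mean, and in a sample of size n no value can lie more than (n-1)/sqrt(n) sd from the sample mean. With n=10 that bound is about 2.85, so the rule cannot drop anything. The sketch below is in Python rather than SAS, purely for illustration; the function name within_3sd and the data are made up, and it only mimics the PROC MEANS / DATA step logic above.

```python
import statistics


def within_3sd(values, k=3.0):
    """Keep values whose distance from the sample mean is at most k sample sd."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n-1 denominator, like the default STD= in PROC MEANS
    return [v for v in values if abs(v - mean) <= k * sd]


# Hypothetical data: nine ordinary values and one gross error.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
kept = within_3sd(data)
print(len(data) - len(kept))  # prints 0: the value 1000 is only about 2.85 sd from the mean
```

With a larger sample the same gross error would be caught, which is part of why the rule's behavior is hard to rely on without looking at the data.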

David
--
David Cassell, OAO                     cassell@mail.cor.epa.gov
Senior Computing Specialist
mathematical statistician

