LISTSERV at the University of Georgia
Date:         Mon, 22 May 2006 14:02:53 -0400
Reply-To:     Kevin Roland Viel <kviel@EMORY.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Kevin Roland Viel <kviel@EMORY.EDU>
Subject:      Re: Q: Outliers in regression analysis : Big problem?
In-Reply-To:  <20060522172158.72134.qmail@web26402.mail.ukl.yahoo.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Mon, 22 May 2006, adel F. wrote:

> Hi,
> Usually in linear regression, people pay attention to the distribution of the residual or dep variable.
> How important the question of outliers? And is there a rule of thumb to deal with these particular points?
>
> Among the assumptions of linear regression which are the most important ?

From my recent experience and the anecdotes of others, they may be supremely important.

The next time I approach my analyses, I am going to try to identify observations of interest beforehand.

I have had a recent exchange on SAS-L concerning this issue, for instance with the subject: Studentized residuals.

As you may see, Dale McLerran sagely points out that the critical value should be adjusted for multiple tests. In my most recent analyses, not yet published and thus not yet through peer review, I used both a liberal and a conservative cut-off. The liberal cut-off identified very interesting points. Certainly, this is not a standardized approach and may not be tenable.
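
To make the liberal versus conservative idea concrete, here is a minimal sketch in Python rather than SAS. Everything in it is an assumption for illustration: the data are made up, the "liberal" cut-off is taken to be an unadjusted two-sided 5% critical value, the "conservative" one is a Bonferroni adjustment for n tests, and a normal quantile stands in for the t quantile to keep the sketch free of extra libraries (which makes both cut-offs a little too small in small samples). It is not the procedure from the exchange mentioned above.

```python
import math
from statistics import NormalDist


def studentized_residuals(xs, ys):
    """Externally studentized residuals for the simple regression y = a + b*x."""
    n = len(xs)
    p = 2  # parameters: intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    out = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - x_bar) ** 2 / sxx  # leverage of this point
        r = e / math.sqrt(s2 * (1.0 - h))     # internally studentized
        # Convert to the externally studentized (deleted) residual.
        out.append(r * math.sqrt((n - p - 1) / (n - p - r * r)))
    return out


# Made-up data: a straight line with small deterministic noise,
# plus one observation (index 15) shifted upward by 5 units.
xs = [float(i + 1) for i in range(20)]
ys = [2.0 + 0.5 * x + 0.3 * math.sin(i) for i, x in enumerate(xs)]
ys[15] += 5.0

n = len(xs)
alpha = 0.05
t_values = studentized_residuals(xs, ys)

# Assumed cut-offs: unadjusted versus Bonferroni-adjusted for n tests.
liberal_cut = NormalDist().inv_cdf(1 - alpha / 2)
conservative_cut = NormalDist().inv_cdf(1 - alpha / (2 * n))

liberal_flags = [i for i, t in enumerate(t_values) if abs(t) > liberal_cut]
conservative_flags = [i for i, t in enumerate(t_values) if abs(t) > conservative_cut]

print("liberal cut-off:", round(liberal_cut, 3), "flags:", liberal_flags)
print("conservative cut-off:", round(conservative_cut, 3), "flags:", conservative_flags)
```

Any point flagged by the conservative cut-off is also flagged by the liberal one, so the interesting cases are those that appear only in the liberal list.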

As far as a rule of thumb goes, you might try the ROBUSTREG procedure, which David Cassell suggested in the abovementioned exchange. It should be the most parsimonious, meaning that several different analysts may arrive at the same results.
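
For readers who want to see what a robust fit does in general terms, here is a small Python sketch of a Huber-type M-estimator fitted by iteratively reweighted least squares. This is not PROC ROBUSTREG and makes no claim to match its defaults or output; the data, the tuning constant c = 1.345, the MAD-based scale, and the function names are all assumptions chosen for illustration.

```python
import math
from statistics import median


def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope*x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def huber_line(xs, ys, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    ws = [1.0] * len(xs)
    intercept, slope = weighted_line(xs, ys, ws)  # start from ordinary least squares
    for _ in range(iterations):
        resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        med = median(resid)
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = median([abs(e - med) for e in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: full weight for small residuals, reduced for large ones.
        ws = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in resid]
        intercept, slope = weighted_line(xs, ys, ws)
    return intercept, slope


# Made-up data: true slope 0.5, with the last observation shifted up by 10.
xs = [float(i + 1) for i in range(20)]
ys = [2.0 + 0.5 * x + 0.3 * math.sin(i) for i, x in enumerate(xs)]
ys[19] += 10.0

ols_intercept, ols_slope = weighted_line(xs, ys, [1.0] * len(xs))
robust_intercept, robust_slope = huber_line(xs, ys)

print("OLS slope:   ", round(ols_slope, 3))
print("robust slope:", round(robust_slope, 3))
```

The point of the comparison is that the shifted observation pulls the ordinary least-squares slope away from 0.5, while the reweighted fit largely ignores it without anyone having to decide by hand which points to drop.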

It also depends on what you consider an outlier. For instance, consider y=(x-1)/x, where x gt 0. You may approximate the relationship as linear within certain range(s) of x. If, however, you also have a few points well beyond a given range, and only a few, they may appear to be outliers, because they certainly won't fit the linear model.
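
The y=(x-1)/x example can be checked numerically. The sketch below, in Python, fits a straight line to noise-free points on the curve for x between 1 and 2 and then looks at how far the curve sits from that line at x = 10 and x = 20. The ranges and the two far-away x values are my own choices for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope


def curve(x):
    """The true, nonlinear relationship from the example above."""
    return (x - 1) / x


# Fit only where the curve is roughly linear: x from 1.0 to 2.0.
near_x = [1 + 0.1 * i for i in range(11)]
near_y = [curve(x) for x in near_x]
intercept, slope = fit_line(near_x, near_y)

# Residuals inside the fitted range are small.
max_near_resid = max(abs(y - (intercept + slope * x)) for x, y in zip(near_x, near_y))

# Points far outside the range lie exactly on the true curve,
# yet they sit far below the extrapolated line.
far_x = [10.0, 20.0]
far_resid = [curve(x) - (intercept + slope * x) for x in far_x]

print("largest residual in range:", max_near_resid)
print("residuals at x = 10 and 20:", far_resid)
```

Nothing is wrong with the far points here. They follow the same curve as the rest, and it is the linear model that fails outside the range where it was fitted.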

Without a doubt, more data is desirable. However, treating those points as outliers and excluding them may serve you well. Undoubtedly, you will warn your readers of the dangers of extrapolation. You would also be wise to know your data and how others have explored it, including using splines, partial residual plots, theoretical expectations, etc.

Good luck,

Kevin

Kevin Viel
Department of Epidemiology
Rollins School of Public Health
Emory University
Atlanta, GA 30322

