Date: Mon, 22 May 2006 14:02:53 -0400
Reply-To: Kevin Roland Viel <kviel@EMORY.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Kevin Roland Viel <kviel@EMORY.EDU>
Subject: Re: Q: Outliers in regression analysis : Big problem?
In-Reply-To: <20060522172158.72134.qmail@web26402.mail.ukl.yahoo.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Mon, 22 May 2006, adel F. wrote:
> Hi,
> Usually in linear regression, people pay attention to the distribution of the residual or dep variable.
> How important the question of outliers? And is there a rule of thumb to deal with these particular points?
>
> Among the assumptions of linear regression which are the most important ?
From my recent experience and the anecdotes of others, outliers may be
supremely important.
The next time I approach my analyses, I am going to try to identify
observations of interest beforehand.
I have had a recent exchange on SAS-L concerning this issue, for instance
with the subject: Studentized residuals.
As you may see, Dale McLerran sagely points out that the critical value
should be adjusted for multiple tests. In my most recent analyses, not yet
published and thus not yet through peer review, I used both a liberal and
a conservative cut-off. The liberal cut-off identified very interesting
points. Certainly, this is not a standardized approach and may not be
tenable.
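To make the liberal versus conservative distinction concrete, here is a
minimal sketch in Python. It assumes a Bonferroni adjustment (one common
way to adjust for multiple tests; the exchange does not say which
adjustment or which cut-offs were actually used), a hypothetical sample
of n = 100 observations, and a normal approximation in place of the t
distribution that studentized residuals strictly follow.

```python
from statistics import NormalDist

def two_sided_cutoff(alpha, n_tests=1):
    """Critical |residual| for a two-sided test at level alpha,
    with alpha split across n_tests (Bonferroni).
    Uses a normal approximation, which is an assumption of this sketch."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))

n = 100  # hypothetical number of observations, one test per residual

liberal = two_sided_cutoff(0.05)                   # no adjustment
conservative = two_sided_cutoff(0.05, n_tests=n)   # Bonferroni-adjusted

print(f"liberal cut-off:      {liberal:.2f}")
print(f"conservative cut-off: {conservative:.2f}")
```

The unadjusted cut-off flags far more points than the adjusted one, so
points that fall between the two are candidates for a closer look
rather than automatic exclusion.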
As far as a rule of thumb goes, you might try the ROBUSTREG procedure,
which David Cassell suggested in the above-mentioned exchange. It
should be the most parsimonious approach, meaning that several different
analysts may arrive at the same results.
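For readers who want to see the idea behind robust regression, the
following is a rough Python sketch of M-estimation with Huber weights,
fitted by iteratively reweighted least squares. It is not PROC ROBUSTREG
and makes no claim to match its defaults or output; the data, the
function names, the tuning constant 1.345, and the fixed iteration count
are all assumptions chosen for illustration.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line by iteratively reweighted least squares.
    Returns (a, b, final weights)."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual divided by 0.6745
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        # Full weight for small residuals, reduced weight for large ones
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b, w

# Hypothetical data: y = 2 + 3x with small alternating noise,
# plus one gross outlier at the last point.
x = list(range(1, 11))
y = [2 + 3 * xi + (0.5 if xi % 2 else -0.5) for xi in x]
y[-1] += 30

a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_rob, b_rob, w = huber_line(x, y)

print(f"OLS slope:    {b_ols:.3f}")
print(f"robust slope: {b_rob:.3f}")
print(f"weight given to the outlier: {w[-1]:.3f}")
```

Because the weights come from a fixed rule rather than from an analyst's
judgment about which points to drop, two people running the same fit on
the same data get the same answer.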
It also depends on what you consider an outlier. For instance, consider
y=(x-1)/x, where x gt 0. You may approximate the relationship as linear
within certain range(s) of x. If, however, you also have a few points well
beyond a given range, and only a few points, it may appear as if they
are outliers, because they certainly won't fit the linear model.
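The y=(x-1)/x example can be checked numerically. The sketch below, in
Python, fits a straight line to points with x between 4 and 6 and then
looks at the residuals of two points far outside that range. The ranges
and the far points (x = 20 and x = 50) are hypothetical choices made
only for illustration.

```python
def f(x):
    """The true, nonlinear relationship from the example."""
    return (x - 1) / x

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

near_x = [4 + 0.1 * i for i in range(21)]  # x from 4.0 to 6.0
far_x = [20.0, 50.0]                       # a few points well beyond the range

a, b = fit_line(near_x, [f(xi) for xi in near_x])

near_resid = [f(xi) - (a + b * xi) for xi in near_x]
far_resid = [f(xi) - (a + b * xi) for xi in far_x]

print("largest |residual| inside the range:", max(abs(r) for r in near_resid))
print("residuals of the far points:", far_resid)
```

Inside the fitted range the line is a close approximation, while the
far points sit well off it even though they lie exactly on the true
curve. They look like outliers only because the linear model is wrong
out there.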
Without a doubt, more data is desirable. However, treating those points
as outliers and excluding them may serve you well. Undoubtedly, you will
warn your readers of the dangers of extrapolation. You would also be wise
to know your data and how others have explored it, including using
splines, partial residual plots, theoretical expectations, etc.
Good luck,
Kevin
Kevin Viel
Department of Epidemiology
Rollins School of Public Health
Emory University
Atlanta, GA 30322