Date: Sun, 18 Jan 2004 09:01:34 -0500
Reply-To: Peter Flom <flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <flom@NDRI.ORG>
Subject: Re: R^2 when running proc REG under /noint option: how interpreting the output?
Content-Type: text/plain; charset=US-ASCII
Luigi
In general, it is a bad idea to choose your model based on R squared,
p-values, stepwise selection, or similar methods. You should choose your
model because it makes sense. Admittedly, a lot of people choose models
by doing things like maximizing R squared, but this practice has been
condemned on numerous occasions here on SAS-L (try searching the
archives for words like 'stepwise').
In this particular case, eliminating the intercept only makes sense if
you KNOW, substantively, that the DV MUST be 0 when all the IVs are 0.
I could see this happening in some of the hard sciences; it does not
appear realistic when talking about things like bank debt and crime.
That is, even in a place with no crime (yeah, right) there would still
be bank debt, so eliminating the intercept makes no sense.
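On the R squared comparison itself: as far as I know, when NOINT is specified PROC REG computes R squared against the uncorrected total sum of squares (the sum of y**2) rather than the usual corrected one, so the two numbers are not on the same scale. Here is a minimal sketch of that effect, in plain Python rather than SAS, with made-up numbers (the data and all variable names are hypothetical, not taken from your analysis):

```python
# Hypothetical toy data: y sits far from zero and depends only weakly on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 12.0, 11.0, 13.0, 12.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Model WITH intercept: ordinary least squares in closed form.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse_int = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
sst_corrected = sum((yi - mean_y) ** 2 for yi in y)  # total SS about the mean
r2_int = 1 - sse_int / sst_corrected

# Model WITHOUT intercept: least-squares line forced through the origin.
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
sse_noint = sum((yi - slope0 * xi) ** 2 for xi, yi in zip(x, y))
sst_uncorrected = sum(yi ** 2 for yi in y)  # total SS about zero
r2_noint = 1 - sse_noint / sst_uncorrected

print(f"with intercept:    SSE = {sse_int:.2f}, R^2 = {r2_int:.3f}")
print(f"without intercept: SSE = {sse_noint:.2f}, R^2 = {r2_noint:.3f}")
```

With these numbers the no-intercept model reports the higher R squared (about 0.86 against about 0.48) even though its error sum of squares is far larger (about 95.4 against 2.7). The higher figure reflects only the change of baseline, not a better fit.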
As for collinearity, one option is to center the IVs. This is
controversial, with prominent people arguing both for and against
centering. But if the paper is due Tuesday, you don't really have time
to get into that literature. One source to start with, if you are still
interested, is Belsley's book titled (something like) Collinearity and
Weak Data in Regression (I don't have the exact title, but that should
be searchable).
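To make the centering idea concrete, here is a small sketch in plain Python (again with hypothetical data and names) of what centering does to the relationship between one IV and the intercept column. It measures collinearity with the intercept as the cosine of the angle between the IV and a column of ones, which is only one of several possible diagnostics:

```python
# Hypothetical IV whose values sit far from zero, so it looks almost
# like a constant column.
x = [101.0, 102.0, 103.0, 104.0, 105.0]
ones = [1.0] * len(x)  # the intercept column


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


# Centering: subtract the mean from every value.
mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]

cos_raw = cosine(ones, x)
cos_centered = cosine(ones, x_centered)

print(f"raw IV vs intercept:      cosine = {cos_raw:.5f}")
print(f"centered IV vs intercept: cosine = {cos_centered:.5f}")
```

The raw IV is nearly parallel to the intercept column, while the centered IV is orthogonal to it. Centering leaves the slope estimates unchanged and only changes what the intercept means (the predicted DV at the mean of the IVs), which is part of why people disagree about whether it addresses the problem or just hides it.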
HTH
Luigi wrote
<<<
Thanks Peter.
I'm analysing the relationship between banks' bad debts in Italy and
the territorial distribution of a series of crime indexes. Some sets of
regressors work fine only when the intercept is excluded... actually I
must hand my paper to my professor next Tuesday and just yesterday I
discovered this problem... I prepared all my paper on the assumption
that the highest R**2 was my goal... but I was *a little* silly...
You're right, my variables are highly collinear with the intercept...
so I took it out of my model, but in this case R**2 has a different
meaning, and comparing the SSError with the SSTotal of the model with
the intercept I should say that there's no improvement in eliminating
the intercept, even if its parameter is not significant...
So I need to understand whether a model with no intercept and R**2=.91
is better or not than the same model with the intercept and R**2=.75.
>>>
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)