Date: Mon, 20 Oct 2003 14:40:53 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: reading datafile into macro?
Content-type: text/plain; charset=US-ASCII
"DePuy, Venita" <depuy001@DCRI.DUKE.EDU> wrote [in part]:
> I want to do a regression on 2 variables, their squares, and their
> interactions (which hopefully later can be generalized to "n"
> variables in this and other locations);
> Then I want to output the parameter estimates via ODS, look at the
> data file and drop those who are insignificant,
> then do a regression on the remaining variables.
>
> The best idea so far (I think) is to somehow read the variable names
> from the data file into a string, then use that in the model.
> So, questions:
> 1) Is this feasible?
> 2) Any ideas how?

[1] Yes, this is feasible. Lots of people have given you some good
coding suggestions.

[2] How? My recommendation is DON'T!!! I have written _ad_nauseam_
about the perils of stepwise regression, and what you are proposing is
even *worse* than classical stepwise regression. A model such as the
one you have suggested is prone to every statistical issue in the book.
Suppressor variables. Multicollinearity. Unseen factors. Missing
variables. Measurement errors. Non-linearities. Violations of the
standard regression assumptions. You name it. You can look up some of
my rambling, foaming-at-the-mouth diatribes on this in past months of
SAS-L, if you are so inclined. For those poor people who have had to
put up with me in the past, I shall refrain from going on for another
50 lines on the subject.

Trying to fit a regression on P independent variables to weed out the
'unimportant' ones and keep only the 'important' ones is fraught with
peril. Kruskal has written on the problems with 'relative importance'
in this sort of linear model setting. And if you want this to be an
automated tool, then there will never be adequate diagnostic analysis
at the back end. So I worry that this is just asking for trouble.

If you are also concerned about transforms of the independent or
dependent variables, in addition to the above, then you are adding even
more problems into the mix. But, rather than trying to fit a host of
transforms, consider using the features of SAS to look at something
like Box-Cox transformations instead.
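
For instance, PROC TRANSREG can estimate a Box-Cox transformation of
the response. What follows is only a minimal sketch, not a recipe: the
data set name MYDATA and the variables Y, X1 and X2 are made up for
illustration, Y has to be strictly positive, and I am assuming a
SAS/STAT release whose PROC TRANSREG supports the BOXCOX transform.

```sas
/* Hypothetical names: data set MYDATA, response Y (strictly positive), */
/* predictors X1 and X2. None of these come from the original post.     */
proc transreg data=mydata;
   /* Try lambda on a grid from -2 to 2 in steps of 0.25.               */
   /* IDENTITY leaves the predictors untransformed.                     */
   model boxcox(y / lambda=-2 to 2 by 0.25) = identity(x1 x2);
run;
```

The output should show the log likelihood across the lambda grid, so
you can see how sharply (or not) the data prefer one transformation
over another. Treat it as a diagnostic to look at, not as a substitute
for checking the residuals yourself.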
Yes, I know I am being a grumpy old curmudgeon about this. But I *am*
a grumpy old curmudgeon. Just ask anyone on the list.
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician