LISTSERV at the University of Georgia
Date:         Mon, 20 Oct 2003 14:40:53 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: reading datafile into macro?
Content-type: text/plain; charset=US-ASCII

"DePuy, Venita" <depuy001@DCRI.DUKE.EDU> wrote [in part]:
> I want to do a regression on 2 variables, their squares, and their
> interactions
> (which hopefully later can be generalized to "n" variables in this
> and other locations);
> Then I want to output the parameter estimates via ODS, look at the data
> file and drop those who are insignificant,
> then do a regression on the remaining variables.
>
> The best idea so far (I think) is to somehow read the variable names from
> the data file into a string, then use that in the model.
> So, questions:
> 1) Is this feasible?
> 2) Any ideas how?

[1] Yes, this is feasible. Lots of people have given you some good coding suggestions.

[2] How? My recommendation is DON'T!!! I have written _ad_nauseam_ about the perils of stepwise regression, and what you are proposing is even *worse* than classical stepwise regression. A model such as what you have suggested is prone to every statistical issue in the book. Suppressor variables. Multicollinearity. Unseen factors. Missing variables. Measurement errors. Non-linearities. Violations of the standard regression assumptions. You name it. You can look up some of my rambling, foaming-at-the-mouth diatribes on this in past months of SAS-L, if you are so inclined. For those poor people who have had to put up with me in the past, I shall refrain from going on for another 50 lines on the subject.

Trying to fit a regression on P independent variables to weed out the 'unimportant' ones and keep only the 'important' ones is fraught with peril. Kruskal has written on the problems with 'relative importance' in this sort of linear model setting. And if you want this to be an automated tool, then there will never be adequate diagnostic analysis at the back end. So I worry that this is just asking for trouble.

If you also are concerned about transforms of the independent or dependent variables, in addition to the above, then you are adding even more problems into the mix. But, rather than trying to fit a host of transforms, consider using the features of SAS to look at something like Box-Cox transformations instead.
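
To make the Box-Cox suggestion concrete, here is a minimal sketch of the idea: transform the response with a candidate lambda, fit the regression, and keep the lambda with the highest profile log-likelihood. It is an illustration only, in plain Python rather than SAS, and it assumes a strictly positive response and a single predictor. The function names, the grid, and the toy data are all hypothetical; real work would use a built-in procedure and would still need the usual diagnostics.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of one strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(y)          # limiting case lambda = 0
    return (y ** lam - 1.0) / lam

def rss_simple_ols(x, z):
    """Residual sum of squares from regressing z on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))

def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(y)
    z = [box_cox(yi, lam) for yi in y]
    rss = rss_simple_ols(x, z)
    # The second term is the log-Jacobian of the transform; it makes
    # values comparable across different lambdas.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(yi) for yi in y)

def best_lambda(x, y, grid):
    """Grid value of lambda with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))

# Hypothetical toy data: log(y) is roughly linear in x, plus a small
# deterministic wiggle standing in for noise.
x = [float(i) for i in range(1, 21)]
y = [math.exp(0.3 * xi + 0.1 * math.sin(3 * xi)) for xi in x]
grid = [k * 0.5 for k in range(-4, 5)]   # -2.0, -1.5, ..., 2.0

best = best_lambda(x, y, grid)
print("lambda chosen from the grid:", best)
```

The point of the design is that one parameter indexes a whole family of transforms (log, square root, reciprocal, and so on), so the choice is made by a single likelihood comparison instead of by fitting a host of separate transformed models.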

Yes, I know I am being a grumpy old curmudgeon about this. But I *am* a grumpy old curmudgeon. Just ask anyone on the list.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

