Date: Mon, 22 Mar 1999 13:27:27 +0000
Reply-To: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Subject: SAS/STAT Stepwise selection
In-Reply-To: <1E362FD41C5@violet.le.ac.uk>
B. Manktelow <bm18@LEICESTER.AC.UK> writes
>Hi,
>I am hoping that somebody can suggest a method for using automatic
>variable selection in procedures that do not have the CLASS option
>(eg LOGISTIC or PHREG).
>The problem a colleague of mine has is that she has created three
>dummy variables for a factor with four levels. However, she requires
>that they are recognised as being the same variable and are therefore
>added or removed from the model together. SAS appears to
>treat each dummy variable as a separate variable.
>Any suggestions on a way around this? We can't find anything in the
>documentation.
>(NOTE:
>1. We are fully aware of all of the dangers of automatic variable
>selection etc;
>2. I am a statistician so please be gentle with me!!!)
>
>Thanks
>Brad
>bm18@le.ac.uk
Would it be possible to reconsider those "levels"?
If, as with age (of anything), there is an underlying continuous variable
that is converted into dummy variables just for this logistic exercise,
then you may achieve what you are looking for by generating those
dummy vars with a "less discrete" nature.
Ranges such as
    up to 18
    18 to 30
    30 to 50
    50 to 65 etc
may offer an obvious set of dummy variables, but they have to be taken
into procedures like LOGISTIC as a complete group.
But if those ranges were not defined as discrete (mutually exclusive)
bands, they could become independent dummy vars.
For example, if these categories
    under 18, under 30, under 50, under 65
are encoded in dummy vars, then, because each one divides the full range
at a single point, they would be independent and can come or go in the
stepwise selection process without requiring their partners.
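
To make the two codings concrete, here is a minimal sketch. It is written
in Python purely for illustration; in SAS the same thing would be a few
assignment statements in a DATA step. The cut points come from the ranges
above. The function names, and the choice to treat "under 18" as
age < 18, are assumptions made for the example.

```python
# Cut points taken from the ranges in this message.
CUTS = [18, 30, 50, 65]


def disjoint_dummies(age):
    """One indicator per band: <18, 18-<30, 30-<50, 50-<65, 65 and over.

    Exactly one indicator is 1, so the set only makes sense as a group.
    """
    edges = [float("-inf")] + CUTS + [float("inf")]
    return [int(lo <= age < hi) for lo, hi in zip(edges[:-1], edges[1:])]


def cumulative_dummies(age):
    """One indicator per cut point: under18, under30, under50, under65.

    Each indicator splits the whole age range at a single point.
    """
    return [int(age < cut) for cut in CUTS]


for age in (10, 25, 40, 60, 70):
    print(age, disjoint_dummies(age), cumulative_dummies(age))
```

With the disjoint coding, dropping one band changes what the remaining
bands are compared against. With the cumulative coding, each variable is
a yes/no answer to a single question ("is age below this cut?"), so it
can be read on its own.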
You may want to modify the exercise by replacing these categorical
dummies with their opposite numbers
over18, over30, over50, over65
depending on the data model and objectives for the analysis.
Not being adequately versed in the speciality, I'm not certain whether
their effect is different or, conversely, whether combining both sets
risks introducing collinearity.
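
One way to probe that question is a small check, again sketched in
Python for illustration only. It assumes that "over18" is defined as the
exact complement of "under18" (age >= 18 versus age < 18), which may not
match the definitions you would actually use; the function names and the
sample ages are made up for the example.

```python
CUTS = [18, 30, 50, 65]


def under(age, cut):
    """Indicator for age strictly below the cut point."""
    return int(age < cut)


def over(age, cut):
    """Indicator for age at or above the cut point.

    Assumption: this is the exact complement of under().
    """
    return int(age >= cut)


# Made-up ages, including values that sit exactly on a cut point.
ages = [5, 18, 29, 30, 47, 65, 80]

# For every cut point, add the two indicators for each age.
pair_sums = {cut: [under(a, cut) + over(a, cut) for a in ages] for cut in CUTS}

# True when every pair sums to 1 for every age.
all_complementary = all(s == 1 for sums in pair_sums.values() for s in sums)
print(all_complementary)
```

Under that assumption each pair always sums to 1, which is the same as
the intercept column, so a model with an intercept could not hold both
members of a pair at once. Using one set or the other would then change
only the sign of the coefficients, not the fit.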
Is this already a standard approach,
or is it invalidated for any reason?
--
Peter Crawford