LISTSERV at the University of Georgia
Date:   Fri, 16 Nov 2001 23:23:57 +0000
Reply-To:   Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Michael Friendly <friendly@HOTSPUR.PSYCH.YORKU.CA>
Organization:   York University
Subject:   Re: shorthand in sas reg model?

In article <OFBB9DEA8A.B145D434-ON88256B06.00711ACE@rtp.epa.gov>
Cassell.David@EPAMAIL.EPA.GOV (David L. Cassell) writes:
|kataliu <richardliu@NORTHWESTERN.EDU> wrote:
|> I am new to sas.
|
|I wonder if you are also new to inferential statistics, based on:
|
|> When I use the regression model in sas, I find that
|> I have lots of independent variables. Therefore, I
|> have to type each one in sas codes.
|>
|> For example,
|>
|> PROC REG DATA=...;
|> MODEL TARGET = ACOL1 MTGAT DKL ...../SELECTION = STEPWISE;
|>                ^^^^^^^^^^^^^^^^^^^^^
|>                over 200 variables with different name!!
|> RUN;
|
|This is a problem waiting to happen. That many variables and a
|stepwise selection procedure will help you to fit.. well.. most likely
|a lot of garbage. Measurement error, collinearity, non-interpretability
|of 'relative importance', and a host of other issues will plague you.
|With 200 variables, you could have 200 sources of random noise and
|you'll probably get this to fit a swell model for you [where 'swell'
|is unspecified]. Such a thing has happened - and been printed for all
|to see in the literature - more times than you want to know.

Below is a simple but effective teaching example I have used for a while to demonstrate the perils of blind stepwise selection: generate 100 N(0,1) predictors and an independent N(0,1) y. Toss them into stepwise selection and, hey, you can get an R^2 of .25 or maybe greater. But generate two similar samples and use the model selected by each to cross-validate the other, and whoa, the R^2 drops to non-significance.

The code below depends on how the seed for the normal() function is used on your machine, so the variables selected on your system may differ from the ones listed in the last step. I ran the first reg step once, then used the variables selected in each selection run as the models in the last step. Another useful variation is to add 100 random N(0,1) predictors, X1-X100, to a real model. Students are amazed at how often the X variables turn up among the "real predictors".

----- stepsim.sas ----
title 'Stepwise simulation example - NO real predictors';
* Generate two sets of data: 100 random predictors, 200 observations;

data sim;
   array x{100} x1-x100;
   do testset= 1 to 2;
      do n=1 to 200;
         *-- generate the predictors-- all independent, just noise;
         do i=1 to 100;
            x(i) = normal(6752343);
         end;
         *-- generate the criterion-- no relation to any of the Xs;
         y = normal(7654321);
         output;
      end;
   end;

proc reg;
   by testset;
   model y = x1-x100 / selection=forward slentry=.05;
run;

/* Now see how well each prediction equation does in the other data set.
   - Each model should do well on the data set for which it was selected,
     but poorly on the other set of data */

title2 'Testing cross-validation';
proc reg data=sim;
   by testset;
   M1: model y = x13 x75 x5 x25 x82 x10 x38 x87 x94 x93 x29 x97;
   M2: model y = x78 x14 x30 x25 x9 x4;
run;
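
The other variation mentioned above, adding 100 noise predictors to a real model, might be sketched roughly as follows. This is only an illustration, not part of stepsim.sas: the data set name simreal, the single real predictor z, and the slope of 0.5 are all assumptions made up for the example. A single seed is used throughout so the result does not depend on how later seed arguments are handled.

```sas
* Hypothetical variation: one real predictor (z) plus 100 noise predictors;
* The names simreal and z and the slope 0.5 are illustrative assumptions;
data simreal;
   array x{100} x1-x100;
   do n = 1 to 200;
      *-- noise predictors, unrelated to y;
      do i = 1 to 100;
         x{i} = normal(6752343);
      end;
      *-- one real predictor and a criterion that depends on it;
      z = normal(6752343);
      y = 0.5*z + normal(6752343);
      output;
   end;
   drop i;
run;

title 'Stepwise simulation example - one real predictor plus noise';
proc reg data=simreal;
   model y = z x1-x100 / selection=forward slentry=.05;
run;
```

Any of x1-x100 that enter the model alongside z are spurious by construction, since y was generated from z and noise alone.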

--
Michael Friendly     Email: friendly@yorku.ca (NeXTmail OK)
Psychology Dept
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

