LISTSERV at the University of Georgia
Date:         Tue, 4 Mar 2003 12:54:42 -0500
Reply-To:     Peter Flom <flom@NDRI.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Peter Flom <flom@NDRI.ORG>
Subject:      Overfitting references
Comments: To: STAT-L@LISTS.MCGILL.CA
Content-Type: text/plain; charset=US-ASCII

Apologies for cross-posting.

Earlier today, a colleague told me she had read an article that used backward elimination in linear regression. There were 52 cases and 15 variables. (!) She asked me if that was a problem. I gave her an emphatic yes.

But it got me thinking.

I have seen some rules of thumb for the "number of cases per variable" in regression. But is there much empirical literature on how regressions perform under various combinations of N, number of IVs, and model selection method?

For example, one interesting idea is to use random data and see how often various p-values are obtained with different N, different numbers of IVs, and different selection methods.
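
A minimal sketch of that kind of simulation, in plain Python, might look like the following. Everything in it is an assumption made for illustration: the function names are hypothetical, the predictors and the response are independent standard normal noise (so no predictor has any real effect), and backward elimination drops the weakest predictor until every remaining |t| is at least 2.0, which is only a rough stand-in for a two-sided p < 0.05 rule. It is not the procedure used in the article mentioned above.

```python
import random


def solve_normal_equations(X, y):
    """Return (beta, inverse of X'X) for OLS, using Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    # Augment X'X with the identity matrix and reduce to [I | inverse].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    return beta, inv


def t_statistics(X, y):
    """Return the t statistic of every coefficient in an OLS fit of y on X."""
    n, k = len(X), len(X[0])
    beta, inv = solve_normal_equations(X, y)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return [beta[j] / (sigma2 * inv[j][j]) ** 0.5 for j in range(k)]


def backward_eliminate(X, y, t_crit=2.0):
    """Drop the predictor with the smallest |t| until all remaining |t| >= t_crit.

    Column 0 is the intercept and is never dropped.
    Returns the number of predictors retained. t_crit=2.0 is an assumed
    approximation to a 5% two-sided test, not an exact critical value.
    """
    cols = list(range(len(X[0])))
    while len(cols) > 1:
        sub = [[row[c] for c in cols] for row in X]
        t = t_statistics(sub, y)
        weakest = min(range(1, len(cols)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break
        cols.pop(weakest)
    return len(cols) - 1


def simulate(n_cases=52, n_vars=15, n_reps=100, seed=0):
    """Run backward elimination on pure-noise data n_reps times."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_reps):
        X = [[1.0] + [rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_cases)]
        y = [rng.gauss(0, 1) for _ in range(n_cases)]
        counts.append(backward_eliminate(X, y))
    return counts


counts = simulate()
share = sum(c > 0 for c in counts) / len(counts)
print("Share of pure-noise data sets with at least one 'significant' IV:", share)
print("Mean number of noise IVs retained:", sum(counts) / len(counts))
```

Varying n_cases, n_vars, and the stopping rule, and tabulating the results, would give the kind of comparison across N, number of IVs, and selection method asked about here.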

Any pointers to existing literature would be appreciated.

Thanks

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)

