Date: Fri, 9 Nov 2007 10:15:02 -0000
Reply-To: "cat.." <cat.b41@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "cat.." <cat.b41@GMAIL.COM>
Subject: Re: Univariate tests before multivariate modeling in logistic
Content-Type: text/plain; charset="us-ascii"
On Nov 8, 5:44 pm, peterflomconsult...@mindspring.com (Peter Flom)
> "cat.." <cat....@GMAIL.COM> wrote
> >I'd like to get your opinion as statisticians about one thing that
> >surprised me in a publication I read.
> >It suggests a strategy for fitting a multivariate logistic regression.
> >- Identification of a primary set of covariates (risk factors)
> >- Performing univariate tests:
> > Covariate * exposure in disease-free subjects --> p-value1
> Your subject line indicates univariate tests before multivariate modeling, but in the body of the message, I don't see any multivariate modeling at all.
> There isn't anything wrong with doing univariate testing (I would call it bivariate, but no big deal) before multivariate. Exploring your data in multiple ways is a good idea. But there is something wrong with using bivariate screening as a variable selection tool. For one thing, a variable might be important only after controlling for another variable.
> Could you provide some more details on what the authors did?
> How did they identify a set of covariates?
> What univariate (bivariate) testing did they do?
> How many candidate independent variables were there?
> What sample size?
> What is the state of theory about the relationship between the DV and the IV? If there is strong theory, then the approach will be different than if the research is more exploratory.
> A good book on this is Frank Harrell's Regression Modeling Strategies
> Other areas to explore are partial least squares, principal component regression, the lasso, least angle regression, and multimodel averaging
> Hope this helps
Thank you for your contribution. Actually, I inadvertently clicked on
the button "Send" before I completed my message and had no time to
send a new one afterwards.
Here we go now.
The paper is a French paper by Jean Bouyer: Logistic Regression in
Epidemiology, Part II. Revue d'épidémiologie et de santé publique.
1991. 39: 183-196.
The objective is to estimate the strength of the association between
an exposure factor and a disease, adjusting for confounding factors.
So the model has the form logit P(D = 1) = E + Sum(Xi) + Sum(E*Xi), where
D = disease
E = exposure factor
Xi = confounding covariate
E*Xi = interaction
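
To make the model form above concrete, here is a minimal Python sketch. It assumes the usual logistic reading (the terms are summed on the logit scale) and adds an intercept; the function name and the coefficient names b0, b_e, b_x and b_ex are illustrative and do not come from the paper.

```python
import math

def disease_probability(e, x, b0, b_e, b_x, b_ex):
    """P(D = 1) for exposure e (0/1) and a list of covariate values x.

    b0 is an assumed intercept, b_e the exposure coefficient, b_x the
    covariate coefficients and b_ex the exposure*covariate interaction
    coefficients. All of these names are hypothetical.
    """
    eta = b0 + b_e * e
    for xi, b_xi, b_exi in zip(x, b_x, b_ex):
        # main effect of Xi plus its interaction with the exposure
        eta += b_xi * xi + b_exi * e * xi
    # inverse logit turns the linear predictor into a probability
    return 1.0 / (1.0 + math.exp(-eta))
```

With every interaction coefficient set to zero, exp(b_e) is the adjusted odds ratio for the exposure, which is the quantity the strategy aims to estimate.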
Potential confounding factors are chosen from known or suspected
risk factors for the condition, identified through a literature review.
Suggested steps are as follows:
1) Preliminary screening:
A factor is a confounding factor if it is:
- linked to the disease in unexposed subjects AND
- linked to the exposure factor in disease-free subjects.
Therefore, to select relevant confounding factors from the data:
- perform an association test covariate * disease in unexposed
subjects --> p-value 1
- perform an association test covariate * exposure in disease-free
subjects --> p-value 2
- Keep the covariate for the multivariate model if both p-values are <
2) Fit the multivariate model.
I'll give no details for this part since I have no questions about it.
3) Validate it.
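
The screening in step 1 can be sketched in Python as below, for a binary covariate. Several things here are assumptions rather than the paper's method: the message only says "an association test", so a Pearson chi-square test on a 2x2 table (1 df, no continuity correction) is used as a stand-in; the record keys "exposed" and "disease" and all function names are made up for illustration; and since the p-value cutoff is not given above, alpha is a required argument.

```python
import math

def chi2_pvalue_2x2(a, b, c, d):
    """Pearson chi-square p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if min(r1, r2, c1, c2) == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))

def table_2x2(rows, row_key, col_key):
    """Cross-tabulate two 0/1 variables; returns the cells a, b, c, d."""
    counts = [[0, 0], [0, 0]]
    for r in rows:
        counts[r[row_key]][r[col_key]] += 1
    return counts[0][0], counts[0][1], counts[1][0], counts[1][1]

def screen_covariate(records, covariate, alpha):
    """Return (p1, p2, keep) for one binary covariate."""
    # p-value 1: covariate * disease, among unexposed subjects only
    unexposed = [r for r in records if r["exposed"] == 0]
    p1 = chi2_pvalue_2x2(*table_2x2(unexposed, covariate, "disease"))
    # p-value 2: covariate * exposure, among disease-free subjects only
    disease_free = [r for r in records if r["disease"] == 0]
    p2 = chi2_pvalue_2x2(*table_2x2(disease_free, covariate, "exposed"))
    # keep the covariate only if both tests fall below the cutoff
    return p1, p2, (p1 < alpha and p2 < alpha)
```

Each record is a dict of 0/1 values, for example {"exposed": 0, "disease": 1, "smoker": 1}, and the call would be screen_covariate(records, "smoker", alpha) with whatever cutoff the analyst chooses.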
Now, my questions are:
- Why should I limit the association test between the covariate and the
disease to unexposed subjects?
- Same question for the test of association between the covariate
and the exposure factor: why limit it to disease-free subjects?
I've never seen that before.
Thank you for your opinion.