**Date:** Fri, 16 May 2008 10:35:14 -0700
**Reply-To:** Dale McLerran <stringplayer_2@YAHOO.COM>
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** Dale McLerran <stringplayer_2@YAHOO.COM>
**Subject:** Re: PROC GLIMMIX--How to Check Linearity Assumption
**In-Reply-To:** <81074a80-9057-4b89-b627-6239b8762e3a@i76g2000hsf.googlegroups.com>
**Content-Type:** text/plain; charset=iso-8859-1
--- Shiling Zhang <shiling99@YAHOO.COM> wrote:

> > How do I check that the 50K obs of the above predictors are
> > linearly related to the FRAUD variable?
>
> It is neither necessary nor sufficient. In fact, the model assumes
> that the log odds of FRAUD, not FRAUD itself, is linearly related
> to your predictors.
>
> Here is a way to view it.
> 1) Bin a predictor into, say, 30 bins, i = 1 to 30.
> 2) Calculate the log odds of FRAUD for each bin.
> 3) Plot the log odds of FRAUD against a summary of the binned
> predictor values (mean or median).
>
> Based on what you see, you may apply a suitable transformation,
> either parametric or non-parametric. If you have a large number
> of events (FRAUD), the non-parametric way is preferred. ......
>
> HTH
>

Yes, this is most certainly one way to examine the linearity
assumption. This approach works best if you have only a single
predictor variable. In the multivariate setting, where the effect
of each predictor is conditional on the effects of the other
predictors, this approach may not work as well. Also, I don't see
the need for more than 10 bins in most circumstances.
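
The binning check described above can be sketched for a single predictor as follows. This is only an illustration in Python, not SAS: the function name `binned_logodds`, the equal-count bins, the 0.5 added to each cell to keep the log odds finite, and the simulated data are all assumptions made for the example.

```python
import math
import random

def binned_logodds(x, y, n_bins=10):
    """Return a list of (mean of x, empirical log odds of y) for equal-count bins of x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    result = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        events = sum(yi for _, yi in chunk)
        nonevents = len(chunk) - events
        # Assumption: add 0.5 to each count so a bin with no events
        # (or no non-events) still has a finite log odds.
        logit = math.log((events + 0.5) / (nonevents + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        result.append((mean_x, logit))
    return result

# Simulated data (hypothetical): the true log odds is linear in x.
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(50000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-1 + 0.8 * xi))) else 0
     for xi in x]

bins = binned_logodds(x, y, n_bins=10)
for mean_x, logit in bins:
    print(f"{mean_x:8.3f} {logit:8.3f}")
```

Plotting the second column against the first should give a roughly straight line when the logit is linear in the predictor. As noted above, this is a marginal check and does not adjust for other predictors.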

Another way to examine whether linearity holds is to include terms
in your model which represent some departure from linearity. If
the model fit improves significantly when these terms are included,
that is evidence that the assumption of linearity in the predictors
does not hold. Often, this is done simply by including polynomials
of your predictors. A somewhat more sophisticated approach is to
employ a spline basis to represent the nonlinearity. My favorite
spline basis is the restricted cubic spline, as discussed in

Harrell, Frank. "Regression Modeling Strategies: With Applications
to Linear Models, Logistic Regression, and Survival Analysis."
Springer, 2001.
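
As a rough illustration of the polynomial version of this idea, the sketch below fits an ordinary fixed-effects logistic model with and without a squared term and compares the two with a likelihood ratio test on 1 degree of freedom. It is written in plain Python and makes no attempt to reproduce PROC GLIMMIX. The helper names (`solve`, `fit_logistic`), the Newton-Raphson fitting routine, and the simulated data are assumptions made for the example.

```python
import math
import random

def solve(a, b):
    """Solve the small linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(rows, y, iters=15):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        loglik += yi * eta - math.log(1 + math.exp(eta))
    return beta, loglik

# Simulated data (hypothetical): the true log odds is curved in x.
random.seed(2)
x = [random.uniform(-2, 2) for _ in range(3000)]
y = [1 if random.random() < 1 / (1 + math.exp(-(-1 + 0.8 * xi + 0.5 * xi * xi)))
     else 0 for xi in x]

beta_lin, ll_lin = fit_logistic([[1.0, xi] for xi in x], y)
beta_quad, ll_quad = fit_logistic([[1.0, xi, xi * xi] for xi in x], y)

lr_stat = 2 * (ll_quad - ll_lin)             # likelihood ratio statistic
p_value = math.erfc(math.sqrt(lr_stat / 2))  # chi-square upper tail, 1 df
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here says the squared term improves the fit, which is evidence against linearity on the logit scale for that predictor.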

There are a couple of SAS macros available for generating splined
variables. Go to
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/SasMacros
and follow the links from there.
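
For readers who want to see what a splined variable looks like, here is a minimal Python sketch of the nonlinear terms of a restricted cubic spline, using the truncated-power form given in Harrell's book, scaled by the squared distance between the outer knots. It is not a port of the SAS macros mentioned above; the function name `rcs_basis` and the knot values are assumptions made for the example.

```python
def rcs_basis(x, knots):
    """Return the k-2 nonlinear restricted cubic spline terms for a value x.

    The linear term (x itself) is entered in the model separately.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u**3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # puts the terms on roughly the scale of x
    d = t[-1] - t[-2]            # distance between the last two knots
    terms = []
    for j in range(k - 2):
        value = (cube_plus(x - t[j])
                 - cube_plus(x - t[-2]) * (t[-1] - t[j]) / d
                 + cube_plus(x - t[-1]) * (t[-2] - t[j]) / d)
        terms.append(value / scale)
    return terms

# Hypothetical knots; in practice they are often placed at quantiles of x.
example_knots = [-1.0, 0.0, 1.0, 2.0]
for value in (-1.5, 0.5, 1.5, 3.0):
    print(value, rcs_basis(value, example_knots))
```

With k knots this yields k-2 extra columns per predictor. The terms are zero below the first knot and linear above the last, which is the "restricted" part. Adding these columns to the model and testing them jointly gives the spline version of the linearity check.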

HTH,

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------