LISTSERV at the University of Georgia
Date:         Thu, 25 May 2006 20:14:06 -0700
Reply-To:     sophe88@YAHOO.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         sophe88@YAHOO.COM
Organization: http://groups.google.com
Subject:      Re: A proc logistic question
Comments: To: sas-l@uga.edu
In-Reply-To:  <4471CD1F020000C900006B07@mail.NDRI.ORG>
Content-Type: text/plain; charset="iso-8859-1"

Thank you for your reply.

Here is background.

1. The DV is binary.
2. About 120 IVs, some continuous, some categorical.
3. 31,000 records in the learning universe. About the same count in the validation holdout.
4. About 3.1% of records have DV=1.
5. All 120 IVs have a poly corr of at least 0.2 with the DV, in absolute value.
6. While I am not saying our steps are perfect or best, all 120 IVs have been checked and reasonably cleared of severe multicollinearity problems.
7. No survey data.
8. No treatment vs. control type of incremental effect measurement. Plain, classic binary logistic.

Now the story behind this variable question:

Some of my statisticians have a tendency to use classifiers to recode variables. Sometimes the tool is naive Bayes macros, sometimes trees. Sometimes Clementine, sometimes KXEN. This specific variable was originally just the first 2 bytes of a 4-byte SIC code (SIC = Standard Industrial Classification). In some previous projects, the so-called SIC clustering worked out well, in terms of boosting top decile lifts. But I have been advocating reasonably high top lift.

This time one of them tested this recoding using CHAID in SPSS Clementine and came up with 7 new values/cuts on the SIC data. She plugged it into the logistic model, since somehow this new baby survived her usual multicollinearity tests. This new SIC variable squeezed out 3 other variables, smoothed out the lift (actually boosting it to 638, from 497), and made the lift more consistent on both the learning (L) and validation (V) sets. The only problem was the '>999.999' scar in the odds ratio that I mentioned. I did not feel very good about it, but could not call up any reference right away.
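
For readers following the lift figures, here is a minimal sketch of one common way to compute a top-decile lift index. It assumes lift = 100 x (response rate among the top-scoring 10% of records) / (overall response rate), so that 100 means no better than random. The post does not say how 638 and 497 were computed, so this definition, the function name top_decile_lift, and the toy data are all assumptions made for illustration.

```python
def top_decile_lift(scores, outcomes):
    """Lift index of the top-scoring 10% of records (100 = same as random)."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal in length")
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("no responders, lift is undefined")
    # Rank records from highest to lowest model score.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    return 100.0 * top_rate / overall_rate


# Toy data (hypothetical): 100 records with 4 responders,
# 2 of which fall in the 10 highest-scoring records.
scores = [i / 100 for i in range(100)]
outcomes = [0] * 100
for idx in (99, 95, 60, 10):
    outcomes[idx] = 1

lift = top_decile_lift(scores, outcomes)
print(round(lift))
```

With a DV=1 rate near 3.1%, the ceiling of this index for the top decile is 1000 (every record in the decile responds), so under this definition a value such as 638 would mean the top decile responds at roughly 6.4 times the overall rate.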

After testing several other variables, I see this tends to happen to variables that carry ratios or percentage values, while other variables in the IV pool are original 'numbers'. Perhaps, in the end, the scale is essentially the problem?
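
A small sketch of why scale alone could produce this, under stated assumptions. The odds ratio reported for a continuous effect is exp(beta) for a one-unit change in the predictor. If a predictor is stored as a proportion between 0 and 1, one unit spans its whole range, so even a moderate effect gives a huge odds ratio. The coefficient 9.2 below is hypothetical, not taken from the model in this thread, and the helper name odds_ratio is an illustration only.

```python
import math


def odds_ratio(beta, unit_change=1.0):
    """Odds ratio implied by a logistic coefficient for a given change in x."""
    return math.exp(beta * unit_change)


# Hypothetical coefficient for a predictor stored as a proportion (0 to 1).
beta_ratio = 9.2

# One full unit is the entire 0-to-1 range: about 9897, beyond the 999.999 display limit.
or_per_unit = odds_ratio(beta_ratio)

# A change of 0.01 (one percentage point) gives a readable value, about 1.096.
or_per_point = odds_ratio(beta_ratio, 0.01)

# Equivalent view: multiply the predictor by 100 and the coefficient shrinks by 100,
# so the per-unit odds ratio of the rescaled variable matches or_per_point.
beta_percent = beta_ratio / 100
or_rescaled = odds_ratio(beta_percent)

print(or_per_unit > 999.999, round(or_per_point, 3), round(or_rescaled, 3))
```

Rescaling changes only how the effect is reported, not the fit: the chi-square and the lift table stay the same. In PROC LOGISTIC the UNITS statement can request odds ratios for a chosen change in a predictor without recoding it. Note that an odds ratio above 999.999 can also come from other causes, such as a category with almost no events or non-events, so the scale explanation is worth checking rather than assuming.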

Thanks.

PD

Peter Flom wrote:

> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> http://cduhr.ndri.org
> www.peterflom.com
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
>
> >>> <sophe88@YAHOO.COM> 05/22/06 2:20 PM >>> wrote
> <<<
> I see this in my proc logistic output
>
> Odds Ratio Estimates
> ................................
>
>                 Point          95% Wald
> Effect       Estimate      Confidence Limits
>
> var1            1.029       1.002       1.057
> sr_cust      >999.999    >999.999    >999.999
>
> What does >999.999 mean? Does it mean sr_cust is a 'bad' var and should
> not stay in the model? Removing it will 'crash' the model lift
> table (bumpy), although I may find others to replace it. Its Chisq and
> others look OK to me. Thanks.
> >>>
>
> You haven't given us much to go on. Could you give some context?
> What's your DV, what are your IVs, what is N? Was it a survey?
> (Paging Dr. Cassell)
>
> One thing might be that the scale is wrong. Something like this could
> happen if, say, the IV was personal income measured in millions of
> dollars per year, and the outcome was probability of owning a
> home......Try changing the unit.
>
> But please also write back to SAS-L with more information
>
> Peter

