LISTSERV at the University of Georgia
Date:         Mon, 10 Oct 2005 11:03:56 -0700
Reply-To:     Daniel Nordlund <res90sx5@VERIZON.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Daniel Nordlund <res90sx5@VERIZON.NET>
Subject:      Re: Dummy variables & proc logistic - what am I missing?
Comments: To: Adam <dprkexchange@HOTMAIL.COM>
In-Reply-To:  <1128957507.276594.303800@z14g2000cwz.googlegroups.com>
Content-type: text/plain; charset=utf-8

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Adam
> Sent: Monday, October 10, 2005 8:18 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Dummy variables & proc logistic - what am I missing?
>
> I'm using survey data (from the California Health Interview Survey) to
> estimate as best I can the independent effect of income on stroke. The
> variables are coded in the interview data so that increasing values
> correspond with better health indicators.
>
> For instance, in the data, the coding for current smoking status is
> such:
>
> smkcur = 1 is a current smoker; smkcur = 2 is not a current smoker.
> Also, when the question was skipped, then smkcur = -2.
>
> When I created dummy variables in ascending order, so that non-smokers
> were coded as 0, and smokers and "data not availables" were coded as 1
> (there were 171 out of more than 8,600 records with NA data).
>
> I likewise dummy coded the variable for income so that below $25,000 a
> year was lowincome = 1 and greater than or equal to $25,000 a year was
> lowincome = 0.
>
> When I did so, the calculated odds ratios I got for what should have
> been risk factors came back as protective factors (with odds ratios
> below 1), and the data that I didn't recode yet (and thus was in the
> reverse order of the recoded data; i.e. risk behaviors were coded such
> that riskybehavior = 1, lessriskybehavior = 2 and so on, monotonically)
> came back as risk factors (with odds ratios that were greater than 1
> and even 2, for diabetes, which is a known risk factor for stroke).
> When I recoded diabetes, it came back as a protective factor as well.
>
> It seems like I'm missing the concept more than the technicalities
> here, but I'm including the code below and the results below. Thanks
> in advance for the help - it's always much appreciated!
>
> (FYI, ab29 = 1 when interviewee had a doctor who told him/her about
> high blood pressure, ab29 = 2 when doctor had never mentioned HBP; ab34
> = 1 when prior heart disease and ab34 = 2 when no prior heart disease;
> ae13 = # of drinks per day continuous.)
>
> Adam
>
> data test;
> set chis.pufa1;
> /**ac6 is stroke variable, only applicable to interviewees 65+ years of
> age**/
> /**where ac6 = -1, interviewee was less than 65 years**/
> /**where statement to exclude interviewees with no data available**/
> where ac6 ge 0;
>
> /**Dummy Variables**/
> /**Income Groups**/
> if ak22_p =< 25000 then lowincome = 1;
> if ak22_p > 25000 then lowincome = 0;
>
> /**Stroke Variable**/
> /**Data Dictionary: ac6 = stroke; where ac6=1, stroke; where ac6=2, no
> stroke**/
> if ac6 = 1 then stroke = 1;
> if ac6 = 2 then stroke = 0;
>
> /**Ever have diabetes?**/
> /**else statement for "diabetes = 3" where 3 is borderline or
> pre-diabetes, n = 416**/
> if ab22 = 1 then diabetes = 1;
> if ab22 = 2 then diabetes = 0;
> else diabetes = 1;
>
>
> /**Current Smoking Variable**/
> /**Where smkcur question was skipped, value set -2.**/
> /**Assume current smoking = yes (conservative estimate)**/
> /**smkcur = 1 is current smoker; smkcur = 2 is non-smoker; smkcur = -2
> is data N/A; n=171**/
> if smkcur = -2 then smokes = 1;
> if smkcur = 1 then smokes = 1;
> if smkcur = 2 then smokes = 0;
>
> proc logistic data = test order = data;
> weight rakedw0;
> model ac6 = lowincome srage_p srsex ab29 diabetes ab34 ae13 smokes;
> run;
> ---------------------------------------------------
> The LOGISTIC Procedure
>
> Analysis of Maximum Likelihood Estimates
>
>                                 Standard          Wald
> Parameter    DF    Estimate        Error    Chi-Square    Pr > ChiSq
>
> Intercept     1      4.0015       0.0256    24358.2853        <.0001
> lowincome     1     -0.1333      0.00389     1174.3857        <.0001
> SRAGE_P       1     -0.0471     0.000304    24077.9639        <.0001
> SRSEX         1     -0.0604      0.00396      232.3938        <.0001
> AB29          1      0.7180      0.00434    27401.5473        <.0001
> diabetes      1     -0.3277      0.00439     5562.5470        <.0001
> AB34          1      0.7701      0.00388    39346.4711        <.0001
> AE13          1      0.0856      0.00141     3675.4610        <.0001
> smokes        1     -0.8101      0.00488    27585.4492        <.0001
>
>
> Odds Ratio Estimates
>
>                 Point          95% Wald
> Effect       Estimate     Confidence Limits
>
> lowincome       0.875      0.869      0.882
> SRAGE_P         0.954      0.953      0.955
> SRSEX           0.941      0.934      0.949
> AB29            2.050      2.033      2.068
> diabetes        0.721      0.714      0.727
> AB34            2.160      2.144      2.176
> AE13            1.089      1.086      1.092
> smokes          0.445      0.441      0.449
>
>
> Association of Predicted Probabilities and Observed Responses
>
> Percent Concordant       68.9    Somers' D    0.389
> Percent Discordant       30.0    Gamma        0.393
> Percent Tied              1.1    Tau-a        0.061
> Pairs                 5931331    c            0.695

Adam,

I see you have gotten a couple of responses. Let me add two things.

1. By default, PROC LOGISTIC models the probability of the lowest ordered value of the dependent variable. Your stroke variable is coded 1=stroke, 0=no stroke, so your program is estimating the odds of no stroke. Change your PROC LOGISTIC statement to

proc logistic data = test order = data DESCENDING;

in order to get what you expect.
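
As a sketch of an alternative, the event level can be named directly in the MODEL statement, so the result does not depend on how the response levels happen to be ordered. This assumes the recoded stroke variable (1 = stroke, 0 = no stroke) from your DATA step is the intended response; the MODEL statement in your post uses ac6 instead, and I am assuming that was not deliberate.

```sas
/* Sketch only: assumes data set TEST and the 0/1 variable STROKE
   created in the DATA step of the original post. */
proc logistic data = test;
  weight rakedw0;
  /* event='1' asks explicitly for a model of Pr(stroke = 1) */
  model stroke(event='1') = lowincome srage_p srsex ab29 diabetes ab34 ae13 smokes;
run;
```

Either way, check the Response Profile section of the output, which reports the level whose probability is being modeled. For a binary response, switching the modeled level changes the sign of every estimate and inverts every odds ratio, so the 0.445 reported for smokes would become roughly 1/0.445, or about 2.25.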

2. You probably shouldn't be using PROC LOGISTIC here. These are survey data, so you should be using PROC SURVEYLOGISTIC and friends for this analysis (David must be taking a coffee break or doing something silly, like productive work :-).
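
A minimal sketch of the same model in PROC SURVEYLOGISTIC follows. It assumes rakedw0 is the final sampling weight and again uses the recoded stroke variable rather than ac6. The survey documentation should say how variances are meant to be estimated (strata and clusters, or replicate weights); those design statements are left out here because I do not know them for this file.

```sas
/* Sketch only: the design specification is incomplete. With no STRATA or
   CLUSTER statement (and no replicate weights), the standard errors treat
   the sample as a weighted single-stage sample. */
proc surveylogistic data = test;
  weight rakedw0;
  model stroke(event='1') = lowincome srage_p srsex ab29 diabetes ab34 ae13 smokes;
run;
```

The point estimates should agree with the weighted PROC LOGISTIC fit. What changes is the standard errors, which is why chi-square values in the thousands, like those in your output, should not be taken at face value.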

Hope this helps,

Dan Nordlund
Bothell, WA

