Date: Wed, 3 Jan 2007 12:23:11 -0500
Reply-To: Peter Flom <Flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <Flom@NDRI.ORG>
Subject: Re: Help for Dummies Creation
In-Reply-To: <OF82002D9A.A90B911F-ON03257258.0062A082-03257258.0063EBB0@serasa.com.br>
Content-Type: text/plain; charset=US-ASCII
>>> Ricardo G Silva <ricardosilva@SERASA.COM.BR> 1/3/2007 1:11 pm >>> wrote
<<<
I will use Logistic Regression (PROC LOGISTIC) with 3 categories:
0 - Good
1 - Intermediate
2 - Bad
Suppose I have a continuous independent variable (income). It can be missing or take values from $0 to $1000. How can I categorize this continuous variable? Note that I should consider that:
1) If I consider two categories 0 and 2 (Good and Bad) the best classes
(grouping by odds ratio) are:
Missing,
$0 - $200
$201 - $500
$501 - $1000
2) But if I consider the categories 0 and 1 (Good and Intermediate) the
best classes (grouping by the odds ratio) are:
Missing,
$0 - $300
$301 - $700
$701 - $1000
How can I choose the best classes for income if they have different odds for each pair of outcome categories? Is there any SAS procedure that performs this "categorization"?
>>>
I will argue that you should not categorize at all, but treat income as continuous.
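As a rough sketch of what treating income as continuous might look like: the dataset name (work.applicants) and variable names (status, income) below are made up for illustration, and I am assuming a generalized logit model with Good as the reference category, since the comparisons in the question are Good vs. Bad and Good vs. Intermediate.

```sas
/* Hypothetical dataset WORK.APPLICANTS (names are assumptions):   */
/*   status : 0 = Good, 1 = Intermediate, 2 = Bad                  */
/*   income : continuous, $0 to $1000, may be missing              */
proc logistic data=work.applicants;
   /* LINK=GLOGIT requests a generalized (multinomial) logit model. */
   /* REF='0' makes Good the reference category, so the model       */
   /* compares 1 vs 0 and 2 vs 0, each with its own income slope.   */
   model status(ref='0') = income / link=glogit;
run;
```

Without LINK=GLOGIT, PROC LOGISTIC fits a cumulative logit (proportional odds) model to a response with more than two levels, which may also be worth considering because the categories are ordered. Either way, observations with missing income are dropped from the fit by default.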
For missing data, what to do depends on how much missing data there is, why it is missing, and what other variables you have. PROC MI and PROC MIANALYZE may be useful, but may not.
I would also look at graphs to figure out what is going on.
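For the graphs, something along these lines might be a starting point. It is only a sketch: it assumes a SAS release that includes PROC SGPLOT, and it reuses the made-up names work.applicants, status and income. The indicator variable bad and the dataset work.plotdata are also just illustrative.

```sas
/* Keep complete cases and flag the Bad outcome (status = 2). */
data work.plotdata;
   set work.applicants;
   where income is not missing and status is not missing;
   bad = (status = 2);   /* 1 if Bad, 0 otherwise */
run;

/* Distribution of income within each outcome category. */
proc sgplot data=work.plotdata;
   vbox income / category=status;
run;

/* Smoothed proportion of Bad as a function of income. */
proc sgplot data=work.plotdata;
   loess x=income y=bad;
run;
```

If the smoothed curve is clearly not monotone, or bends sharply, that would suggest adding a transformation or spline terms for income rather than cutting it into classes.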
HTH
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
http://cduhr.ndri.org
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)