Date: Thu, 13 Jan 2005 15:50:00 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Statistical question pertaining to missing values
In-Reply-To: <7d2875ae0501131419bc16448@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii
--- Nomi <sajeelm@GMAIL.COM> wrote:
> First of all Proc logistic will not take any character variable as it
> is.
That is not true. The logistic procedure has a CLASS statement,
and character variables can be named on the CLASS statement and
then on the right-hand side of the MODEL statement. Character
variables named on the CLASS and MODEL statements are expanded
into a set of dummy variables. This was not true of SAS version
6.xx, but is true for any later version of SAS, which I would
hope the original poster has available.
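As a minimal sketch of this, here is how the original poster's second
case (values like A1, MM, B, C2) might be handled. The variable name
CODE is hypothetical, and MYDATA and Y are placeholder names:

```sas
/* Sketch only: CODE is a hypothetical character variable holding
   values such as 'A1', 'MM', 'B', 'C2'. */
proc logistic data=mydata;
   class code;        /* character variable named directly on CLASS */
   model y = code;    /* expanded into a set of design columns      */
run;
```

Note that PROC LOGISTIC uses effect coding for CLASS variables by
default; PARAM=REF or PARAM=GLM on the CLASS statement requests 0/1
indicator coding instead.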
> Therefore, in the data prep stage you need to convert the
> variables into numerics. Now I understand your concern 1 - - to keep
> the variable as it is for some business reason. However, you can
> create a duplicate variable off of the same variable by performing an
> arithmetic function.
No, it is not necessary to construct an additional variable to
hold an edited copy of the variable in which missing values have
been converted to some nonmissing value. The CLASS statement
can again handle the missing values easily, as long as we do not
REALLY IMPUTE VALUES. That is, if you want to treat everyone who
has a missing value for the character variable as a homogeneous
set (just as we assume that everyone who has the value 12 is a
homogeneous set), then all that is necessary is to add the option
MISSING to the end of your CLASS statement. Thus, if we have the
character variable MODEL taking on the values '12', '04', '35',
' 6', '09', and the missing value ' ', then the following code
will fit the logistic regression for response Y with six dummy
variable predictors:
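   proc logistic data=mydata;
      /* PARAM=GLM requests one 0/1 dummy per level;        */
      /* MISSING treats the blank value as a level of MODEL */
      class model / param=glm missing;
      model y = model;
   run;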
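To check how many levels the CLASS statement will see, something like
the following sketch may help. The data values below are invented
purely for illustration; only the variable names MODEL and Y and the
data set name MYDATA come from the example above.

```sas
/* Hypothetical toy data, only to illustrate the six levels of MODEL.
   $CHAR2. keeps the leading blank in ' 6'; an all-blank field is
   stored as a missing character value. */
data mydata;
   input model $char2. y;
   datalines;
12 1
04 0
35 1
 6 0
09 1
   0
;

/* The MISSING option makes PROC FREQ list the blank value as a
   level of its own rather than reporting it separately. */
proc freq data=mydata;
   tables model / missing;
run;
```

With the MISSING option on the CLASS statement, the blank value should
likewise appear as a level in the "Class Level Information" table that
PROC LOGISTIC prints. Without that option, observations with a missing
value for a CLASS variable are dropped from the analysis.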
I believe that there is some confusion based on the original
poster indicating that they wanted to substitute some numeric
value (mean or median values were suggested) for the missing
value. But the original poster also stated that the variable
should remain a character variable. The original problem was,
I believe, poorly stated.
> For case 2, you can create buckets (dummy variables) based on the
> response distribution.
>
> Thanks,
> Sajeel
>
>
> On Thu, 13 Jan 2005 17:06:33 -0500, Nick . <ni14@mail.com> wrote:
> > Hello,
> > I have a variable that takes on values like 12 04 35 6 09 ... This
> > is a character/string variable. It has missing values. Without going
> > into detail at this point about the nature and the reason for the
> > missing values, I would like to know what is the best way to replace
> > them with something like the mean or the median, etc. The variable
> > will be an input to a PROC LOGISTIC model. I wish for this variable
> > to remain a character. I do not wish to convert it to a number and
> > then use PROC MEANS etc.
> >
> > I have the same question as above except now the variable looks like
> > A1 MM B C2 .... clearly a string variable. How do I replace it so it
> > can become part of the logistic model?
> >
> > Your help from our outstanding statisticians is appreciated.
> >
> > NICK
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------