Date: Tue, 9 Nov 2010 12:02:28 -0500
Reply-To: peterflomconsulting@mindspring.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject: Re: Multinomial Non-Ordered Generalized Logit Model for Mortgage
Transition
In-Reply-To: <740067.58595.qm@web30206.mail.mud.yahoo.com>
Content-Type: text/plain; charset="us-ascii"
Tanmoy Mukherjee wrote:
<<<<
PROBLEM DESCRIPTION:
I am trying to model the transition of a mortgage loan FROM the "Current"
state in this month TO one of the following states in the next month:
a) 30 days delinquent (event: Current to 30 days delinquent = 2)
b) Current (event: Current to Current = 1)
c) Paid off (event: Current to Paid off = 3)
These are the only three states to which the mortgage loan can transition.
Therefore, in my opinion, the response is a nominal (unordered) choice for
the mortgage borrower. The predictor variables fall into three
categories:
a) Borrower characteristics
b) Loan characteristics (the loan that the borrower has taken out)
c) Property-level characteristics (the property against which the borrower
has taken the mortgage)
Because of these characteristics, I am making the following assumptions:
a) The correct model is the multinomial generalized logit model.
b) I am using PROC LOGISTIC with the LINK=GLOGIT option instead of PROC
CATMOD because of the presence of continuous predictor variables. (I have
tried PROC CATMOD with the DIRECT statement for the continuous predictor
variables, but it takes far longer to converge than PROC LOGISTIC.)
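For readers less familiar with the generalized logit model, here is a minimal Python sketch (not SAS, and not how PROC LOGISTIC computes anything internally) of how one set of coefficients per non-reference outcome turns into the three transition probabilities. It assumes Current (event 1) is the reference category; the coefficients in the hypothetical COEFS dictionary are made up for illustration.

```python
import math

# Hypothetical coefficients for illustration only: {event: (intercept, [slopes])}.
# Current (event 1) is assumed to be the reference category, so it has no row.
COEFS = {
    2: (-3.9, [0.8, -0.5]),  # Current -> 30 days delinquent
    3: (-4.6, [-0.2, 0.9]),  # Current -> Paid off
}


def glogit_probs(x, coefs=COEFS, ref=1):
    """Return P(event | x) for every event under a generalized logit model.

    For each non-reference event j, log(P(j) / P(ref)) = a_j + b_j . x,
    so P(j) = exp(eta_j) / (1 + sum_k exp(eta_k)) and P(ref) = 1 / (same).
    """
    eta = {
        j: a + sum(b_k * x_k for b_k, x_k in zip(b, x))
        for j, (a, b) in coefs.items()
    }
    denom = 1.0 + sum(math.exp(e) for e in eta.values())
    probs = {ref: 1.0 / denom}
    for j, e in eta.items():
        probs[j] = math.exp(e) / denom
    return probs
```

For example, glogit_probs([1.0, 0.5]) returns a dictionary keyed by events 1, 2 and 3 whose probabilities sum to 1; each non-reference event has its own intercept and slopes, which is what distinguishes this model from an ordinal (cumulative) logit.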
I hope this clarifies the problem statement.
Here are my questions; I would appreciate any help in answering them:
QUESTIONS:
1. Is the approach I described the right one? Can people suggest
alternative approaches?
2. I have very few outcomes for the events Current to 30 days delinquent
(2%) and Current to Paid off (1%) compared with the event Current to
Current (97%). My questions on this are:
a) Should I resample the data, drawing a random sample of the Current to
Current cases so that all three events are equally represented in the
data?
b) What alternative approaches can I take?
3. What statistics are available for testing the predictive ability of the
model other than AIC, BIC, -2 log L and R-square?
4. A good measure of the predictive ability of a binary logit model is the
c-statistic (the area under the ROC curve). However, in this case it
cannot be computed directly, because the response has multiple unordered
levels. I would appreciate suggestions for alternative methods of testing
the predictive ability of the model.
>>>>
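Regarding question 4, one workaround sometimes used is to compute a separate c-statistic for each outcome, treating that outcome versus the other two as a binary problem and using its predicted probability as the score. The Python sketch below assumes each loan already has predicted probabilities for events 1, 2 and 3 stored in a dict; the function names are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def c_statistic(scores, is_event):
    """Area under the ROC curve by pairwise comparison (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_event) if y]
    neg = [s for s, y in zip(scores, is_event) if not y]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_c(prob_rows, actual, categories=(1, 2, 3)):
    """One c-statistic per outcome: that outcome vs. the other two.

    prob_rows: list of dicts {event: predicted probability}, one per loan.
    actual: list of observed events (1, 2 or 3), aligned with prob_rows.
    """
    return {
        k: c_statistic([row[k] for row in prob_rows], [a == k for a in actual])
        for k in categories
    }
```

Each of the three values reads like an ordinary binary c-statistic (0.5 is no better than chance, 1.0 is perfect ranking), so a model can rank delinquencies well while ranking payoffs poorly, which a single overall figure would hide.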
First, thanks for the context.
1. This seems a reasonable approach. There are certainly others, but this
one seems sensible.
2. The key issue, AFAIK, is not so much the proportion in each group as
the total number. If N is large enough in the small categories, you should
be OK.
3/4. Model fit can be gauged by comparing predicted values with actual
values. Here, that would be a 3x3 table (actual state by predicted state).
Ideally, you would split the data into a training set and a test set before
running the model, fit the model on the training set, and report its
predictive ability on the test set.
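A minimal Python sketch of that holdout check, assuming each test-set loan has already been scored by the model fitted on the training set. The 30% test fraction, the seed, and the helper names are illustrative assumptions, not anything from SAS.

```python
import random


def split_train_test(records, test_frac=0.3, seed=2010):
    """Randomly split records into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def predicted_event(probs):
    """Assign a loan to its most probable next-month state."""
    return max(probs, key=probs.get)


def classification_table(actual, predicted, categories=(1, 2, 3)):
    """3x3 table of counts: rows are actual events, columns predicted events."""
    table = {a: {p: 0 for p in categories} for a in categories}
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table
```

One caveat worth checking: with about 97% of loans staying current, assigning each loan to its most probable state may put nearly every loan in the Current column, so the table can look better than the model's ability to pick out delinquencies and payoffs. Reading the row for each rare state, rather than overall accuracy, helps with that.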
HTH
Peter