```
Date: Tue, 9 Nov 2010 12:02:28 -0500
Reply-To: peterflomconsulting@mindspring.com
Sender: "SAS(r) Discussion"
From: Peter Flom
Subject: Re: Multinomial Non-Ordered Generalized Logit Model for Mortgage Transition
Comments: To: Tanmoy Mukherjee
In-Reply-To: <740067.58595.qm@web30206.mail.mud.yahoo.com>
Content-Type: text/plain; charset="us-ascii"

Tanmoy Mukherjee wrote
<<<<
PROBLEM DESCRIPTION:

I am trying to model the transition state of a Mortgage loan transitioning
FROM "Current" state in this month TO the following possible states in the
next month:

a) 30days Delinquent, (Event : Current to 30days Delinquent = 2)
b) Current, (Event : Current to Current = 1)
c) and Paid off, (Event : Current to Paid off = 3)

These are the only three transition states that the Mortgage loan can go to.
Therefore the response variable is, in my opinion, a Nominal Unordered choice
for a Mortgage borrower.

The predictor variables are basically divided into three categories:

a) Borrower characteristics
b) Loan characteristics (loan that the borrower has taken)
c) Property level characteristics (Property that the borrower has taken the
   Mortgage against)

Because of these characteristics I am making the following assumptions:

a) The correct model is the Multinomial Generalized Logit Model
b) I am using PROC LOGISTIC with the link=glogit option instead of PROC CATMOD
   because of the presence of continuous predictor variables. (I have tried
   using PROC CATMOD with the DIRECT statement for the continuous predictor
   variables, but it takes almost an infinite time to converge as against
   PROC LOGISTIC)

Hope this clarifies the problem statement. Here are my questions, and I will
appreciate it if you can help me answer them:

QUESTIONS:

1. Is the approach that I mentioned the right approach? Are there alternate
   approaches that people can suggest?

2. I am having an issue of very small outcomes for events Current to 30days
   Delinquent (2%) and Current to Paid off (1%) when compared to the event of
   Current to Current (97%). Questions on this are:
   a) Should I be re-sampling the data, pulling a random sample of the
      Current to Current loans so that all three events are equally
      distributed in the data?
   b) What alternate approaches can I take on this?

3. What are the statistics available for testing the Predictive ability of
   the Model other than AIC, BC, -2logL and RSQ?

4. One of the good measures of the Predictive ability of a binary Logit model
   is the c-statistic (Area under the ROC curve). However, in this case it
   cannot be computed because of the non-ordered multiple levels of the
   response. I will appreciate it if someone can suggest alternate methods
   for testing the predictive ability of the model.
>>>

First, thanks for the context.

1. This seems a reasonable approach. There are certainly others, but this
   seems sensible.

2. The key issue is, AFAIK, not so much the proportion that is in each group
   as the total number. If N is large enough in the small categories, you
   should be OK.

3/4. Model fit can be gauged by looking at predicted values vs. actual
   values. Here, this would be a 3x3 table. Ideally, you would separate the
   data into a training and a test set before running the model, and report
   predictive ability on the test set using the model fitted on the training
   set.

HTH

Peter
```
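The suggestion in 3/4 (hold out a test set, then cross-tabulate actual against predicted states in a 3x3 table) can be sketched outside SAS roughly as follows. This is a minimal Python illustration, not code from the thread: the function names `stratified_split`, `confusion_table` and `per_state_hit_rate`, the 30% test fraction, and the example labels are all assumptions made up for illustration. In practice the predicted state for each test loan would come from the model fitted on the training set, for example the state with the highest predicted probability. The split is done within each outcome so that the rare states (2 and 3) appear in both sets.

```python
import random
from collections import Counter, defaultdict

# State coding taken from the post: 1 = Current, 2 = 30 days delinquent, 3 = Paid off.
STATES = (1, 2, 3)


def stratified_split(rows, key, test_fraction=0.3, seed=0):
    """Split rows into (train, test), drawing test_fraction within each outcome.

    `key` is a function returning the outcome state of a row.
    """
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for row in rows:
        by_state[key(row)].append(row)

    train, test = [], []
    for state in sorted(by_state):
        group = list(by_state[state])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def confusion_table(actual, predicted, states=STATES):
    """Return a table whose rows are actual states and columns are predicted states."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in states] for a in states]


def per_state_hit_rate(table):
    """Share of each actual state (row) that was predicted correctly (diagonal cell)."""
    return [
        row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(table)
    ]


if __name__ == "__main__":
    # Made-up test-set labels, for illustration only.
    actual = [1, 1, 1, 1, 2, 2, 3, 3]
    predicted = [1, 1, 1, 2, 2, 1, 3, 1]
    table = confusion_table(actual, predicted)
    for state, row in zip(STATES, table):
        print("actual", state, row)
    print("hit rate per actual state:", per_state_hit_rate(table))
```

With 97% of loans staying Current, overall accuracy would be dominated by state 1, which is why the sketch reports the hit rate separately for each actual state.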
