Date: Wed, 19 Jan 2011 09:51:47 -0500
Reply-To: Rushi Patel <rushi.b.patel@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Rushi Patel <rushi.b.patel@GMAIL.COM>
Subject: Re: Forecasting problem with Logistic Regression Model
Content-Type: text/plain; charset=ISO-8859-1
Based on my experience, your problem warrants a look at a multinomial
model. At our mortgage shop, we (other modeling team) model this problem,
where the dependent variable, the end state, can be c, dlnq, pp or
default, through a multinomial model. The model works well.
We do use binary logit to model transitions, but only when the end
state is binary: default or not. Multiple beginning states are handled
either by separate models or, as you do it, by introducing a
beginning-state variable on the rhs.
On Tuesday, January 18, 2011, Tanmoy Mukherjee <firstname.lastname@example.org> wrote:
> Thanks for your reply.
> 1. I am interested in the ending monthly status of a loan, i.e. C, DQ or PP. The ending balance in dollar amounts is a calculated field: ending balance = beginning balance*(1 + Int_Rate/1200) - Scheduled Payment - Prepayments. Using this, one can easily calculate the ending balance once the beginning balance, Int Rate and the Scheduled Monthly Payment are known. Prepayments is the % of the balance that ends up in the ending status as PP.
> 2. I do not have dummies for each of the 36 months, but I have a loan_age variable on the RHS that is one of the dynamic variables, and also a baseline hazard function that gives me the C2DQ%, C2PP%, DQ2C% and DQ2PP% by loan_age. I think it serves the purpose of the month dummies that you are talking about. I did not include the month dummies because that would mean having 36 variables on the rhs, and that number increases if I were to forecast for, say, 120 months.
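A minimal Python sketch of the balance roll-forward in item 1 may help readers check the arithmetic. The function and argument names are assumptions for illustration, and the rate is taken to be an annual percentage, which is what dividing by 1200 implies.

```python
def ending_balance(beginning_balance, int_rate_pct, scheduled_payment, prepayments):
    """Roll a loan balance forward one month.

    int_rate_pct is assumed to be an annual rate in percent (e.g. 6.0),
    so int_rate_pct / 1200 is the monthly rate as a decimal.
    All names here are illustrative, not taken from the original post.
    """
    monthly_rate = int_rate_pct / 1200.0
    return beginning_balance * (1.0 + monthly_rate) - scheduled_payment - prepayments


if __name__ == "__main__":
    # Hypothetical loan: 100,000 balance, 6% annual rate, 1,000 scheduled, 500 prepaid.
    print(ending_balance(100000.0, 6.0, 1000.0, 500.0))
```

Applying the function month after month, feeding each ending balance in as the next beginning balance, gives the balance path once the payment and prepayment amounts are known.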
> Thanks and Regards,
> Tanmoy Kumar Mukherjee
> 3 Perrine Court, East Brunswick, NJ 08816
> Phone: 9173994540
> Email: email@example.com
> From: Rushi Patel <rushi.b.patel@GMAIL.COM>
> To: SAS-L@LISTSERV.UGA.EDU
> Sent: Monday, January 17, 2011 7:13 PM
> Subject: Re: Forecasting problem with Logistic Regression
> Are you interested in monthly status (C, DQ or PP) and monthly ending
> balance (in dollar amounts)? Your post does not seem to clarify this.
> Regarding the probabilities that you are estimating, do you have time
> dummies on the rhs for each of the 36 months? Using time dummies would
> allow you to predict the monthly probabilities from one state to
> another for a given borrower. You might have already done it this way
> but it was a little unclear to me reading your post.
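To make the time-dummy idea concrete, here is a small Python sketch. The function name, the choice of month 1 as the reference category, and the 36-month default are assumptions for illustration only.

```python
def month_dummies(month, n_months=36):
    """Return 0/1 indicators for months 2..n_months.

    Month 1 is treated as the reference category (all zeros), which is an
    illustrative choice; any single month could serve as the reference.
    """
    if not 1 <= month <= n_months:
        raise ValueError("month must be between 1 and n_months")
    return [1 if m == month else 0 for m in range(2, n_months + 1)]
```

Each loan-month record would carry these indicators on the rhs alongside the other covariates, so a 36-month horizon adds 35 columns.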
> On Monday, January 17, 2011, Tanmoy Mukherjee <firstname.lastname@example.org> wrote:
>> Dear All,
>> I would appreciate it if someone could provide their expert thoughts on a problem that I am currently facing with regard to forecasting using a Logistic Regression Model.
>> DESCRIPTION OF THE PROBLEM:
>> I need to develop a predictive model that forecasts the monthly ending balance and ending status of Mortgage loans in a pool over the next 36 months. The Mortgage loans at the beginning of the predictive period (36 months) are in either Current (C) or Delinquent (DQ) status, which is a known variable.
>> Mortgage loans can have a beginning status of either Current (C) or Delinquent (DQ), while the ending status over a month can be Current (C), Delinquent (DQ) or Prepaid (PP). If the loan takes the Prepaid (PP) status then it exits the pool.
>> APPROACH TAKEN:
>> I have data on Mortgage loans performance i.e. ending status given their beginning status over the 36 months and I divided it into three random groups :
>> a) Build sample (BS) : Mortgage loans with beginning balance and status (C or DQ) and ending balance and ending status (C, DQ or PP) over a period of 36 months
>> b) Validation sample (VS1) : Mortgage loans with beginning balance and status (C or DQ) and ending balance and ending status (C, DQ or PP) over a period of 36 months
>> c) Validation sample (VS2) : Mortgage loans whose beginning balance and status (C or DQ) is known at the beginning of the observation period ( i.e. for the first month only)
>> Sample sizes are not an issue because sample sizes are adequate for both events and non-events in each of the three groups.
>> MODEL BUILD
>> I built a Binomial Logistic regression model that predicts the ending status of a loan given:
>> a) its beginning status
>> b) At origination loan level variables
>> c) Dynamic variables that are updated at the beginning of every month
>> I used the Build sample to build the model. I divided the Build sample into four groups that were defined as :
>> a) Group I : beginning status = C and ending status = DQ ( Gives me the odds ratio of C2DQ transition and the value of Pr(DQ)/Pr(C) )
>> b) Group II : beginning status = C and ending status = PP ( Gives me the odds ratio of C2PP transition and the value of Pr(PP)/Pr(C) )
>> c) Group III : beginning status = DQ and ending status = C ( Gives me the odds ratio of DQ2C transition and the value of Pr(C)/Pr(DQ) )
>> d) Group IV : beginning status = DQ and ending status = PP ( Gives me the odds ratio of DQ2PP transition and the value of Pr(PP)/Pr(DQ) )
>> Combined the models to get the following :
>> a) If beginning status was "C" then predict the one month transition probability to move to C, DQ or PP i.e. P(C2DQ), P(C2PP) and P(C2C)
>> b) if beginning status was "DQ" then predict the one month transition probability to move to C,DQ or PP i.e. P(DQ2C), P(DQ2PP) and P(DQ2DQ)
>> The model fits were reasonable and I can provide the results of the regression if anyone needs them.
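One way the pairwise odds could be combined into a full set of one-month transition probabilities is sketched below in Python. This assumes each binary model yields the odds of a move relative to staying in the beginning state, as the Group I and II definitions suggest; the post does not spell out how the models were actually combined, and the function name is hypothetical.

```python
def transition_probs(odds_vs_stay):
    """Convert odds of each move relative to staying into probabilities.

    odds_vs_stay maps a destination state to Pr(move to it) / Pr(stay).
    The returned dict adds a "stay" entry, and its values sum to 1.
    This is an illustrative combination rule, not the poster's confirmed method.
    """
    if any(o < 0 for o in odds_vs_stay.values()):
        raise ValueError("odds must be non-negative")
    denom = 1.0 + sum(odds_vs_stay.values())
    probs = {state: o / denom for state, o in odds_vs_stay.items()}
    probs["stay"] = 1.0 / denom
    return probs


if __name__ == "__main__":
    # Hypothetical odds for a loan that begins the month in status C.
    print(transition_probs({"DQ": 0.05, "PP": 0.02}))
```

For a loan beginning in C, "stay" corresponds to P(C2C); for a loan beginning in DQ, it corresponds to P(DQ2DQ). The same normalization is what a multinomial logit applies internally with the staying state as the reference category.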
>> CHECKING THE OVERALL FIT OF THE MODEL
>> Started with the Validation Sample (VS1) and used the model to :
>> a) Given a beginning status of C, predict the probability of transitioning to C, DQ or PP
>> b) Given a beginning status of DQ, predict the probability of transitioning to C, DQ or PP
>> Used the predicted probabilities to compute the ending C, DQ and PP Balance. Added the C, DQ and PP ending balance across mortgage loans and then computed the :
>> a) Ending C balance as percentage of beginning balance
>> b) Ending DQ balance as percentage of beginning balance
>> c) Ending PP balance as percentage of beginning balance
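The aggregation in a) to c) can be sketched in Python as follows. As a simplifying assumption, each loan's expected ending balance in a status is taken as beginning balance times predicted probability, ignoring interest and scheduled payments; the function name and input layout are hypothetical.

```python
def ending_balance_shares(loans):
    """Aggregate expected ending balances by status as % of beginning balance.

    loans is an iterable of (beginning_balance, probs) pairs, where probs maps
    each ending status ("C", "DQ", "PP") to its predicted probability.
    Interest accrual and scheduled payments are deliberately left out here.
    """
    totals = {"C": 0.0, "DQ": 0.0, "PP": 0.0}
    total_begin = 0.0
    for balance, probs in loans:
        total_begin += balance
        for status in totals:
            totals[status] += balance * probs[status]
    if total_begin == 0:
        raise ValueError("total beginning balance is zero")
    return {status: 100.0 * amt / total_begin for status, amt in totals.items()}


if __name__ == "__main__":
    # Two hypothetical loans with made-up predicted probabilities.
    pool = [
        (100.0, {"C": 0.90, "DQ": 0.08, "PP": 0.02}),
        (300.0, {"C": 0.50, "DQ": 0.40, "PP": 0.10}),
    ]
    print(ending_balance_shares(pool))
```

The same function run on actual outcomes, with each loan's probs set to 1 for the observed status and 0 elsewhere, gives the actual shares to compare against.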
>> Compared the predicted ending C, DQ and PP balances with the actual balances and found very reasonable fits, which validates that the model works well for predicting the transition probabilities. However, the problem is that in this case the status of the loan at the beginning of each of the 36 months is known, and given the beginning status the model predicts the probability of the ending status very well.
>> PROBLEM I AM FACING: