Date: Mon, 17 Jan 2011 19:13:22 -0500
Reply-To: Rushi Patel <rushi.b.patel@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Rushi Patel <rushi.b.patel@GMAIL.COM>
Subject: Re: Forecasting problem with Logistic Regression Model
In-Reply-To: <232887.86838.qm@web30205.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1

Tanmoy,
Are you interested in the monthly status (C, DQ or PP) and the monthly
ending balance in dollar amounts? Your post does not make this clear.
Regarding the probabilities that you are estimating, do you have time
dummies on the right-hand side for each of the 36 months? Time dummies
would allow you to predict month-specific transition probabilities from
one state to another for a given borrower. You may already have done it
this way, but that was unclear to me from your post.
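
To make the time-dummy idea concrete, here is a minimal sketch in Python (used only for illustration; it is not taken from your SAS code). The function names `month_dummies` and `design_row` and the two example covariates are hypothetical. Month 1 is assumed to be the reference level, so 35 dummies enter the model alongside the intercept. In PROC LOGISTIC, putting the month index on a CLASS statement should give an equivalent coding.

```python
def month_dummies(month, n_months=36):
    """Indicator variables for months 2..n_months.

    Month 1 is the reference level (all zeros), which avoids perfect
    collinearity with the intercept.
    """
    if not 1 <= month <= n_months:
        raise ValueError("month must be between 1 and n_months")
    return [1.0 if month == m else 0.0 for m in range(2, n_months + 1)]


def design_row(month, covariates, n_months=36):
    """Right-hand-side row: intercept, loan covariates, then month dummies."""
    return [1.0] + list(covariates) + month_dummies(month, n_months)


# Hypothetical loan-month: month 3, with two made-up covariates.
row = design_row(3, [0.8, 720.0])
```

With one such row per loan-month, the fitted coefficient on each dummy shifts the log-odds of the transition for that month relative to month 1.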
Rushi
On Monday, January 17, 2011, Tanmoy Mukherjee <tkmcornell@yahoo.com> wrote:
> Dear All,
>
> I would appreciate it if someone could share their expert thoughts on a problem I am currently facing with forecasting using a logistic regression model.
>
> DESCRIPTION OF THE PROBLEM:
> I need to develop a predictive model that forecasts the monthly ending balance and ending status of mortgage loans in a pool over the next 36 months. At the beginning of the 36-month prediction period each loan is in either Current (C) or Delinquent (DQ) status, and this status is known.
>
> A mortgage loan can have a beginning status of either Current (C) or Delinquent (DQ), while its ending status for a month can be Current (C), Delinquent (DQ) or Prepaid (PP). If the loan takes the Prepaid (PP) status then it exits the pool.
>
> APPROACH TAKEN:
> I have data on mortgage loan performance, i.e. the ending status given the beginning status for each of the 36 months, and I divided it into three random groups:
> a) Build sample (BS) : Mortgage loans with beginning balance and status (C or DQ) and ending balance and ending status (C, DQ or PP) over a period of 36 months
> b) Validation sample (VS1) : Mortgage loans with beginning balance and status (C or DQ) and ending balance and ending status (C, DQ or PP) over a period of 36 months
>
> c) Validation sample (VS2) : Mortgage loans whose beginning balance and status (C or DQ) is known at the beginning of the observation period ( i.e. for the first month only)
>
> Sample sizes are not an issue because sample sizes are adequate for both events and nonevents in each of the three groups.
>
> MODEL BUILD
>
> I built a Binomial Logistic regression model that predicts the ending status of a loan given:
> a) its beginning status
> b) At origination loan level variables
> c) Dynamic variables that are updated at the beginning of every month
>
> I used the Build sample to build the model. I divided the Build sample into four groups that were defined as :
> a) Group I : beginning status = C and ending status = DQ ( Gives me the odds ratio of C2DQ transition and the value of Pr(DQ)/Pr(C) )
> b) Group II : beginning status = C and ending status = PP ( Gives me the odds ratio of C2PP transition and the value of Pr(PP)/Pr(C) )
> c) Group III : beginning status = DQ and ending status = C ( Gives me the odds ratio of DQ2C transition and the value of Pr(C)/Pr(DQ) )
> d) Group IV : beginning status = DQ and ending status = PP ( Gives me the odds ratio of DQ2PP transition and the value of Pr(PP)/Pr(DQ) )
>
> Combined the models to get the following :
> a) If beginning status was "C" then predict the one-month transition probabilities of moving to C, DQ or PP, i.e. P(C2DQ), P(C2PP) and P(C2C)
> b) If beginning status was "DQ" then predict the one-month transition probabilities of moving to C, DQ or PP, i.e. P(DQ2C), P(DQ2PP) and P(DQ2DQ)
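
One way the combining step quoted above could work is sketched below in Python. It assumes each binary model returns a log-odds relative to staying in the beginning status, e.g. log(Pr(DQ)/Pr(C)) and log(Pr(PP)/Pr(C)) for a loan that starts in C, and normalizes the two so the three probabilities sum to one. The function name `transition_row` and the example log-odds are hypothetical, and this may not be exactly how the models were combined in the original work.

```python
import math


def transition_row(eta_a, eta_b):
    """Convert two log-odds, each relative to the 'stay' outcome, into
    probabilities (stay, move_a, move_b) that sum to 1.

    Subtracting the largest exponent first keeps exp() from overflowing.
    """
    m = max(0.0, eta_a, eta_b)
    w_stay = math.exp(0.0 - m)
    w_a = math.exp(eta_a - m)
    w_b = math.exp(eta_b - m)
    total = w_stay + w_a + w_b
    return (w_stay / total, w_a / total, w_b / total)


# Hypothetical loan starting in C: log-odds of C2DQ and C2PP versus C2C.
p_c2c, p_c2dq, p_c2pp = transition_row(-3.0, -4.0)
```

The same function gives P(DQ2DQ), P(DQ2C) and P(DQ2PP) when it is fed the two DQ-start log-odds.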
>
> The model fits were reasonable and I can provide the regression results if anyone needs them.
>
> CHECKING THE OVERALL FIT OF THE MODEL
>
> Started with the Validation Sample (VS1) and used the model to :
>
> a) Given the beginning status of C used the model to predict the probability of transitioning to C, DQ or PP
> b) Given the beginning status of DQ used the model to predict the probability of transitioning to C, DQ or PP
>
> Used the predicted probabilities to compute the ending C, DQ and PP balances. Summed the C, DQ and PP ending balances across mortgage loans and then computed the:
> a) Ending C balance as percentage of beginning balance
> b) Ending DQ balance as percentage of beginning balance
> c) Ending PP balance as percentage of beginning balance
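
The balance calculation quoted above might look like the following Python sketch. It assumes, for simplicity, that each loan's balance carries through the month unchanged (no amortization or partial prepayment) and is split across the three buckets in proportion to the predicted probabilities. The function name `bucket_shares` and the two example loans are made up for illustration.

```python
def bucket_shares(loans):
    """Expected ending balance per bucket as a percentage of beginning balance.

    loans: iterable of (beginning_balance, (p_c, p_dq, p_pp)) pairs, where
    the probabilities are the predicted one-month transition probabilities.
    """
    totals = {"C": 0.0, "DQ": 0.0, "PP": 0.0}
    beginning = 0.0
    for balance, (p_c, p_dq, p_pp) in loans:
        beginning += balance
        totals["C"] += balance * p_c
        totals["DQ"] += balance * p_dq
        totals["PP"] += balance * p_pp
    return {bucket: 100.0 * amount / beginning for bucket, amount in totals.items()}


# Two hypothetical loans with made-up balances and probabilities.
shares = bucket_shares([
    (100.0, (0.90, 0.05, 0.05)),
    (300.0, (0.50, 0.40, 0.10)),
])
```

These percentages can then be set against the actual shares observed in the validation sample.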
>
> Compared the predicted ending C, DQ and PP balances with the actual balances and found very reasonable fits, which suggests the model works well for predicting the one-month transition probabilities. However, in this case the status of each loan at the beginning of each of the 36 months is known, and given that beginning status the model predicts the probability of the ending status very well.
>
> PROBLEM I AM FACING:
> Real-life data is never like sample VS1. It is in fact in the form of Validation Sample 2 (VS2), i.e. the beginning status of the loans is known for the first month only, and the model needs to predict the ending balance of loans in the C, DQ and PP buckets at the end of every month for the next 36 months.
>
> Questions I have are as follows:
>
> 1. If I need to use the model described above for the VS2 data, how should I go about it?
> 2. I adopted the following approach but got very poor fits between actual and predicted values. I would appreciate it if someone could point out the error I am making or suggest a better method:
> a) For each loan in the first month, I calculate the ending probabilities and then compute the ending balance in the three buckets C, DQ and PP
> b) For each subsequent month, I take the loans in the C and DQ buckets, compute the transition probabilities and then compute the ending balance in C, DQ and PP
> c) I repeat these steps for each loan in the sample and then sum the balances across loans for each of the 36 months
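
For comparison with steps a) to c) quoted above, here is a hedged Python sketch of one common way to roll a loan forward when only the first month's status is known. Instead of assigning the loan to a single bucket each month, it carries a probability weight in C and in DQ and applies both sets of transition probabilities to those weights. The function name `roll_forward` and the transition values are hypothetical. In practice each month's rows would come from the fitted model, which means any dynamic covariates would themselves have to be projected for future months.

```python
def roll_forward(start_status, monthly_rows):
    """Propagate one loan's status probabilities month by month.

    start_status: "C" or "DQ", known only for the first month.
    monthly_rows: list of (row_c, row_dq); each row holds the one-month
        probabilities (to_C, to_DQ, to_PP) for that beginning status.
    Returns a list of (p_C, p_DQ, cumulative_p_PP), one tuple per month end.
    """
    if start_status not in ("C", "DQ"):
        raise ValueError("start_status must be 'C' or 'DQ'")
    p_c, p_dq = (1.0, 0.0) if start_status == "C" else (0.0, 1.0)
    p_pp = 0.0  # prepaid loans leave the pool, so this only accumulates
    path = []
    for row_c, row_dq in monthly_rows:
        new_c = p_c * row_c[0] + p_dq * row_dq[0]
        new_dq = p_c * row_c[1] + p_dq * row_dq[1]
        p_pp += p_c * row_c[2] + p_dq * row_dq[2]
        p_c, p_dq = new_c, new_dq
        path.append((p_c, p_dq, p_pp))
    return path


# Made-up transition rows, held constant over two months for brevity.
rows = [((0.90, 0.05, 0.05), (0.30, 0.60, 0.10))] * 2
path = roll_forward("C", rows)
```

Multiplying each month's tuple by the loan balance and summing across loans gives the expected C, DQ and cumulative PP balances for that month.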
>
> I would really appreciate it if someone could help me with this problem.
>
> Thanking you in advance.
>
> Thanks and Regards,
> Tanmoy
>
> Tanmoy Kumar Mukherjee
> 3 Perrine Court,
> East Brunswick, NJ 08816
> Phone: 9173994540
> Email: tkmcornell@yahoo.com
>
