| Date: | Mon, 5 Sep 2005 21:34:11 -0400 |
| Reply-To: | Talbot Michael Katz <topkatz@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Talbot Michael Katz <topkatz@MSN.COM> |
| Subject: | Binary Response Models with Repeated Measures Data |
|---|
Hi, gang.
This is the posting I foretold :-)
I haven't done a lot of repeated measures work, so I'm looking for some
advice (while I'm waiting for Will Potts' book on Survival Data Mining to
appear). Here's the set-up. We've got direct marketing campaigns to try
to enroll people into a bonus program; obviously, the hope is that this
will increase spending and loyalty, because enrollment in the program
without increased spending and loyalty is a money loser. Each campaign
sends out offers to quite a lot of people. We have campaign data going
back for a few years. Looking back through all the campaigns of this
particular type since 2003, we have about 4.5 million pieces going out to
2.5 million individuals. So, as you can see, quite a number of people who
didn't enroll the first time they got an offer were included in one or
more subsequent campaigns.
Now, I am quite accustomed to one-shot offers in direct marketing.
Typically the results of one campaign are used to build predictive models
for the next campaign. First the non-responders and responders are
classified as a 0,1 dependent variable for a first-stage response /
propensity model; then the amount of the spending response is modeled for
a second-stage spending model (hey, weren't we just talking about
two-stage models in a separate thread?).
We certainly have enough responders to build a response / propensity model
from only the most recent campaign, where each offer represents a single
individual. But the reason we want to use some of the older campaigns is
that profitability is a longer-term issue; we want to examine
post-enrollment behavior over several months, or even a year. Consequently, I
was given the following proposal. Start with a universe consisting of
every offer in each campaign (the 4.5 million), and treat them all as
separate independent individuals. In such a case, the data for each
appearance of a single person may be quite different; the number of
previous offers will have changed, and the recent spending patterns may
have changed. And since, with a 5% response rate, you can build very good
response models on samples of 20,000 or 30,000, you may not get a lot of
multiple observations for the same people in a particular sample (that's a
tough probability to compute).
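One rough way to size that overlap is to count expected same-person pairs rather than compute an exact probability. The sketch below assumes a simple random sample of offers drawn without replacement. It also assumes a purely hypothetical split of people by number of offers received (the `offers_per_person` table), chosen only so the totals match the 2.5 million individuals and 4.5 million pieces mentioned above. The real split would have to come from the campaign data.

```python
from math import comb

# HYPOTHETICAL split of individuals by number of offers received.
# Chosen only so totals match the post (2.5M people, 4.5M offers);
# the actual distribution is unknown and should replace this table.
offers_per_person = {1: 1_000_000, 2: 1_000_000, 3: 500_000}

people = sum(offers_per_person.values())
offers = sum(k * count for k, count in offers_per_person.items())

# Number of offer pairs in the full universe that belong to the same person.
same_person_pairs = sum(comb(k, 2) * count for k, count in offers_per_person.items())

# Chance that two distinct offers drawn at random share a person.
p_pair = same_person_pairs / comb(offers, 2)


def expected_same_person_pairs(sample_size):
    """Expected count of same-person pairs in a simple random sample of offers."""
    # Linearity of expectation: every pair of sampled offers is a uniformly
    # random pair of distinct offers from the universe.
    return comb(sample_size, 2) * p_pair


for n in (20_000, 30_000):
    print(n, round(expected_same_person_pairs(n), 1))
```

Under that assumed split, the expected count comes out to roughly 50 same-person pairs in a sample of 20,000 and roughly 110 in a sample of 30,000, so repeats would be rare but not absent. A split with more heavily re-mailed people would raise those figures.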
So, the first question is, will this give a valid response model? I tend
to feel that it will underpredict response. If it's not good to treat
each offer as an independent individual, what is the best way to deal with
it? I could model the response for a single campaign, and score every
enrollee on each campaign by that model for use in the profitability
model. Does that make more sense? Or is there a better way?
I hope I've presented the situation clearly, and I look forward to
receiving your ideas. Thanks!
-- TMK --
"The Macro Klutz"