Date: Mon, 17 Oct 2005 20:14:44 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: PROC SURVEYSELECT with weights--How to do
In-Reply-To: <200510172015.j9HKBesw032239@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed
ni14@MAIL.COM wrote:
>This question pertains to bank-related direct marketing campaigns. We sent
>out a campaign and we wait for the results: RESPONDERS and NONRESPONDERS.
>
>I would like to learn how to do the following; here is an example so I can
>understand. The numbers are made up but realistic enough.
>
>We sent out a campaign randomly to 537,881 prospects. [Side Note: We also
>put aside a control group of 31,982 prospects and they will not be
>campaigned to, we will record those who respond on their own.]
>
>After a few months the results come back and out of the 537,881 prospects
>only 1,199 respond. Hence a response rate of about 0.23%. The control group
>produces 68 responders, which is a 0.24% response.
>
>We see that the campaign materials (creative, offer/product extended, etc.)
>didn’t do much of anything compared to the control.
>
>I want to use the 537,881 records above along with the 1,199 RESPONDERS to
>build a predictive model to be used in the next campaign of the same
>product, creative, etc.
>
>Here is the idea per (-- TMK -- "The Macro Klutz" ):
>
>He suggests that instead of building the model with such a low response
>rate of 0.23%, why don’t you use PROC SURVEYSELECT and tell this procedure
>to turn the 0.23% into, say, 20%. This procedure will then output the
>appropriate weights to be used in the modeling process.
>
>Can someone please show me how you tell PROC SURVEYSELECT to do this?
>Also, using PROC LOGISTIC, how are the weights from PROC SURVEYSELECT used
>in PROC LOGISTIC? Are they used as a class variable? I do not know what to
>do with the weights. I do know that the weights must be used correctly so
>that the true RESPONDER rate is not 20% but the real one which is 0.23%.
>That’s why I need to be careful with the weights to make sure the model
>coefficients don’t reflect a 20% response but rather a 0.23% response.
Since you already have the split between campaigned and non-campaigned,
you could use that as a stratification variable. Just manufacture a
multiplier (which we will use in the SIZE statement) so that we inflate
the response rate from roughly 0.2% up to 20% for both strata.
Now 68 / 31982 is 0.21%. You'd like the 68 to be multiplied so that:

          68*m
   ----------------------- = .2
     68*m + (31982-68)*1

So now it's just algebra: 0.8*68*m = 0.2*(31982-68), so m = about 117.
The other records will have a multiplier of 1. And we can use this for
both strata (in the campaigned stratum it gives about 20.7%, which is
close enough).

Unfortunately, this means that the sampling weights will also differ by a
factor of 117, and the variance estimator will have more noise than if
the weights were constant, or all very close.

So you *could* select the records now, with a data step view to add in the
multiplier variable:

   data temp_nick / view = temp_nick;
      set YourData;
      /* 117 for responders, 1 for everyone else */
      multiplier = 1 + 116*(responded='Y');
      /* or however you have the responders marked in your data */
   run;

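
If you want to check the multiplier algebra, here is a minimal sketch that
solves r*m / (r*m + (N - r)) = p for m in each stratum, using the counts
from the original post. The variable names (p, m_control, m_campaign) are
just mine for illustration; the results are written to the log.

```sas
/* Sketch: solve r*m / (r*m + (N - r)) = p for m in each stratum.  */
/* Rearranged: m = p*(N - r) / ((1 - p)*r).                        */
data _null_;
   p = 0.20;                                   /* target responder share */
   m_control  = p * (31982 - 68)    / ((1 - p) * 68);
   m_campaign = p * (537881 - 1199) / ((1 - p) * 1199);
   put m_control= 8.1 m_campaign= 8.1;         /* printed to the SAS log */
run;
```

The two strata give slightly different values, so using one multiplier for
both means the responder share will not be exactly 20% in each.
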
Now you can use PROC SURVEYSELECT if you want, with the SIZE statement
like this:
SIZE multiplier;
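
To put that SIZE statement in context, a fuller call might look roughly
like the sketch below. I am assuming a stratum variable named campaigned
(hypothetical, use whatever flag you have), and the sample sizes, seed and
output data set name are made up for illustration. As far as I know,
METHOD=PPS will complain if the sample size in a stratum is large enough
that a responder's relative size exceeds 1/n, so the sizes may need to be
kept small or the method changed.

```sas
/* Sort by the (hypothetical) stratum flag; STRATA expects sorted input */
proc sort data=temp_nick out=work.sorted;
   by campaigned;
run;

proc surveyselect data=work.sorted
                  method=pps           /* probability proportional to SIZE   */
                  sampsize=(100 1000)  /* illustrative per-stratum sizes, in */
                                       /* sorted order of campaigned         */
                  seed=20051017        /* arbitrary, for reproducibility     */
                  out=work.model_sample;
   strata campaigned;
   size multiplier;
run;
```

The OUT= data set should carry the selection probability and a
SamplingWeight variable for each selected record; that weight is what you
take into the modeling step.
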
However, you now should be using PROC SURVEYLOGISTIC instead of PROC
LOGISTIC, as you have a stratified sample with unequal sampling
probabilities.
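
As for how the weights are used: they go on a WEIGHT statement, not in a
CLASS statement. A minimal sketch, assuming the sample came out of PROC
SURVEYSELECT with its SamplingWeight variable, that campaigned is your
stratum flag, and with x1 and x2 standing in for your real predictors (all
of those names are placeholders):

```sas
proc surveylogistic data=work.model_sample;
   strata campaigned;                    /* design stratum (hypothetical name) */
   weight SamplingWeight;                /* weights from PROC SURVEYSELECT     */
   model responded(event='Y') = x1 x2;   /* x1, x2 are placeholder predictors  */
run;
```

With the weights applied, the fit should reflect the population response
rate rather than the inflated 20% in the sample.
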
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330