Date: Mon, 17 Oct 2005 20:14:44 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: PROC SURVEYSELECT with weights--How to do
Content-Type: text/plain; format=flowed
>This question pertains to bank-related direct marketing campaigns. We sent
>out a campaign and we wait for the results: RESPONDERS and NONRESPONDERS.
>I would like to learn how to do the following; here is an example so I can
>understand. The numbers are made up but realistic enough.
>We sent out a campaign randomly to 537,881 prospects. [Side Note: We also
>put aside a control group of 31,982 prospects and they will not be
>campaigned to, we will record those who respond on their own.]
>After a few months the results come back and out of the 537,881 prospects
>only 1,199 respond. Hence a response rate of about 0.23%. The control group
>produces 68 responders, which is a 0.24% response.
>We see that the campaign materials (creative, offer/product extended, etc.)
>didn't do much of anything compared to the control.
>I want to use the 537,881 records above along with the 1,199 RESPONDERS to
>build a predictive model to be used in the next campaign of the same
>product, creative, etc.
>Here is the idea per (-- TMK -- "The Macro Klutz" ):
>He suggests that instead of building the model with such a low response
>rate of 0.23%, why don't you use PROC SURVEYSELECT and tell this procedure
>to turn the 0.23% into, say, 20%. This procedure will then output the
>appropriate weights to be used in the modeling process.
>Can someone please show me how you tell PROC SURVEYSELECT to do this?
>Also, using PROC LOGISTIC, how are the weights from PROC SURVEYSELECT used
>in PROC LOGISTIC? Are they used as a class variable? I do not know what to
>do with the weights. I do know that the weights must be used correctly so
>that the true RESPONDER rate is not 20% but the real one which is 0.23%.
>That's why I need to be careful with the weights to make sure the model
>coefficients don't reflect a 20% response but rather a 0.23% response.
Since you already have the split between campaigned and non-campaigned,
you could use that as a stratum. Just manufacture a multiplier (which we
use in the SIZE statement) so that we inflate the responder share from
roughly 0.2% up to 20% for both strata.
Now 68 / 31982 is 0.21%. You'd like the 68 to be multiplied by some m so that:
         68*m
----------------------- = .2
  68*m + (31982-68)*1
So now it's just algebra: 0.8 * 68 * m = 0.2 * (31982-68), which gives
m = about 117. The other records will have a multiplier of 1. And we can
do this for both strata.
Unfortunately, this means that the sampling weights will also differ by a
factor of about 117, and the variance estimator will have more noise than
if the weights were constant, or all very close.
So you *could* select the records now, with a data step view to add in the
multiplier:
data temp_nick / view = temp_nick;
   set your_data;   /* placeholder name: substitute your campaign file */
   multiplier = 1 + 116*(responded='Y');
   /* or however you have the responders marked in your data */
run;
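If you want to check the multiplier arithmetic, a throwaway data step will do it. This is only a sketch: it solves the equation above for m, using the counts from the original question for each stratum, and writes the two values to the log. The variable names are made up for the illustration.

```sas
/* Sketch: solve  r*m / (r*m + (n - r)) = target  for m,
   which rearranges to  m = target*(n - r) / ((1 - target)*r).
   Counts are the ones quoted in the question. */
data _null_;
   target = 0.20;
   /* control stratum: 68 responders out of 31,982 */
   m_control  = target*(31982 - 68)    / ((1 - target)*68);
   /* campaigned stratum: 1,199 responders out of 537,881 */
   m_campaign = target*(537881 - 1199) / ((1 - target)*1199);
   put m_control= 8.1 m_campaign= 8.1;
run;
```

The two strata will not give exactly the same m, so using one multiplier for both is an approximation; you could also carry a separate multiplier per stratum.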
Now you can use PROC SURVEYSELECT if you want, with the SIZE statement
naming that multiplier.
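A rough sketch of what that selection step might look like follows. The stratum variable CAMPAIGNED, the output data set names, the seed, and the per-stratum sample sizes are all assumptions for illustration; only TEMP_NICK and MULTIPLIER come from the view above. PROC SURVEYSELECT wants the input sorted by the STRATA variable, so the view is sorted into a data set first.

```sas
/* Sketch only: CAMPAIGNED and the sample sizes are hypothetical. */
proc sort data=temp_nick out=temp_sorted;
   by campaigned;
run;

proc surveyselect data=temp_sorted out=model_sample
                  method=pps      /* selection probability proportional to SIZE */
                  n=(100 1000)    /* one sample size per stratum, in sorted order */
                  seed=20051017;
   strata campaigned;
   size multiplier;
run;
```

With METHOD=PPS the sample size in each stratum has to be small enough that no record's selection probability goes above 1, so treat the sizes here as placeholders. For PPS methods the OUT= data set should carry a SamplingWeight variable, which is what the modeling step needs.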
However, you now should be using PROC SURVEYLOGISTIC instead of PROC
LOGISTIC, as you have a stratified sample with unequal sampling weights.
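To answer the question about where the weights go: they belong in a WEIGHT statement, not in CLASS. A minimal sketch, assuming the sample sits in a data set called MODEL_SAMPLE with a SamplingWeight variable from PROC SURVEYSELECT, a stratum flag CAMPAIGNED, and two hypothetical predictors X1 and X2:

```sas
/* Sketch only: MODEL_SAMPLE, CAMPAIGNED, X1 and X2 are placeholder names. */
proc surveylogistic data=model_sample;
   strata campaigned;                    /* same strata used in the selection */
   model responded(event='Y') = x1 x2;   /* model the probability of responding */
   weight SamplingWeight;                /* weights undo the oversampling */
run;
```

Because the weights scale the sample back toward the original population, the fitted model should reflect the real response rate rather than the inflated 20%.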
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330