LISTSERV at the University of Georgia
Date:         Tue, 21 Mar 2006 12:20:24 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Statistical Question--PROC LOGISTIC
In-Reply-To:  <OF944F7851.4083BF19-ONC2257138.004551A3-C2257138.0045DD7C@hsbc.com.tr>
Content-Type: text/plain; format=flowed

BoraYavuz@hsbc.com.tr wrote:
>David,
>
>Can you expand a bit more on "certainty sampling" which you mentioned.
>
>We frequently find ourselves in pretty much the same situation Nick
>described and deal with it - mostly - using weights manually calculated in
>Excel (after deciding on the stratification variables through common
>sense). And I believe it would be great if you could provide some more
>info (examples, your comments, etc.) and pointers on the subject matter.
>
>I also didn't understand the "size multipliers" that you mentioned. :-(
>I'd be grateful if you could expand on that one too.

When we build a sample whose records carry different weights (within a stratum, or with no strata at all), we do it by giving each record a 'multiplier' so that some records are picked with a higher likelihood than others. The variable that holds this multiplier is the one we list in the SIZE statement.

If our boss comes to us and says:

"Okay Bora, here's what I need and I need it last week. So step on it! I want a sample of 40,000 from the database. Yeah, yeah, I know you pulled one yesterday, but this time I need it different. I need the people with incomes under $10,000 sampled at only one-tenth the rate we use on the people with incomes over $10,000. And I need every single one of the people with an income over $100,000. I expect to see this in my inbox by close of business today!"

Okay, maybe our boss isn't that nice. :-)

But now we have certainty sampling (we have to get all the high-income people) and we have PPS sampling. PPS = Probability Proportional to Size: each record's chance of selection is proportional to its value of the SIZE variable.

Let's do this now. The SIZE variable gets used in the certainty sampling part too, so we need to think about this. We want a multiplier which is 10 times larger for the medium class than the low class:

   if income > 10000 then mult = 10;
   else mult = 1;

Or we could use a Boolean (in SAS a true comparison evaluates to 1 and a false one to 0) and write it as:

   mult = 1 + 9*(income > 10000);

But we also need that certainty sample. The SIZE variable works with the certainty option CERTSIZE like this: we give the largest values of the multiplier to the records to be sampled for certain, and we use the CERTSIZE option to tell the system what that cut-off will be. So let's tack that extra bit on:

   if income > 100000 then mult = 20;
   else if income > 10000 then mult = 10;
   else mult = 1;

Or we use our little Boolean trick again:

   mult = 1 + 9*(income > 10000) + 10*(income > 100000);
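
For anyone who wants to see where that statement lives, here is a minimal sketch of a DATA step that adds the multiplier to the sampling frame. It assumes the frame is the YourBigData data set used in the PROC SURVEYSELECT call and that it holds a numeric variable named income; overwriting the data set in place is only a convenience for the sketch. Note that a missing income compares as smaller than any number in SAS, so such records would end up with mult = 1.

```sas
/* Sketch only: assumes YourBigData has a numeric variable INCOME. */
data YourBigData;
   set YourBigData;
   /* each comparison evaluates to 1 (true) or 0 (false) */
   mult = 1 + 9*(income > 10000) + 10*(income > 100000);
run;

/* Sanity check: only the values 1, 10 and 20 should show up */
proc freq data=YourBigData;
   tables mult / missing;
run;
```

The PROC FREQ step is just a quick way to confirm the three classes came out as intended before any sampling is done.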

Now we can do both the certainty sampling *and* the weighted sampling together:

   proc surveyselect data=YourBigData out=YourSample seed=40589584
                     method=pps certsize=20 sampsize=40000;
      size mult;
   run;
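
One way to check the result, sketched under the assumption that your release writes the usual SelectionProb and SamplingWeight variables to the OUT= data set for METHOD=PPS (worth confirming against the documentation for your version): summarize them by multiplier class.

```sas
/* Sketch only: the variable names SelectionProb and SamplingWeight */
/* are assumed to be present in the OUT= data set.                   */
proc means data=YourSample n min max;
   class mult;
   var SelectionProb SamplingWeight;
run;
```

If everything worked, the mult = 20 records should all show a selection probability of 1 (they were taken with certainty), and the selection probability for the mult = 10 records should be ten times that of the mult = 1 records. SamplingWeight is the reciprocal of SelectionProb, so it can replace weights worked out by hand in Excel.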

Does that make more sense now?

As for what I wrote before about certainty sampling, I'm copying the URL so you can find it in the SAS-L archives.

http://listserv.uga.edu/cgi-bin/wa?A2=ind0603C&L=sas-l&P=R28796

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
