BoraYavuz@hsbc.com.tr wrote:
>David,
>
>Can you expand a bit more on "certainty sampling" which you mentioned.
>
>We frequently find ourselves in pretty much the same situation Nick
>described and deal with it - mostly - using weights manually calculated in
>Excel (after deciding on the stratification variables through common
>sense). And I believe it would be great if you could provide some more
>info (examples, your comments, etc.) and pointers on the subject matter.
>
>I also didn't understand the "size multipliers" that you mentioned. :-(
>I'd be grateful if you could expand on that one too.

When we build a sample that has different weights (within a stratum or
with no strata at all) we do that by picking a 'multiplier' so that we
pick some records with a higher likelihood than others. The variable
that we use for this is the variable we list in the SIZE statement.

If our boss comes to us and says:

"Okay Bora, here's what I need and I need it last week. So step on it!
I want a sample of 40,000 from the database. Yeah, yeah, I know you
pulled one yesterday, but this time I need it different. I need the
people with incomes under $10,000 sampled at only one-tenth the rate we
use on the people with incomes over $10,000. And I need every single
one of the people with an income over $100,000. I expect to see this in
my inbox by close of business today!"

Okay, maybe our boss isn't that nice. :-)

But now we have certainty sampling (we have to get all the high-income
people) and we have PPS sampling. PPS = Probability Proportional to Size.

Let's do this now. The SIZE variable gets used in the certainty sampling
part too, so we need to think about this. We want a multiplier which is
10 times larger for the medium class than the low class:

   if income > 10000 then mult = 10;
   else mult = 1;

Or we could use a Boolean and write it as:

   mult = 1 + 9*(income > 10000);

But we also need that certainty sample.
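Before adding the certainty piece, the two ways of writing the
two-level multiplier can be checked against each other. This is only a
minimal sketch: the data set name check_mult, the variable names
mult_if, mult_bool and same, and the four test incomes are all made up
for illustration. It relies on the fact that a SAS comparison such as
(income > 10000) evaluates to 1 when true and 0 when false.

```sas
/* Hypothetical check data: incomes on either side of the 10,000 cut-off. */
data check_mult;
   input income;

   /* IF/ELSE form of the multiplier */
   if income > 10000 then mult_if = 10;
   else mult_if = 1;

   /* Boolean form: the comparison contributes 1 (true) or 0 (false) */
   mult_bool = 1 + 9*(income > 10000);

   /* same = 1 when the two forms agree on this record */
   same = (mult_if = mult_bool);

   datalines;
5000
10000
10001
250000
;
run;

proc print data=check_mult;
run;
```

If the two forms are equivalent, the SAME column should be 1 on every
row of the printed output.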
The SIZE variable works with the certainty option CERTSIZE like this:
we give the largest values of the multiplier to the records to be
sampled for certain, and we use the CERTSIZE option to tell the system
what that cut-off will be. So let's tack that extra bit on:

   if income > 100000 then mult = 20;
   else if income > 10000 then mult = 10;
   else mult = 1;

Or we use our little Boolean trick again:

   mult = 1 + 9*(income > 10000) + 10*(income > 100000);

Now we can do both the certainty sampling *and* the weighted sampling
together:

   proc surveyselect data=YourBigData out=YourSample seed=40589584
        method=pps certsize=20 sampsize=40000;
      size mult;
   run;

Does that make more sense now?

As for what I wrote before about certainty sampling, I'm copying the
URL so you can find it in the SAS-L archives.

http://listserv.uga.edu/cgi-bin/wa?A2=ind0603C&L=sas-l&P=R28796

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

