Date: Sun, 12 Feb 2006 22:23:17 -0800
Sender: "SAS(r) Discussion"
From: David L Cassell
Subject: Re: sampling based on weighted probability
In-Reply-To: <200602101915.k1AINhVH031234@mailgw.cc.uga.edu>

hellangel_987@YAHOO.COM wrote:
>I'd like to present my question in a much simplified
>way:
>
>I have a data set with variable x and weight:
>
>x  weight
>1  0.3
>2  0.2
>3  0.4
>4  0.07
>5  0.03
>
>(note that the weights sum to 1)

Useful. It would be niftier if you wrote it as a little data step, but it's useful as is.
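Something like this, say. HAVE is just a placeholder name for the data set; the variables are the ones from the question:

```sas
/* Hypothetical data set name HAVE; x and weight as in the question */
data have;
   input x weight;
   datalines;
1 0.3
2 0.2
3 0.4
4 0.07
5 0.03
;
run;
```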

>I need to take a sample of size, say 2, from x, but
>not random sampling: x is to be selected with
>probability of the corresponding weight, for example,
>with probability 0.3, x=1 is selected.

I would say this IS random sampling. Just not simple random sampling. It is often called PPS (probability proportional to size) sampling, and you can find it as such in the PROC SURVEYSELECT docs.
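For PPS sampling without replacement, the usual setup (and, as I read the SURVEYSELECT docs, what METHOD=PPS does) gives unit i an inclusion probability proportional to its size measure w_i, with the sampling weight as its inverse. It needs n w_i to be no larger than the total size for every unit:

```latex
\pi_i = \frac{n\,w_i}{\sum_{j=1}^{N} w_j}, \qquad
\text{sampling weight}_i = \frac{1}{\pi_i}, \qquad
\text{provided } n\,w_i \le \sum_{j=1}^{N} w_j .
```

With the weights above and n = 2, that works out to inclusion probabilities 0.6, 0.4, 0.8, 0.14 and 0.06, which sum to the sample size 2. So the 0.3 in the question behaves like a per-draw probability, not the chance that x=1 ends up in the sample.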

All you need is the SIZE statement and a size-measure variable, which here is just the variable you called weight. I find that name disturbing, because the sampling weights that come out are the inverses of selection probabilities proportional to these values, and the confusion gets pretty icky. So, as Dan already pointed out,

proc surveyselect data=yourdata out=yoursample
     seed=49584843  /* optional: SAS will make one up for you if you want */
     sampsize=2
     method=pps     /* also optional: the default with a SIZE statement */
     ;
   size yourweightthingy;
run;

So you could just say:

proc surveyselect data=yourdata out=yoursample sampsize=2;
   size yourweightthingy;
run;
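Here is a rough end-to-end sketch using the data from the question. The names HAVE, PPS_SAMPLE and WR_SAMPLE are hypothetical. The first step is the PPS call above; for METHOD=PPS the output data set should carry SelectionProb and SamplingWeight, so you can see the inverse relationship for yourself. The second step shows METHOD=PPS_WR, where every draw picks x with probability equal to its weight, which is the analogue of R's sample(x, 2, replace=TRUE, prob=w). If you want R's default draw-by-draw without-replacement scheme for a sample of exactly 2, I believe METHOD=PPS_MURTHY is the one to look at, but check the METHOD= section of the docs before relying on that.

```sas
/* Hypothetical names: HAVE (input), PPS_SAMPLE and WR_SAMPLE (outputs) */
data have;
   input x weight;
   datalines;
1 0.3
2 0.2
3 0.4
4 0.07
5 0.03
;
run;

/* PPS without replacement: selection probability proportional to WEIGHT */
proc surveyselect data=have out=pps_sample
                  method=pps sampsize=2 seed=49584843;
   size weight;
run;

/* Show the selection probabilities and their inverses, the sampling weights */
proc print data=pps_sample;
   var x weight SelectionProb SamplingWeight;
run;

/* PPS with replacement: each of the 2 draws picks x with probability WEIGHT.
   OUTHITS writes one row per hit, so a unit drawn twice appears twice. */
proc surveyselect data=have out=wr_sample
                  method=pps_wr sampsize=2 seed=49584843 outhits;
   size weight;
run;
```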

>I am very frustrated that while this job can be done
>in R as easily as a one-line statement,
>sample(x,2,prob=w),
>SAS technicians couldn't even understand my question
>after more than one follow-up. Every time they
>directed me to PROC SURVEYSELECT using the option
>METHOD=PPS. I have read through the SAS manual
>carefully, and I am positive this is not the same issue.
>It seems to me they've never heard of any sampling
>scheme other than simple random sampling.

Sorry, but they were right. Cryptic and unintentionally unhelpful, but right.

>As a graduate student in a Statistics department, I have
>been enjoying writing code in R for 3 years, mainly
>because almost all academic professionals prefer
>R to SAS, believing the former a research-oriented
>programming language and the latter merely a tool to
>manipulate data. But I myself have turned back to SAS,
>as I found that R easily breaks down with moderate
>to large data, while SAS has an amazing capacity
>and efficiency for huge data processing that R
>cannot match. I have since implemented all
>of our algorithms in SAS. But the lack of flexibility
>and the lack of easy-to-use functions in SAS have made my
>life increasingly difficult as our research project
>gets more and more complicated.

Yes, you are right about R. And not just R, either. Sometimes you can keep going in R if you can write some of your 'helper' functions in C to get around the speed issues.

As for re-writing R code in SAS:

Sometimes it takes more than a re-write, but a re-design as well. If you have listened to me blather about bootstrapping, then you know that it is really easy to do in SAS. But the approach is totally different from that in R. So in a lot of cases, you have to think in a different paradigm.
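As a rough illustration of that different paradigm, here is one common way to bootstrap a mean in SAS. Instead of looping over resamples as you would in R, you draw every bootstrap sample in a single PROC SURVEYSELECT step and then let BY-group processing compute the statistic for all replicates at once. The data set names, the simulated data, the seed and the 1000 replicates are all hypothetical choices for the sketch.

```sas
/* Hypothetical input: 50 simulated values of x */
data have;
   call streaminit(1);
   do i = 1 to 50;
      x = rand('normal', 10, 2);
      output;
   end;
   drop i;
run;

/* Draw all 1000 bootstrap samples at once: unrestricted random sampling
   (with replacement) at rate 1, so each replicate has as many rows as HAVE.
   REPS= adds a Replicate variable; OUTHITS writes one row per hit. */
proc surveyselect data=have out=boot
                  method=urs samprate=1 reps=1000 outhits seed=12345;
run;

/* Compute the statistic for every replicate with BY processing, no loop */
proc means data=boot noprint;
   by replicate;
   var x;
   output out=bootstats mean=xbar;
run;

/* Summarize the bootstrap distribution; STD estimates the standard error */
proc means data=bootstats n mean std;
   var xbar;
run;
```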

>Thank you for listening to my blah blah.

Well, you have to put up with mine, so why not? :-) :-)

David
-- 
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

