hellangel_987@YAHOO.COM wrote:
>I'd like to present my question in a much simplified
>way:
>
>I have a data set with variable x and weight:
>
>x weight
>1 0.3
>2 0.2
>3 0.4
>4 0.07
>5 0.03
>
>(note that the weights sum to 1)
Useful. It would be niftier if you wrote it as a little data step, but it's
useful as is.
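For example, something like this would do. The data set name yourdata is just a placeholder:

```sas
/* Read the five x values and their selection weights.  */
/* 'yourdata' is a placeholder name for the data set.   */
data yourdata;
   input x weight;
   datalines;
1 0.3
2 0.2
3 0.4
4 0.07
5 0.03
;
run;
```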
>I need to take a sample of size, say 2, from x, but
>not random sampling: x is to be selected with
>probability of the corresponding weight, for example,
>with probability 0.3, x=1 is selected.
I would say this IS random sampling. Just not simple random sampling. It is
often called PPS (probability proportional to size) sampling, and you can
find it as such in the PROC SURVEYSELECT docs.
All you need is the SIZE statement and a 'multiplier' variable, which here
is just the variable you called weight. I find that name disturbing,
because the sampling weights will come out proportional to the inverses of
these values, and the confusion gets pretty icky. So, as Dan already
pointed out:
proc surveyselect data=yourdata out=yoursample
     seed=49584843   /* optional: SAS will make one up if you leave it out */
     sampsize=2
     method=pps      /* also optional: the default with a SIZE statement */
     ;
   size yourweightthingy;
run;
So you could just say:
proc surveyselect data=yourdata out=yoursample sampsize=2;
size yourweightthingy;
run;
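To make the point about inverses concrete for your five values: with PPS sampling without replacement, each unit's inclusion probability is n times its relative size, and the sampling weight is the inverse of that. SURVEYSELECT reports these in the output data set as SelectionProb and SamplingWeight. This only works if every inclusion probability is at most 1. Here the largest weight is 0.4, which is below 1/2, so that condition holds.

```latex
\pi_i = \frac{n\,w_i}{\sum_j w_j} = 2\,w_i \qquad (n = 2,\ \textstyle\sum_j w_j = 1)

\pi = (0.6,\ 0.4,\ 0.8,\ 0.14,\ 0.06) \quad \text{for } x = 1,\dots,5, \qquad \textstyle\sum_i \pi_i = 2 = n

\text{sampling weight}_i = \frac{1}{\pi_i} = \frac{1}{2\,w_i} \approx (1.67,\ 2.5,\ 1.25,\ 7.14,\ 16.67)
```

So the value with the smallest 'weight' (x=5) ends up with the largest sampling weight.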
>I am very frustrated that while this job can be done
>in R as easily as one line statement
>sample(x,2,prob=w),
>SAS technicians couldn't even understand my question
>after more than one follow-up. Every time they
>directed me to PROC SURVEYSELECT using options
> METHOD=PPS. I have read through the SAS manual
>carefully, and I am positive this is not the same issue.
>It seems to me they've never heard of any sampling
>scheme other than simple random sample.
Sorry, but they were right. Cryptic and unintentionally unhelpful, but right.
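One wrinkle to check: METHOD=PPS selects without replacement. If you really want independent draws that can repeat a value, the way sample(x, 2, replace=TRUE, prob=w) behaves in R, you can use METHOD=PPS_WR instead. The sketch below assumes the data set is named yourdata and holds your x and weight variables; both names are placeholders.

```sas
/* PPS sampling WITH replacement: each of the 2 draws picks   */
/* a row with probability weight / (sum of weight).           */
/* OUTHITS writes one output row per hit, so a value drawn    */
/* twice appears twice; NumberHits counts the hits per unit.  */
proc surveyselect data=yourdata out=yoursample_wr
     seed=49584843
     method=pps_wr
     sampsize=2
     outhits;
   size weight;
run;
```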
>As a graduate student in a Statistics department, I had
>been enjoying writing code in R for 3 years, mainly
>because almost all academic professionals prefer
>R to SAS, believing the former a research-oriented
>programming language and the latter merely a tool to
>manipulate data. But I myself have turned back to SAS,
>as I found that R easily breaks down with moderate
>to large data sets, while SAS has an amazing capacity
>and efficiency for huge data processing that R
>cannot match. I have since implemented all
>of our algorithms in SAS. But the lack of flexibility
>and the lack of easy-to-use functions in SAS have made my
>life increasingly difficult as our research project
>gets more and more complicated.
Yes, you are right about R. And not just R, either. Sometimes you can keep
going in R if you can write some of your 'helper' functions in C to get
around the speed issues.
As for re-writing R code in SAS:
Sometimes it takes more than a re-write; it takes a re-design as well. If
you have listened to me blather about bootstrapping, then you know that it
is really easy to do in SAS. But the approach is totally different from
that in R. So in a lot of cases, you have to think in a different paradigm.
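Here is a rough sketch of that SAS-style bootstrap. Instead of looping over resamples the way you would in R, you draw them all at once and then let BY-group processing do the work. The data set yourdata, the seed, the 1000 replicates, and the mean of x as the statistic are all placeholder choices for illustration.

```sas
/* Draw 1000 bootstrap resamples in a single pass.           */
/* METHOD=URS is unrestricted random sampling (with          */
/* replacement), SAMPRATE=1 makes each resample the size of  */
/* the original data, OUTHITS gives one row per draw, and    */
/* REPS= adds a Replicate variable identifying each resample.*/
proc surveyselect data=yourdata out=bootsamples
     seed=12345
     method=urs
     samprate=1
     outhits
     reps=1000;
run;

/* Compute the statistic once per resample via BY groups.    */
proc means data=bootsamples noprint;
   by replicate;
   var x;
   output out=bootmeans mean=xbar;
run;

/* The spread of the 1000 means estimates the standard error. */
proc means data=bootmeans n mean std;
   var xbar;
run;
```

No macro loop is needed. The one big data set plus BY processing is what makes this fast in SAS.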
>Thank you for listening to my blah blah.
Well, you have to put up with mine, so why not? :-) :-)
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330