Date: Sun, 17 Jun 2007 22:34:13 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Randomly picking a procedurecode for each doctor
In-Reply-To: <7367b4e20706150829k500fbdeah5d957dccce2bb2ed@mail.gmail.com>
Content-Type: text/plain; format=flowed
datanull@GMAIL.COM sagely replied:
>
>I think you want to use STRATA DOCID and N=1, if I understand correctly.
>
>data work.docs;
> input docid:$6. procedurecode:$5. @@;
> cards;
>001111 21740 001111 21740 001111 21740 001111 21740 001111 21740
>001111 20680 001111 20680 001112 50240 001112 50240 001112 50240
>001112 52601 001112 52601 001112 55845 001113 48140 001113 48140
>001113 48140 001113 48140 001113 48140 001113 48150 001113 48150
>001113 47130 001113 47135 001114 53500 001114 53500 001114 53500
>001114 53450 001114 53450 001115 21045 001115 21045 001115 21045
>001115 21040 001115 21040
>;;;;
> run;
>proc surveyselect
> seed=20062001
> data=docs
> method=SRS
> n=1
> out=surgerycase1;
> strata docid;
> run;
>proc print;
> run;
>
>
>On 6/15/07, Annie Lee <hummingbird10111@hotmail.com> wrote:
>>Hi,
>>
>>I have a data set with doctor's id (multiple records for each doctor) and
>>procedurecode.
>>
>>This time, I would like to pick one randomly selected procedurecode for
>>each (unique) doctorid in order to avoid any bias in selecting
>>procedurecode.
>>
>>I tried doing this using proc surveyselect.
>>one question I have regarding using proc surveyselect-- is there a way not
>>to specify the sampsize in advance?
>>In the future, I will have different number of records every quarter in
>>this data set and I do not want to calculate the unique number of docid
>>beforehand to assign the sampsize manually.
>>
>>I would appreciate any help. Thank you. -Eunice
>>
>>
>>
>>proc surveyselect data = surgerycase method = SRS rep = 1
>> sampsize = ?? out = surgerycase1;
>>id _all_;
>>run;
>>
>>
>>
>>Results I would like to have:
>>
>>docid procedurecode (it is randomly picked so it could be any
>>procedurecode)
>>
>>001111 21740
>>001112 50240
>>001113 48140
>>001114 53500
>>001115 21045
>>
>>data set:
>>
>>docid procedurecode
>>
>>001111 21740
>>001111 21740
>>001111 21740
>>001111 21740
>>001111 21740
>>001111 20680
>>001111 20680
>>001112 50240
>>001112 50240
>>001112 50240
>>001112 52601
>>001112 52601
>>001112 55845
>>001113 48140
>>001113 48140
>>001113 48140
>>001113 48140
>>001113 48140
>>001113 48150
>>001113 48150
>>001113 47130
>>001113 47135
>>001114 53500
>>001114 53500
>>001114 53500
>>001114 53450
>>001114 53450
>>001115 21045
>>001115 21045
>>001115 21045
>>001115 21040
>>001115 21040
>>
D0 has supplied the solution I had in mind.
I just want to add one thing. Because of his construction, the input
data set is *already* sorted on DOCID, the stratum variable. If your
data are not yet sorted (or indexed) on DOCID, you would need to
sort/index on this variable first. Otherwise, the proc will complain
(a lot) that the data are not sorted, and it will stop with an error.
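A minimal sketch of that preliminary step, assuming the unsorted data sit in a data set named SURGERYCASE (the name used in the original question). The output name SURGERYCASE_SORTED is made up for illustration; the seed is simply the one from the code quoted above.

```sas
/* Sort on the stratum variable first; STRATA expects the input */
/* to be in DOCID order (or indexed on DOCID).                   */
/* SURGERYCASE_SORTED is a hypothetical name for illustration.   */
proc sort data=surgerycase out=surgerycase_sorted;
   by docid;
run;

/* Then draw one record per doctor from the sorted copy. */
proc surveyselect data=surgerycase_sorted
                  method=SRS
                  n=1
                  seed=20062001
                  out=surgerycase1;
   strata docid;
run;
```

Writing the sorted copy to a separate data set leaves the original order of SURGERYCASE untouched, in case that order matters elsewhere.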
When doing stratified sampling, PROC SURVEYSELECT uses the
N= option to tell how many records to pick in each stratum, rather
than for the whole data set. So N=1 is exactly what you asked for.
It may *not* be what you should really be thinking about...
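One way to see what N=1 per stratum means in practice is to print the selection probabilities that the proc writes to the OUT= data set for a stratified design. This is only a sketch, and it assumes the SURGERYCASE1 output produced by the code quoted above.

```sas
/* For a stratified design, OUT= carries SelectionProb and   */
/* SamplingWeight alongside the sampled record.              */
proc print data=surgerycase1 noobs;
   var docid procedurecode SelectionProb SamplingWeight;
run;
```

With the sample data, doctor 001111 has 7 records, so the selected row should show SelectionProb = 1/7 and SamplingWeight = 7. Every record within a doctor is equally likely to be drawn, so a procedurecode that appears on more records (21740 appears on 5 of the 7) is picked more often than one that appears on few.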
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330