Date: Tue, 18 Jan 2005 11:26:37 -0800
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: surveyselect question
In-Reply-To: <70A9A1413B47DD4FB1F311A7C19B218C038B1DFA@m-nccd-1.nccd.cdc.gov>
Content-type: text/plain; charset=US-ASCII
"Zack, Matthew M." <MMZ1@CDC.GOV> replied:
> I don't know because I'm not an expert in PROC SURVEYSELECT (another
> SAS-L member, David Cassell, is such an expert).
Flattery will get you nowhere. :-)
> PROC SURVEYSELECT will allow you to select a simple random sample
> (option, METHOD=SRS)
> of patients of a specified size (N=13) or a specified proportion
> (SAMPRATE=0.05), but
> it will do so only after you collapse across visits so that every
> patient is represented
> only once. Otherwise, PROC SURVEYSELECT may not select all visits of a
> specific selected
> patient. You would then have to merge the selected sample of patients
> with the original sample
> by patient ID to retrieve all the visits associated with each patient.
Exactly right. As an aside, let me point out that PROC SURVEYSELECT is
quite happy to let you specify the sample size (N=) OR the sampling rate
(SAMPRATE=), but NOT both at the same time.
The sampling process should be applied to the list of patients, not
the list of patient visits. If you want to use PROC SURVEYSELECT, you
would indeed have to merge the visit info back in afterward.
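A minimal sketch of that patient-level workflow, assuming a hypothetical
data set VISITS with one row per visit and an identifier variable
PATIENT_ID (neither name comes from the original question; the seed is
arbitrary, and N=13 just echoes the example above):

```sas
/* Hypothetical input: VISITS, one row per visit, keyed by PATIENT_ID. */

/* 1. Collapse to one row per patient. */
proc sort data=visits(keep=patient_id) out=patients nodupkey;
   by patient_id;
run;

/* 2. Simple random sample of 13 patients.          */
/*    Use N= or SAMPRATE= here, but not both.       */
proc surveyselect data=patients out=sampled_patients
                  method=srs n=13 seed=20050118;
run;

/* 3. Pull back every visit for the selected patients. */
proc sql;
   create table sampled_visits as
   select v.*
   from visits as v
        inner join sampled_patients as s
        on v.patient_id = s.patient_id
   order by v.patient_id;
quit;
```

Sampling the patient list first is what guarantees that a selected
patient brings along all of his or her visits.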
> I don't think PROC SURVEYSELECT by itself can select observations with
> missing values
> of either SBP or DBP. You would have to use a prior DATA step or PROC
> APPEND to select observations with these characteristics before PROC
> SURVEYSELECT would randomly select
> a sample of such observations. Then, you would have to concatenate
> these sampled
> observations after the previously randomly selected persons.
Yes. (Hey, who says you're not an expert in PROC SURVEYSELECT?) If
I were doing this with PROC SURVEYSELECT, I would use a prior DATA
step or SQL step to create a separate data set of those records which
had either SBP or DBP missing, and then select from that.
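Something along these lines, as a rough sketch only. The data set name
VISITS and the variable names SBP and DBP are assumptions about how the
data are laid out, and the sampling rate and seed are placeholders:

```sas
/* Hypothetical input: VISITS, with numeric variables SBP and DBP. */

/* 1. Keep only the records where either blood pressure is missing. */
data missing_bp;
   set visits;
   if missing(sbp) or missing(dbp);
run;

/* 2. Simple random sample from that subset.                */
/*    SAMPRATE=0.05 is a placeholder; N= would work as well, */
/*    but not both at once.                                  */
proc surveyselect data=missing_bp out=sampled_missing_bp
                  method=srs samprate=0.05 seed=20050118;
run;
```

The same subset could be built in PROC SQL with a WHERE clause; the DATA
step is shown only because it is the shorter of the two.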
> I don't know whether your macro, my prior program, or PROC SURVEYSELECT
> would be better
> (more efficient, easier to understand, etc.) in terms of your specific
> application.
I think that Matthew's code would be more efficient than running
everything through PROC SURVEYSELECT. It's a useful tool, but it
won't solve all known problems. Feel free to use that screwdriver
instead of a hammer, when your fastener turns out to be a screw
instead of a nail. :-)
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician