```
Date: Sun, 16 Jan 2005 14:44:06 -0500
Reply-To: "Zack, Matthew M."
Sender: "SAS(r) Discussion"
From: "Zack, Matthew M."
Subject: Re: surveyselect question
Content-Type: text/plain; charset="us-ascii"

What if you randomly select patients and all their visits without PROC SURVEYSELECT?

* Sort patient visits;
* by patient ID;
proc sort;
  by pt;
run;

* Generate a uniform random number for each patient;
data two(drop=rnseed);
  retain rnseed 6093141 rn;
  set;
  by pt;
  if (first.pt eq 1) then rn=uniform(rnseed);
  output two;
run;

* Sort patient visits;
* by ascending uniform random number (so that patients come in random order), then patient ID and visit ID;
proc sort data=two;
  by rn pt visit;
run;

* Select about 50 [+- 2 visits so that range=48 to 52] total patient visits;
* Add five visits (possibly from different patients) after the 50 above are selected;
* where SBP=. or DBP=.;
data visit50(drop=rn lstvisit nmissbp);
  retain lstvisit nmissbp 0;
  set two;
  by rn pt visit;
  select;
    when (lstvisit eq 0) do;
      if ((ABS(50-_n_) le 2) and (last.pt eq 1)) then lstvisit=1;
      output visit50;
    end;
    when (lstvisit eq 1) do;
      if ((sbp eq .) or (dbp eq .)) then do;
        nmissbp=nmissbp+1;
        if (nmissbp le 5) then output visit50;
        else lstvisit=2;
      end;
    end;
    otherwise stop;
  end;
run;

* Select about 20% of the input data set;
* Add five visits (possibly from different patients) after the above 20% are selected;
* where SBP=. or DBP=.;
data visit20p(drop=rn nmissbp);
  retain nmissbp 0;
  set two;
  by rn pt visit;
  select;
    when (rn le 0.20) output visit20p;
    otherwise do;
      if ((sbp eq .) or (dbp eq .)) then do;
        nmissbp=nmissbp+1;
        if (nmissbp le 5) then output visit20p;
        else stop;
      end;
    end;
  end;
run;

Matthew Zack

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Scott
Sent: Sunday, January 16, 2005 1:12 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: surveyselect question

Hi,

I've read various posts about SURVEYSELECT and random samples in the archives, but couldn't find the answer to my problem, thus this post...

Say I have a dataset:

PT VISIT SBP DBP, where

PT = patient
VISIT = visit number, say 1 - 4, which may be incomplete for a given PT, i.e. could be 1; 1,2,4; 1,2,3; 1,3; etc.
SBP = systolic blood pressure
DBP = diastolic blood pressure
(both BP's could have missing values)

I'd like to sample this dataset as follows:

1. Sample has "around" say 50 observations in total.
2. Sample has say 20% of observations from input data set.

In both of these samples, *** ALL observations for a given PT are included ***, i.e. if PT 7 is one of the patients randomly selected, then all visits for that PT are included in the random sample.

3. #1 and #2 above, augmented by say 5 random observations where either SBP, DBP, or both have a missing value.

For #3, I don't care if I make two passes over the data, but one pass would be nice.

IOW, in "pseudocode":

1. If each PT had 4 visit records, I would have either 12 (48) or 13 (52) observations in the sample dataset, since I specified a sample size of around 50.
2. If each PT had 4 visit records, and the total input dataset is 1000 observations, I would have 200 observations in the sample dataset, comprised of 50 PTs with 4 visits each.
3.(1) 12 or 13 random patients, plus 5 observations where SBP, DBP, or both were missing.
3.(2) 50 random patients, plus 5 observations where SBP, DBP, or both were missing.

I've played with SURVEYSELECT, but can't figure out how to get all records for a given PT to be included in the output.

Note that this sampling is for QC tests of code algorithms, not for further statistical analyses of the resulting sample.

Thanks,
Scott
```
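
For comparison with the question about PROC SURVEYSELECT itself, one possible way to keep every visit for a sampled patient is to make the patient the sampling unit: sample from a one-row-per-patient list, then join the selected patients back to the visit data. The following is only a minimal sketch, not the method from the reply above. The data set names VISITS, PATIENTS, PICKED, and SAMPLE are hypothetical, and the sample size of 13 patients assumes roughly 4 visits per patient, as in the question's own example.

```sas
/* Assumption: the visit-level input is a data set named VISITS   */
/* (hypothetical name) with variables PT, VISIT, SBP, and DBP.    */

/* One row per patient, so that the patient is the sampling unit. */
proc sort data=visits(keep=pt) out=patients nodupkey;
  by pt;
run;

/* Simple random sample of 13 patients (about 52 visits if each   */
/* patient has 4 visits). Replacing SAMPSIZE=13 with              */
/* SAMPRATE=0.20 would draw about 20% of patients instead.        */
proc surveyselect data=patients out=picked
                  method=srs sampsize=13 seed=6093141;
run;

/* Bring back every visit for the selected patients. */
proc sql;
  create table sample as
  select v.*
  from visits as v
       inner join picked as p
       on v.pt = p.pt
  order by v.pt, v.visit;
quit;
```

Because the sampling is done on patients rather than visits, the number of visits in SAMPLE is only approximately 50 (or 20% of the input) when patients have differing numbers of visits. The extra observations with missing SBP or DBP would still need a separate step.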
