Date: Mon, 24 Mar 2003 09:41:50 -0500
Reply-To: "Gerstle, John" <yzg9@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Gerstle, John" <yzg9@CDC.GOV>
Subject: Re: Help with making SQL/Data Step More Efficient
The "later sampling analyses", as I've succinctly put it, will use
straightforward sampling with PROC SURVEYSELECT. But I would still have to
create the large dataset.
Are you suggesting that it would be wiser to NOT create/save smaller
datasets, but just save the large one? I can easily add an indicator
variable that would distinguish the groups to sample for use later.
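For what it's worth, here's a rough sketch of that single-dataset approach. The dataset name BIG, the indicator variable GRP, and the per-stratum sample sizes are all hypothetical placeholders, not anything from the actual job:

```sas
/* Sketch only: assumes a dataset BIG with an indicator variable GRP
   (values 1-4) marking the four groups to be sampled separately.   */

proc sort data=big;
   by grp;                        /* STRATA requires data sorted by the strata variable */
run;

proc surveyselect data=big out=samples
                  method=srs           /* simple random sampling within each stratum */
                  n=(100 100 100 100)  /* illustrative per-stratum sample sizes      */
                  seed=12345;          /* fixed seed so the draw is reproducible     */
   strata grp;
run;
```

This keeps everything in one pass: no splitting into four datasets, and the output carries GRP so each stratum's sample can be pulled out afterward.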
BTW...thanks for the front-line reporting. I'm quite excited to visit your
fair city. My brother has mentioned that it's my kind of town. Coffee, mmm.
CDC Information Technology Support Contract (CITS)
>> -----Original Message-----
>> From: David L. Cassell [mailto:cassell.david@EPAMAIL.EPA.GOV]
>> Sent: Friday, March 21, 2003 8:10 PM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Help with making SQL/Data Step More Efficient
>> "Gerstle, John" <yzg9@CDC.GOV> replied [in part]:
>> > and BS. The following data step now creates the multitude of datasets
>> > for later sampling analyses, so only one pass through the data is
>> Hmmmm. "Later sampling analyses".
>> I wonder if there is a simplification or two that could
>> be introduced into your process if we knew how the later
>> 'sampling' part was to be done. It might be that you don't need
>> to split the data into four data sets. Or that you don't need
>> that time-consuming sort you complained about.
>> If you just want four data sets so you can do four separate
>> probability samples of the data, then you might want to consider
>> some simplifications, including leaving everything in one
>> data set and using PROC SURVEYSELECT to do stratified sampling
>> instead. Or taking a sample first, then examining which categories
>> your sample records fell into. Or...
>> David Cassell, CSC
>> Senior computing specialist
>> mathematical statistician