Date: Mon, 24 Mar 2003 09:41:50 -0500
Reply-To: "Gerstle, John" <yzg9@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Gerstle, John" <yzg9@CDC.GOV>
Subject: Re: Help with making SQL/Data Step More Efficient
Dale,
The "later sampling analyses", as I've succinctly put it, will use
straightforward sampling with PROC SURVEYSELECT. But I would still have to
create the large dataset.
Are you suggesting that it would be wiser NOT to create/save the smaller
datasets, and just save the large one? I can easily add an indicator
variable that distinguishes the groups to sample later.
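A minimal sketch of that single-dataset idea, written in Python rather than SAS purely for illustration: all records stay in one dataset, a hypothetical indicator variable (`grp` here) marks the group, and a simple random sample without replacement is drawn within each group. In SAS the analogous step would be PROC SURVEYSELECT with a STRATA statement on the indicator variable. The function name, the `grp` variable, and the per-group sample sizes are all assumptions made up for this example.

```python
import random
from collections import defaultdict

def stratified_srs(records, stratum_key, n_per_stratum, seed=None):
    """Simple random sample without replacement within each stratum.

    records       -- list of dicts: the single large dataset
    stratum_key   -- name of the (hypothetical) indicator variable
    n_per_stratum -- dict mapping indicator value -> sample size
    """
    rng = random.Random(seed)  # fixed seed gives a reproducible sample
    strata = defaultdict(list)
    for rec in records:  # one pass: bucket records by indicator value
        strata[rec[stratum_key]].append(rec)
    sample = []
    for value, members in strata.items():
        # This sketch caps the request at the stratum size; strata with
        # no requested size contribute nothing.
        n = min(n_per_stratum.get(value, 0), len(members))
        sample.extend(rng.sample(members, n))
    return sample

# Usage: one dataset of 1000 records, four groups flagged by 'grp'
data = [{"id": i, "grp": i % 4} for i in range(1000)]
sizes = {0: 10, 1: 20, 2: 5, 3: 15}
samp = stratified_srs(data, "grp", sizes, seed=2003)
```

Keeping one dataset plus an indicator means the groups never have to be written out separately; each later sampling run only needs a different set of per-group sizes.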
BTW, thanks for the front-line reporting. I'm quite excited to visit your
fair city; my brother says it's my kind of town. Coffee, mmm,
good.
John Gerstle
CDC Information Technological Support Contract (CITS)
Biostatistician
>> -----Original Message-----
>> From: David L. Cassell [mailto:cassell.david@EPAMAIL.EPA.GOV]
>> Sent: Friday, March 21, 2003 8:10 PM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Help with making SQL/Data Step More Efficient
>>
>> "Gerstle, John" <yzg9@CDC.GOV> replied [in part]:
>> > and BS. The following data step now creates the multitude of datasets needed
>> > for later sampling analyses, so only one pass through the data is needed.
>>
>> Hmmmm. "Later sampling analyses".
>>
>> I wonder if there is a simplification or two that could
>> be introduced into your process if we knew how the 'sampling'
>> part later was to be done. It might be that you don't need
>> to split the data into four data sets. Or that you don't need
>> that time-consuming sort you complained about.
>>
>> If you just want four data sets so you can do four separate
>> probability samples of the data, then you might want to consider
>> some simplifications, including leaving everything in one
>> data set and using PROC SURVEYSELECT to do stratified sampling
>> instead. Or taking a sample first, then examining which categories
>> your sample records fell into. Or...
>>
>> David
>> --
>> David Cassell, CSC
>> Cassell.David@epa.gov
>> Senior computing specialist
>> mathematical statistician