Date: Mon, 24 Mar 2003 15:37:07 -0500
Reply-To: "Gerstle, John" <yzg9@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Gerstle, John" <yzg9@CDC.GOV>
Subject: Re: Help with making SQL/Data Step More Efficient
Are there any local coffee breweries that are NOT Starbucks? I lived in New
Orleans long enough not to want to pay more than $2 for a large coffee -
good coffee. Little spoiled am I.
Anyway, the need for creating the large dataset is the basis of the project.
Each record is a case for each site. After we determine rates of agreement
between the stateno and birthsex variables for each site, we will sample the
'quadrants', for lack of a better term, and create lists of cases for the
sites to check over - fix mistakes made prior/during data entry. So the
sampling is needed to pull certain cases that meet specifications outlined
in the creation of the 'quadrants'. Have to create the large dataset at
least once. I had questioned the need for using ALL records/cases (this
includes all those cases that do NOT have a match which increases the number
of discordant pairs incredibly), but this is what is needed for this project
(an evaluation project to be a little more precise). Does that answer your
question? I don't see a way around creating the dataset at least once.
So, if all statisticians look alike, does that mean there will be a John
Malkovich-esque restaurant in Seattle this weekend?
CDC Information Technological Support Contract (CITS)
>> -----Original Message-----
>> From: David L. Cassell [mailto:cassell.david@EPAMAIL.EPA.GOV]
>> Sent: Monday, March 24, 2003 1:40 PM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Help with making SQL/Data Step More Efficient
>> "Gerstle, John" <yzg9@CDC.GOV> replied:
>> > Dale,
>> Oops, you appear to have confused me with MIXEDmaster McLerran. But
>> a common mistake. All us statisticians look alike. Like in the old
>> Patty Duke show, where Patty Duke played identical cousin statisticians.
>> Remember the old theme song? "...you could lose your mind, when
>> are two of a kind!" Hey, that's why, whenever we get together,
>> ensues. :-)
>> > The "later sampling analyses", as I've succinctly put it, will be using the
>> > straightforward sampling and PROC SURVEYSELECT. But I would still have to
>> > create the large dataset.
>> Okay, here's my question. Why? What is the reason you need the larger
>> Cartesian-product data set in order to do these sampling exercises? My
>> original point was that, if you explicated more fully, we might be able to
>> find a way out of the need for the Cartesian product.
>> > Are you suggesting that it would be wiser to NOT create/save smaller
>> > datasets, but just save the large one? I can easily add an indicator
>> > variable that would distinguish the groups to sample for use later.
>> Yes. If all you need is stratified sampling, then you can do that from a
>> single data set with your 'indicator variable' serving as your stratum
>> variable. But I *still* would like to hear why the Cartesian product is
>> needed for the sampling.
>> > BTW...thanks for the front-line reporting. I'm quite excited to visit
>> > fair city. My brother has mentioned that it's my kind of town.
>> > Coffee, mmm good.
>> There's a Starbucks every thirty feet in Seattle (some sort of city
>> so you can't miss the coffee. In fact, if it's raining, just cut through the
>> Starbucks stores, one after the other, until you reach your intended
>> There are enough of them now that they're nearly adjoining. I can't wait to find
>> out whether there's a Starbucks in every meeting room at the convention
>> David Cassell, CSC
>> Senior computing specialist
>> mathematical statistician