|
Michelle Jellinghaus <michelle@EMODE.COM> followed up on her own question
with this code:
> In the event that this may be useful to others, I have included the
> beautifully simple solution below. It seems to work well for my
> purposes :)
>
> proc sql;
>    connect to oracle(user='xxxxx' pass='xxxxx' path='xxxxx');
>    create table mysample as
>       select * from connection to oracle
>          (select *
>             from mytable sample(.01) A1
>          )
>    ;
> quit;
>
> note: this query returns 0.01% of the records in this table, in random
> order.
Very nice. And let me say that I really like the way that you structure
your code for readability, which too many people forget. But...
Does it provide a simple random sample With Replacement,
or a simple random sample Without Replacement? This may matter,
depending on your usage.
Does it provide a repeatable process, so that you can substantiate
your work, or reproduce your sample when the big boss questions your
results?
If these things don't matter, then you have a great solution. If some
aspects of the sampling protocols do matter, then you need to think
about some of these details.
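
For what it's worth, here is a rough sketch of how the repeatability
question might be handled. It assumes your Oracle release accepts the
optional SEED clause on SAMPLE; Oracle documents SEED as an attempt to
return the same sample from one run to the next, not as a guarantee.
The seed value 12345 and the WORK.MYTABLE copy are my own placeholders,
not part of the original code. As far as I know, SAMPLE keeps or drops
each row independently, so no row can show up twice (that is, without
replacement), but the number of rows returned will vary around 0.01%.

```sas
/* Sketch only: credentials, table name, and seed are placeholders. */
proc sql;
   connect to oracle(user='xxxxx' pass='xxxxx' path='xxxxx');
   create table mysample as
      select * from connection to oracle
         (select *
            from mytable sample(.01) seed(12345) A1
         );
   disconnect from oracle;
quit;

/* Alternative on the SAS side. Assumes SAS/STAT is licensed and    */
/* that work.mytable (hypothetical) already holds the Oracle rows.  */
proc surveyselect data=work.mytable out=mysample
                  method=srs       /* without replacement; urs = with replacement */
                  samprate=0.0001  /* 0.01% written as a proportion */
                  seed=12345;      /* fixed seed makes the draw repeatable */
run;
```

The first form keeps the sampling inside Oracle, so only the sampled
rows cross the network. The second form gives you explicit control over
the design and the seed, at the cost of bringing the whole table into
SAS first.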
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
|