Ian,
you expressed my sentiments exactly. I do appreciate the effort of the
site owner, but I found it very disconcerting that the disclaimer
about software accuracy was in a smaller font and a subtle color.
(Flashing bright red would be appropriate here.) A lot of people may get
burned, either by inaccurate or incorrect code on the site or by naive or
lazy "plug and chug" use of the code. SI's apprehension may be justified.
Thanks, Ian.
Regards,
Mark DeHaan
WHITLOI1 <WHITLOI1@WESTAT.COM>@LISTSERV.VT.EDU> on 12/10/99 07:21:16 AM
Please respond to WHITLOI1 <WHITLOI1@WESTAT.COM>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.VT.EDU>
To: SAS-L@LISTSERV.VT.EDU
cc:
Subject: Re: a random sample. I published 2 macro program ...
Summary: Problems with the code.
Respondent: Ian Whitlock <whitloi1@westat.com>
Renaud Harduin <r.harduin@ABS-TECHNOLOGIES.COM> offered two programs on a
popular subject - drawing random samples. He wrote
> Go to the www.SAShelp.com web site, I published 2 macro program :
>
> %ECH_SPLE : simple random sample (optimized in I/O, MEM and CPU)
> with distinct observation ==> Efficency
> %ECH_ALEA : Make a stratified random sample but requires more I/O
> and CPU
I looked at the first program and found the following problems:
1) For any two "random" samples from a given data set generated
by this program, the larger sample will contain the smaller
sample. For example the code,
    data w ; do s = 1 to 100 ; output ; end ; run ;
    %ech_sple ( data = w , out = s10 , size = 10 )
    %ech_sple ( data = w , out = s23 , size = 23 )
    proc compare data = s10 compare = s23 ( obs = 10 ) ; run ;
produced a report with no differences found.
2) The variables I, J, and DSID are on the output sample.
3) The variable X cannot be on the input data set.
4) The last record can never be in the sample.
5) The probability of choosing the 0th obs (there isn't any)
is 1/sample_size.
6) The number of logical obs is referenced, but the program can
produce an incorrect result for every logically missing
observation.
7) Duplicate choices must be eliminated in a subsequent step.
8) On efficiency - a nonworking linear search was used.
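For comparison, here is a minimal sketch of one standard way to draw a
simple random sample without replacement in a single sequential pass. It
is not the code from the site, and it is only one of several reasonable
approaches. The helper names _K, _N, and _TOTAL are my own assumptions and
must not already exist on the input; the sketch also assumes W has no
observations marked for deletion, since NOBS= counts physical rather
than logical observations.

```sas
/* Sketch only: select exactly 10 obs from W without replacement.     */
/* Assumes _K, _N, _TOTAL are not variables on W and that W has no    */
/* deleted observations (NOBS= would then overstate the count).       */
data s10 ( drop = _k _n ) ;
   retain _k 10 _n ;                 /* _k = obs still needed          */
   if _n_ = 1 then _n = _total ;     /* _n = obs still to be read      */
   set w nobs = _total ;
   if ranuni ( 0 ) < _k / _n then do ;   /* seed 0 = clock-based seed  */
      output ;
      _k = _k - 1 ;
   end ;
   _n = _n - 1 ;
   if _k = 0 then stop ;             /* sample complete, stop reading  */
run ;
```

Each observation is kept with probability (number still needed) / (number
still to be read), so every record, including the last, can be chosen, no
duplicates arise, and the helper variables do not appear on the output.
Because the seed is taken from the clock, two runs should not be expected
to produce nested samples.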
I didn't look at the second macro.
The site itself is impressive although I did get a glimmer of why the
SAS Institute objects to sites using the SAS name. It is unfortunate
that the quality of the programs is not monitored. This does not mean
the other 93 tips/programs have the same quality; I didn't look at them.
I can go along with the SAS-L rationale that discussion must be free
and open, hence code posted need not work. In this context the
reader has a clear warning. But I find it frightening to see a
professional-looking web site without any monitoring of the quality
of posted programs.
Ian Whitlock