Date: Thu, 4 Dec 2003 11:20:12 -0700
Reply-To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Subject: Re: THANKS***PROBLEM USING RANUNI AND FLOOR FUNCTION TO CREATE FAKE DATA***
Content-Type: text/plain; charset=us-ascii
Apparently I was not clear. It's not the ability to have either
separate random number streams or a single stream that I think is
arbitrary. It's the choice of syntax that I think is arbitrary. Why
does CALL RANUNI(x, y) result in multiple streams while x = RANUNI(y)
results in a single stream? Why not the other way around? Why not CALL
RANUNIS and CALL RANUNIM for single or multiple streams?
There might be some reasons hidden deep in the underlying structure of
SAS, but they're not obvious from the point of view of a new user - or
even an old and jaded user like me.
And once a new user sees that apparently random language choice, why
shouldn't s/he assume that there are other arbitrary choices in the same
area? Indeed, there's some evidence from new user questions on this
list that many people do think of the data step as capricious.
>Consequently, I think the decision is only arbitrary when you choose to
>disregard the effect that choice will have on the quality of the language
>being developed. Fortunately the SAS DATA step is rather old and most of
>the decisions were made in a less arbitrary way than I think is done
>today.
I agree that some new features in the data step seem not to fit the
same mental model as older features. I think it is caused by the change
from PL/I as the implementation language to C as the implementation
language. The mindset just isn't the same.
>On the question of producing easily used and understood documentation, I
>repeat a suggestion that I made several years ago on SAS-L. Replace the
>SAS bowl at SUGI with teams from the SAS Institute. The audience would
>write questions to locate some piece of information in the
>documentation. The first team to locate the required information and
>prove that they found it without using knowledge of the answer wins the
>question.
I suspect that would be rather boring to watch, as there would be many
minutes spent watching people search web pages before giving up in
defeat.
I don't think a decision has been made whether to have a SAS Bowl at
the next SUGI. Feel free to write up a proposal and send it to the
SAS-L BOF Committee. I suspect that SAS Institute would decline to
participate, as there's too much likelihood of being embarrassed.
--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA
>>> "Ian Whitlock" <WHITLOI1@WESTAT.com> 12/03/2003 11:59 AM >>>
Jack,
The decision is not quite as arbitrary as you make it out. I think it was
wisely made, but not so wisely documented in recent years. I think the 79
version of the manual was very clear about why one had both a function and
a subroutine. As the language has grown in complexity, the documenters
have lost sight of what to communicate about that language to the novice,
and how, in part because of the volume of information that must be
communicated.
Note that the common need is for one stream of random numbers, but there
is also a need for the ability to provide separate streams. Now the
question becomes how those two objectives should be achieved.
CALL RANUNI expects two arguments, both of which must be variables. Hence
the seed variable provides a place to store the next seed value. The
function has only one argument. Hence the next seed has to be stored
separately.
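As a minimal sketch of the CALL form described above (the data set name
two_streams and the seed values are made up for illustration), each seed
variable holds the state of its own stream, so X and Y are drawn from
separate streams:

```sas
data two_streams ;
   /* Each seed variable must exist and be initialized before the first call. */
   retain seed1 12345 seed2 67890 ;
   do obs = 1 to 5 ;
      call ranuni ( seed1 , x ) ;   /* updates seed1, returns the variate in x */
      call ranuni ( seed2 , y ) ;   /* updates seed2, returns the variate in y */
      output ;
   end ;
run ;
```

Printing two_streams should show seed1 and seed2 changing on every row,
which is the "place to store the next seed value" at work.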
One could decide that each physical function call has its own seed
sequence, but that could make writing programs correctly much more awkward
because the common desire to use one stream would now require the CALL
form or links to a subroutine and tripping over variable names. Consider:
   data w ;
      do obs = 1 to 50 ;
         x = ranuni ( 12345 ) ;
         y = ranuni ( 12345 ) ;
         output ;
      end ;
   run ;
How often do you think one would write code like this to mean X and Y
should have values coming from independent streams?
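A small check of that behavior, as a sketch only: the data set names and
the second seed value 67890 are illustrative, and the expectation that the
seed on later function calls is ignored rests on the documentation quoted
elsewhere in this thread. If the function keeps a single stream per DATA
step, the six values of U in CHECK should match X and Y from ONE_STREAM
read row by row (x, y, x, y, ...):

```sas
data one_stream ;
   do obs = 1 to 3 ;
      x = ranuni ( 12345 ) ;   /* first executed call initializes the stream */
      y = ranuni ( 67890 ) ;   /* different seed, but still the same stream  */
      output ;
   end ;
run ;

data check ;
   do draw = 1 to 6 ;
      u = ranuni ( 12345 ) ;   /* one call site, same initial seed */
      output ;
   end ;
run ;

proc print data = one_stream ; run ;
proc print data = check ; run ;
```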
Alternatively, the function could require a second argument holding a seed
variable. In this case, one would have to break the common rule that the
arguments of a function are not changed by a call to the function, and
again the common situation would now require more complex code.
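For comparison, here is roughly what the common single-stream case looks
like when written with the CALL form (a sketch; the data set name w_call
and the variable name seed are illustrative). The seed variable has to be
created and initialized before the first call, and its name must not
collide with anything else in the step:

```sas
data w_call ;
   retain seed 12345 ;             /* explicit storage for the stream's state */
   do obs = 1 to 50 ;
      call ranuni ( seed , x ) ;   /* x and y share one seed variable,       */
      call ranuni ( seed , y ) ;   /* so they are drawn from a single stream */
      output ;
   end ;
   drop seed ;                     /* keep the bookkeeping variable out of the output */
run ;
```

The function form gets the same effect with no extra variable, which is
the convenience being argued for here.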
Consequently, I think the decision is only arbitrary when you choose to
disregard the effect that choice will have on the quality of the language
being developed. Fortunately the SAS DATA step is rather old and most of
the decisions were made in a less arbitrary way than I think is done today.
On the question of producing easily used and understood documentation, I
repeat a suggestion that I made several years ago on SAS-L. Replace the
SAS bowl at SUGI with teams from the SAS Institute. The audience would
write questions to locate some piece of information in the documentation.
The first team to locate the required information and prove that they
found it without using knowledge of the answer wins the question. The
team that accumulates the most wins, wins the match. I had a math teacher
in college who offered a dollar to any student who could find a mistake in
his calculus text. By the time I took the class there weren't any
mistakes to be found. If the SAS Institute offered ten dollars (college
tuition was then $400 per year) to each person providing a request where
the information could not be located in the required time, then I suspect
that within a few years one would see a considerable improvement in
documentation.
IanWhitlock@westat.com
-----Original Message-----
From: Jack Hamilton [mailto:JackHamilton@FIRSTHEALTH.COM]
Sent: Wednesday, December 03, 2003 12:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: THANKS***PROBLEM USING RANUNI AND FLOOR FUNCTION TO CREATE FAKE DATA***
John Whittington wrote:
>>In the case given below, there aren't two different seeds, there's only
>>one (the constant 12345), so it's not obvious that using one seed twice
>>will have the same effect as using two different seeds. The
>>documentation is technically correct, but it's misleading.
>
>Actually, I don't think it's really all that misleading. The online
>documentation you quote makes it clear that the function can't create
>more than one stream in one DATA step by having different seeds. If it
>can't produce more than one stream even when the seeds are different
>(when one might possibly expect multiple streams), one surely would not
>expect that it would produce multiple streams when the seeds were the
>same, would one?
Yes, what you say makes sense, but that's because we already know how the
streams work. If you're a new user, how are you supposed to interpret the
documentation? Especially if you've already learned that sometimes
there's one stream and sometimes there are multiple streams, depending on
how the RANUNI function is called!
SAS's choice that CALL RANUNI uses multiple streams while RANUNI as a
function uses only one stream is essentially arbitrary; how is someone
supposed to know when arbitrary rules are being applied and when they're
not?
--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA
>>> "John Whittington" <John.W@mediscience.co.uk> 12/02/2003 8:39 PM >>>
At 15:11 02/12/03 -0700, Jack Hamilton wrote:
>Actually, the CALL RANUNI online documentation refers one to the "Seed
>values" documentation - so the needed information is three levels down,
>and not stated directly even there.
I must confess that I was looking at the v6 hardcopy Language Reference,
which is my first line of reference for most things!
>The documentation says "If you supply a different seed value to
>initialize each of the seed variables, the streams of the generated
>random numbers are computationally independent. With a function,
>however, you cannot generate more than one stream by supplying multiple
>seeds within a DATA step. The following two examples illustrate the
>difference.".
Yes, that's slightly less clear than the V6 documentation I quoted, which
said:
"... The CALL RANUNI statement produces a separate stream for each seed,
while the RANUNI function produces only a single stream of random
variates, even with multiple RANUNI function occurrences in the same DATA
step."
As you'll see, although there was still a bit of potential confusion, in
that the first part of the sentence refers to multiple seeds with CALL
RANUNI, the second part of the sentence is very clear in saying that the
RANUNI function can produce only a single stream, even if there are
multiple occurrences of the function (which, I would say, by implication,
means regardless of whether the seeds are the same or different).
>In the case given below, there aren't two different seeds, there's only
>one (the constant 12345), so it's not obvious that using one seed twice
>will have the same effect as using two different seeds. The
>documentation is technically correct, but it's misleading.
Actually, I don't think it's really all that misleading. The online
documentation you quote makes it clear that the function can't create more
than one stream in one DATA step by having different seeds. If it can't
produce more than one stream even when the seeds are different (when one
might possibly expect multiple streams), one surely would not expect that
it would produce multiple streams when the seeds were the same, would one?
Kind Regards
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------