LISTSERV at the University of Georgia
Date:         Thu, 4 Dec 2003 11:20:12 -0700
Reply-To:     Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Subject:      Re: THANKS***PROBLEM USING RANUNI AND FLOOR FUNCTION TO CREATE FAKE DATA***
Comments: To: WHITLOI1@WESTAT.com
Content-Type: text/plain; charset=us-ascii

Apparently I was not clear. It's not the ability to have either separate random number streams or a single stream that I think is arbitrary. It's the choice of syntax that I think is arbitrary. Why does CALL RANUNI(x, y) result in multiple streams while x = RANUNI(y) results in a single stream? Why not the other way around? Why not CALL RANUNIS and CALL RANUNIM for single or multiple streams?

There might be some reasons hidden deep in the underlying structure of SAS, but they're not obvious from the point of view of a new user - or even an old and jaded user like me.

And once a new user sees that apparently random language choice, why shouldn't s/he assume that there are other arbitrary choices in the same area? Indeed, there's some evidence from new user questions on this list that many people do think of the data step as capricious.

>Consequently, I think the decision is only arbitrary when you choose to
>disregard the effect that choice will have on the quality of language being
>developed. Fortunately the SAS DATA step is rather old and most of the
>decisions were made in a less arbitrary way than I think is done today.

I agree that some new features in the data step seem not to fit the same mental model as older features. I think it is caused by the change from PL/I as the implementation language to C as the implementation language. The mindset just isn't the same.

>On the question of producing easily used and understood documentation, I
>repeat a suggestion that I made several years ago on SAS-L. Replace the SAS
>bowl at SUGI with teams from the SAS Institute. The audience would write
>questions to locate some piece of information in the documentation. The
>first team to locate the required information and prove that they found it
>without using knowledge of the answer wins the question.

I suspect that would be rather boring to watch, as there would be many minutes spent watching people search web pages before giving up in defeat.

I don't think a decision has been made whether to have a SAS Bowl at the next SUGI. Feel free to write up a proposal and send it to the SAS-L BOF Committee. I suspect that SAS Institute would decline to participate, as there's too much likelihood of being embarrassed.

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "Ian Whitlock" <WHITLOI1@WESTAT.com> 12/03/2003 11:59 AM >>>

Jack,

The decision is not quite as arbitrary as you make it out. I think it was wisely made, but not so wisely documented in recent years. I think the 79 version of the manual was very clear about why one had both a function and a subroutine. As the language has grown in complexity, the documenters have lost sight of what to communicate about the language to the novice, and how to communicate it, in part because of the volume of information that must be communicated.

Note that the common need is for one stream of random numbers, but there is also a need for the ability to provide separate streams. Now the question becomes how should those two objectives be achieved.

CALL RANUNI expects two arguments both of which must be variables. Hence the seed variable provides a place to store the next seed value. The function has only one argument. Hence the next seed has to be stored separately.
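A minimal sketch of the CALL form just described, assuming two seed variables with the hypothetical names seed1 and seed2 and arbitrarily chosen starting values. Each call stores the next seed back into its own seed variable, which is what allows two separate streams in one DATA step:

```sas
data streams ;
   seed1 = 12345 ;                    /* hypothetical starting seed for stream 1 */
   seed2 = 67890 ;                    /* hypothetical starting seed for stream 2 */
   do obs = 1 to 5 ;
      call ranuni ( seed1 , x ) ;     /* updates seed1 and places the variate in x */
      call ranuni ( seed2 , y ) ;     /* updates seed2 and places the variate in y */
      output ;
   end ;
run ;
```

Printing the data set should show seed1 and seed2 changing from row to row, since each variable holds the state of its own stream.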

One could decide that each physical function call has its own seed sequence, but that could make writing programs correctly much more awkward because the common desire to use one stream would now require the CALL form or links to a subroutine and tripping over variable names. Consider:

   data w ;
      do obs = 1 to 50 ;
         x = ranuni ( 12345 ) ;
         y = ranuni ( 12345 ) ;
         output ;
      end ;
   run ;

How often do you think one would write code like this to mean X and Y should have values coming from independent streams?

Alternatively, the function could require a second argument holding a seed variable. In this case, one would have to break the common rule that the arguments of a function are not changed by a call to the function, and again the common situation would now require more complex code.

Consequently, I think the decision is only arbitrary when you choose to disregard the effect that choice will have on the quality of language being developed. Fortunately the SAS DATA step is rather old and most of the decisions were made in a less arbitrary way than I think is done today.

On the question of producing easily used and understood documentation, I repeat a suggestion that I made several years ago on SAS-L. Replace the SAS bowl at SUGI with teams from the SAS Institute. The audience would write questions to locate some piece of information in the documentation. The first team to locate the required information and prove that they found it without using knowledge of the answer wins the question. The team that accumulates the most wins, wins the match. I had a math teacher in college who offered a dollar to any student who could find a mistake in his calculus text. By the time I took the class there weren't any mistakes to be found. If the SAS Institute offered ten dollars (college tuition was then $400 per year) to each person providing a request where the information could not be located in the required time, then I suspect that within a few years one would see a considerable improvement in documentation.

IanWhitlock@westat.com

-----Original Message-----
From: Jack Hamilton [mailto:JackHamilton@FIRSTHEALTH.COM]
Sent: Wednesday, December 03, 2003 12:53 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: THANKS***PROBLEM USING RANUNI AND FLOOR FUNCTION TO CREATE FAKE DATA***

John Whittington wrote:

>>In the case given below, there aren't two different seeds, there's only
>>one (the constant 12345), so it's not obvious that using one seed twice
>>will have the same effect as using two different seeds. The documentation
>>is technically correct, but it's misleading.
>
>Actually, I don't think it's really all that misleading. The online
>documentation you quote makes it clear that the function can't create more
>than one stream in one DATA step by having different seeds. If it can't
>produce more than one stream even when the seeds are different (when one
>might possibly expect multiple streams), one surely would not expect that
>it would produce multiple streams when the seeds were the same, would one?

Yes, what you say makes sense, but that's because we already know how the streams work. If you're a new user, how are you supposed to interpret the documentation? Especially if you've already learned that sometimes there's one stream and sometimes there are multiple streams, depending on how the RANUNI function is called!

SAS's choice that CALL RANUNI uses multiple streams while RANUNI as a function uses only one stream is essentially arbitrary; how is someone supposed to know when arbitrary rules are being applied and when they're not?

--
JackHamilton@FirstHealth.com
Manager, Technical Development
Metrics Department, First Health
West Sacramento, California USA

>>> "John Whittington" <John.W@mediscience.co.uk> 12/02/2003 8:39 PM >>>

At 15:11 02/12/03 -0700, Jack Hamilton wrote:

>Actually, the CALL RANUNI online documentation refers one to the "Seed
>values" documentation - so the needed information is three levels down,
>and not stated directly even there.

I must confess that I was looking at the v6 hardcopy Language Reference, which is my first line of reference for most things!

>The documentation says "If you supply a different seed value to
>initialize each of the seed variables, the streams of the generated
>random numbers are computationally independent. With a function,
>however, you cannot generate more than one stream by supplying multiple
>seeds within a DATA step. The following two examples illustrate the
>difference.".

Yes, that's slightly less clear than the V6 documentation I quoted, which said:

".... The CALL RANUNI statement produces a separate stream for each seed, while the RANUNI function produces only a single stream of random variates, even with multiple RANUNI function occurrences in the same DATA step."

As you'll see, although there was still a bit of potential confusion, in that the first part of the sentence refers to multiple seeds with CALL RANUNI, the second part of the sentence is very clear in saying that the RANUNI function can produce only a single stream, even if there are multiple occurrences of the function (which, I would say, by implication, means regardless of whether the seeds are the same or different).
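A minimal sketch of the function form under discussion, with the second seed value (67890) chosen arbitrarily for illustration. Going by the documentation quoted in this thread, both calls should draw from one stream, so supplying a different seed to the second call would not be expected to start a second stream:

```sas
data one_stream ;
   do obs = 1 to 5 ;
      x = ranuni ( 12345 ) ;   /* seed of the first call executed starts the stream */
      y = ranuni ( 67890 ) ;   /* hypothetical second seed; per the quoted documentation,
                                  it cannot create a second stream in this DATA step */
      output ;
   end ;
run ;
```

Rerunning the step with the second seed changed to 12345 and comparing the output is one way to check this behaviour on a given SAS release.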

>In the case given below, there aren't two different seeds, there's only
>one (the constant 12345), so it's not obvious that using one seed twice
>will have the same effect as using two different seeds. The documentation
>is technically correct, but it's misleading.

Actually, I don't think it's really all that misleading. The online documentation you quote makes it clear that the function can't create more than one stream in one DATA step by having different seeds. If it can't produce more than one stream even when the seeds are different (when one might possibly expect multiple streams), one surely would not expect that it would produce multiple streams when the seeds were the same, would one?

Kind Regards

John

----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:   John.W@mediscience.co.uk
Buckingham  MK18 4EL, UK             mediscience@compuserve.com
----------------------------------------------------------------
