LISTSERV at the University of Georgia
Date:   Mon, 21 Dec 1998 08:19:40 +0000
Reply-To:   Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Subject:   Re: hashing, big formats,etc.
In-Reply-To:   <914199038.2121222.0@vm121.akh-wien.ac.at>

Hi Larry, I hope I'm not wasting your bandwidth, but "there's always the delete key" ... I'd like to support the more technical articles, which demonstrate SAS programming outside the purely applications/user program area.

Where the straightforward "normal standard SAS program" works fine, there is *no need whatever* for exotic methods.

So tell me, Larry (Technical Manager), that in your 14 years' exposure to SAS those "works fine" boundaries have *never* been crossed for you!

The procedures of the SAS System assume little or no pre-knowledge of the information in the data you analyse. You, or your business users, or even your application development SAS programmers, will probably recognise more of the simple facts about your data than a "normal standard SAS" procedure does when it starts.

It is this pre-knowledge which enables the more exotic solutions to beat the pants off standard SAS.

Some examples might give you cause to be more supportive.

1 proc tabulate
This provides fantastic formatting and labeling support for cross-tab analyses. But have you ever applied it where the deepest cross has, say, more than 5 class vars, of which some have over 200 unique values (within the input dataset) and others have, say, 50 - 100 unique values? This can't be untypical in the data warehousing analysis field. But I first suffered this problem in a data set of less than 5000 obs. I doubt that this problem is one you have been exposed to. It eats run-time *till next week*, once real memory has been exhausted. When offering a real problem to SI tech support, I usually have to distil out the business side for confidentiality and clarification. The problem which overwhelms tabulate can be demo-ed with data reduced to 3 obs but using an increasing number of crossed classes. Let me know if you want to try this yourself, to be aware of the limits of standard SAS in just one context.

2 merging in a data step
How often is a sort step used in a program when only a few obs are out of order? This may be crucial when the data volumes are large. But when the design results in this scenario, the normal standard SAS program solution makes us *all* wait. It is not just (or even) the programmer who will wait, but the program user, and all other users of the processor which has been hogged by proc sort. Yet a "hardly complicated" design consideration would eliminate the problem before it arose, because the designer of the system knows more about the information than a normal standard SAS program does.

3 lookup tables / hashing
There are some enormous sets of data around, challenging the statisticians' model solutions, many of which are historically based (score card model theory is over 40 years old!). Data volumes are higher now. That brings non-statistical problems. Computer systems have advanced enormously in even just the last third of that time (as you can testify). The expectation of users may have crossed from "over-eager anticipation" into the "reality of waiting", but competitive business decision information will always be needed "like yesterday, or even sooner". Sometimes, I just can't "wait a week" for the normal standard SAS program to tagsort very large sets of data, even if I could obtain all the resources (disk space and processor power) needed. Model solutions involving table-lookup have been included among the sample programs offered by the SAS Institute with each installation. These demonstrate using arrays and formats for look-up tables as alternatives to data step merge. There are a great many other demos too.
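
To make the table-lookup idea concrete, here is a minimal sketch, written in Python rather than SAS purely so that it is self-contained. It is not one of the SAS Institute sample programs mentioned above. The function names and the data (`rates`, `transactions`) are hypothetical, invented for illustration. It assumes the lookup table is small enough to hold in memory, which is the same assumption behind an array or format lookup.

```python
def build_lookup(rows):
    """Load the small table into a dict (a hash table) keyed on the join key."""
    return {key: value for key, value in rows}


def lookup_join(big_rows, table, default=None):
    """Attach the looked-up value to every row of the big table.

    The big table is read once, in whatever order it arrives,
    so no sort step is needed before the "merge".
    """
    for key, payload in big_rows:
        yield key, payload, table.get(key, default)


# Hypothetical data: a small rate table and an unsorted transaction file.
rates = [("GBP", 1.0), ("USD", 0.6), ("ATS", 0.05)]
transactions = [("USD", 100), ("GBP", 20), ("XXX", 5), ("USD", 7)]

table = build_lookup(rates)
joined = list(lookup_join(transactions, table))

for row in joined:
    print(row)
```

The trade-off is memory for time: the small table must fit in memory, but the large one is never sorted. Keys with no match simply receive the default value.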

I think it's not only the explosion of data volumes, but also the offer of "possible solutions" from competitors, which drives the search for faster solutions for large sets of data.

Another (final) reason I want technical programming theories and examples to feature on SAS-L is as an antidote to questions for which the correct response might be RTFM.

In article <914199038.2121222.0@vm121.akh-wien.ac.at>, david pider <dpider@HOTMAIL.COM> writes
>I wonder if somebody else on the list is annoyed by certain academic
>types polluting the list with their homegrown "routines". For instance,
>what is this 'hahsing' BS?? To code something like that one must've
>never worked in the industry. I've been programing for years and never
>even heard the term! Anybody ever saw it in any SAS manual? Who'd use it
>in the real world? No sane manager will allow this kind of stuff in
>production. I'd fire anyone on the team who'd have the audacity to code
>a monstrosity like that and claim it works better than merge or sql.
>Even if it ran 10% faster, so what? Who'd be able to understand it after
>that self appointed guru is let go (or rather fired)? Plus in all my
>years in SAS I never seen anything "explicitly coded" work faster than
>what is already there in SAS. I'm not so gullible to trust so called
>"test results" in those posts, it can be anything, how do we know they
>aren't concocted?
>
>Or what about this 'big format' thing? Who in his right mind would use a
>5 million format just to run out of memory, and then the the one who
>"coded" it gets called and paid big bucks to fix it. If simple merge was
>used nobedy would have no problem in the first place. What I'm saying is
>stick to normal standard SAS program. Formats weren't intended for this
>kind of (ab)use so it's not standard.
>
>BTW, I've noticed that those posting those extravagant "methods" are
>never on the money trying to answer a normal question about standard SAS
>coding. Anybody thinks it's a coincidence?
>
>Don Pider,
>Technical Manager,
>14 years of SAS
>
>______________________________________________________
>Get Your Private, Free Email at http://www.hotmail.com

wishing you all the compliments of the season
--
Peter Crawford

