Date: Tue, 29 Apr 2008 15:23:44 -0700
Reply-To: Jack Hamilton <jfh@STANFORDALUMNI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jack Hamilton <jfh@STANFORDALUMNI.ORG>
Subject: Re: Is there an efficient way to use OUTPUT statement
In-Reply-To: <BAY123-W301A3D7DC8EBA0ED2C6D17DED90@phx.gbl>
Content-Type: text/plain; charset=windows-1252; format=flowed

The people at SAS Institute I have worked with have all been competent
and dedicated, and they want to do the right thing, but they don't
always seem to have the right background. The developers, for the most
part, have not used base SAS heavily for anything other than testing,
and some of the training staff still seems to be coding from a SAS82 manual.

It is important to have computer scientists in the development process,
because they help prevent important theoretical errors, but it's also
helpful to keep a real-life process (or several) in mind, so that new
additions stay usable.

SAS's thoroughly broken development and distribution process is also a
problem. If someone in base development had come up with a good idea
right after 9.1 was released, we wouldn't see it until 2008 or later,
because of the feature freeze before 9.2 and SAS's inability to
distribute test releases effectively. The iterations take much too long.

The development of the regular expression functions is a good case
study. They started off with SAS regular expressions, then (I don't
remember the exact order in which these occurred) added Perl regular
expressions, the ability to use a string regular expression as a
parameter, and various match and change functions, and dropped the
requirement that a regex start with a special character. The whole
process took several years. I like the results, but wish they had
arrived faster.

There are some internal policies that slow down development, but I
also wonder whether the base SAS development team is adequately staffed.

toby dunn wrote:
> I'd suspect they need to fire them. I don't give a tinker's damn how good a programmer one is or how great their programs are; if you are building something for someone else to use and it isn't easy to interact with, no one will use it. There are plenty of computer science books, and I know some CS classes even cover user interfaces and usability.
>
> Honestly, I'd suspect that it was shortsightedness. I don't think SI has given the mighty Hash enough thought yet. Consequently, they haven't fully understood everything one can or would want to do with it, nor how to make the sucker easier for the SAS user to use.
>
>
> Toby Dunn
>
> "Don't bail. The best gold is at the bottom of barrels of crap."
> Randy Pausch
>
> "Be prepared. Luck is where preparation meets opportunity."
> Randy Pausch
>
>
>> Date: Tue, 29 Apr 2008 09:21:12 -0700
>> From: jfh@STANFORDALUMNI.ORG
>> Subject: Re: Is there an efficient way to use OUTPUT statement
>> To: SAS-L@LISTSERV.UGA.EDU
>>
>> I suspect that the answer is a practical one: SAS Institute hires
>> computer science and statistics majors, not business analysts.
>>
>>
>> toby dunn wrote:
>>> Jack ,
>>>
>>> Perhaps you could answer this question for me.
>>>
>>> One of the problems I have seen in all the Hash solutions that create many data sets from a BY variable's values is that the examples always use very few variables in the DefineData method. I don't know about you, but I rarely have that few variables in a data set. Now one could try to use DefineData( All: 'Yes' ), but that means you would also need to specify a data set in the hash declaration, and once you do that, the variables you use in the DefineKey method have to already reside in the data set. Which means that in your example _Unique_Id can't be created; it already has to live in the data set.
>>>
>>> Having said that, suppose you wanted to break up said data set into a bunch of little data sets, you didn't have a unique ID variable already in the data set, and furthermore you didn't want to manually specify all your variable names in the DefineData method. The only two solutions I can come up with are:
>>> 1.) Define a view with 0 obs and add a unique var.
>>>     Which would look something like Chang's solution.
>>> 2.) Write code to extract the variable names, like Paul's solution.
>>>
>>>
>>> For the life of me, I don't know why SI couldn't make the damned thing user friendly and just let someone do:
>>>
>>> DCL Hash ABC( DataSet: 'XYZ' , Order: 'A' ) ;
>>> ABC.DefineKey( '_N_' ) ;
>>> ABC.DefineData( All: 'Yes' ) ;
>>> ABC.DefineDone() ;
>>>
>>> Where _N_ isn't already in data set XYZ.
>>>
>>>
>>>
>>>
>>> Toby Dunn
>>>
>>>
>>>> Date: Thu, 24 Apr 2008 14:59:46 -0700
>>>> From: jfh@STANFORDALUMNI.ORG
>>>> Subject: Re: Is there an efficient way to use OUTPUT statement
>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>>
>>>> You might also want to look at my SESUG paper from last year, "Creating
>>>> Data-Driven Data Set Names in a Single Pass Using Hash Objects".
>>>>
>>>>
>>>> On Thu, 24 Apr 2008 16:50:52 -0500, "data _null_," said:
>>>>> On Thu, Apr 24, 2008 at 1:10 PM, Chang Chung wrote:
>>>>>> Wow! Thank *you* for the nice words. But if I know anything about hashing,
>>>>>> then it is entirely thanks to Paul, who introduced hashing to the SAS community
>>>>>> and convinced SI to implement and improve the hash object. See his classic paper:
>>>>> Yes, I have read at and around, most of Mr. D's papers regarding
>>>>> Hashing and any other subject he has chosen to write about. And I
>>>>> probably read and "studied" an example very much like you posted. I
>>>>> think it was the juxtaposition of the other offerings and yours using
>>>>> _NEW_ to create a new hash for each new BY group that caused the light
>>>>> to switch on with regard to the problem I was working last year.
>>>>>
>>>>> I needed to process some values BY subject into a HASH and "pick the
>>>>> winner" then do it again for the next subject. I did not use _NEW_
>>>>> but instead DECLARED the hash for each subject. Very slow as you well
>>>>> know.
>>>>>
>>>>> Sometimes you just have to see things in the "right light".
>>>>>
>>>>> Thanks again.
>>>>>
>>>>>> Paul M. Dorfman (2001) "Table Look-Up by Direct Addressing: Key-Indexing --
>>>>>> Bitmapping -- Hashing" at http://www2.sas.com/proceedings/sugi26/p008-26.pdf
>>>>>>
>>>>>> Or many other important papers by Paul:
>>>>>> http://www.google.com/search?q=Paul+Dorfman+site%3Awww2.sas.com+filetype%3Apdf
>>>>>>
>>>>>> Cheers,
>>>>>> Chang
>>>>>>
>>>> --
>>>> Jack Hamilton
>>>> Sacramento, California
>>>> jfh@alumni.stanford.org