Date: Sun, 21 Jan 2007 22:05:00 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: A new statistical programming language
In-Reply-To: <1169166575.140852.101780@l53g2000cwa.googlegroups.com>
Content-Type: text/plain; format=flowed
callingrw@YAHOO.COM replied:
>
>David L Cassell wrote:
> > SASL001@SAVIAN.NET replied:
> > >
> > >From what I have seen, a lot of these tools use VB for their base
> > >language. Others can correct me here since I haven't used all of them.
> > >
> > >Nothing, IMO, compares to SAS. VB is powerful but in a different way. I
> > >am a C# guy and it is very powerful but harder for non-programmers to
> > >use. It is also not built for ETL but is more general purpose. With the
> > >advent of LINQ, it will become a very powerful ETL engine.
> > >
> > >I use DATA step for ETL. It smokes SQL for most complex tasks and I have
> > >yet to meet its equal.
> > >
> > >ETL is a little e and a little l. The T is what matters. You always have
> > >extraction (how else can you get data), a lot of transformation
> > >(otherwise just copy things), and load. ETL is what we all do even if we
> > >call it something else.
> > >
> > >Finally, SAS is a workhorse at ETL and is the best. However, its GUI
> > >power is not there and it was never built for GUIs. I use C# for the UI
> > >and SAS as the data engine. In combination, they work great. I don't see
> > >them as competitors but as complementary tools.
> > >
> > >Alan
> > >
> > >Alan Churchill
> > >Savian "Bridging SAS and Microsoft Technologies"
> > >www.savian.net
> > >
> >
> > If I may disagree on one point, I would say that the DATA step is
> > phenomenally powerful, but not more powerful than PROC SQL.
> > If I write code in one which runs much faster than my code in the
> > other, the problem is usually in my own code, not in the tool.
> >
> > There are exceptions to this, but there are not a lot of them.
> >
> > Or if you go back far enough (to the days before the DATA step
> > people and the PROC SQL people were housed together) then
> > you can probably find cases in *old* SAS versions where there
> > were noted differences.
> >
> > Oh, and if you disagree with me, I'm going to sic Howard and Sig
> > on you. :-) :-)
> >
> > Howard's book on SQL is going to be out when????
> >
> > David
> > --
> > David L. Cassell
> > mathematical statistician
> > Design Pathways
> > 3115 NW Norwood Pl.
> > Corvallis OR 97330
>
>David,
>
>I'm going to have to disagree with you on a couple of points here.
Feel free! I'm a disagreeable guy. :-)
>First, you're only mentioning CPU performance when comparing the SAS
>datastep to the SQL language. There's more to it than that. There's
>also how easy it is for the programmer to state what type of
>transformation he wishes to execute. The syntax impacts the
>expressibility of the language. SQL was deliberately restrictive; SAS,
>SPSS, and not too many others chose to reject this consensus 30
>years ago - and they were correct. This is why statistical departments
>have avoided the SQL language in favor of the SAS datastep. There's too
>much obvious stuff you would like to do to prepare the data that is
>difficult or un-doable with SQL syntax.
Yes. This is what I call 'programmer efficiency', a term I lifted from
Larry Wall.
Now, not everything is a SQL problem. But if the representation of
the data transform is set-theoretic, then SQL is a natural for the process.
If the representation of the data transform is sequential instead, then
I usually recommend *against* SQL.
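To make the set-theoretic versus sequential distinction concrete, here is a minimal sketch. It is Python with the standard sqlite3 module standing in for SQL, not SAS, and the table name visits, its columns, and the values are all made up for illustration. The per-id mean does not care about row order, so it reads naturally as SQL. The change from the previous visit depends on row order, so it reads more naturally as a sequential pass.

```python
import sqlite3

# Hypothetical data: (id, visit number, measured value)
rows = [
    ("A", 1, 10.0),
    ("A", 2, 12.0),
    ("A", 3, 11.0),
    ("B", 1, 20.0),
    ("B", 2, 26.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id TEXT, visit INTEGER, value REAL)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Set-theoretic transform: one mean per id, independent of row order.
means = dict(conn.execute("SELECT id, AVG(value) FROM visits GROUP BY id"))
# means == {'A': 11.0, 'B': 23.0}

# Sequential transform: change from the previous visit within each id.
# This depends on processing the rows in (id, visit) order.
changes = []
prev_id, prev_value = None, None
for id_, visit, value in sorted(rows):
    if id_ == prev_id:
        changes.append((id_, visit, value - prev_value))
    prev_id, prev_value = id_, value
# changes == [('A', 2, 2.0), ('A', 3, -1.0), ('B', 2, 6.0)]

conn.close()
```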
So I don't think we're disagreeing on this point. PROC SQL has
immense power, and it has a lot of DATA step features which are
not part of other SQL languages. This gives it some extra capabilities.
But it is not a panacea, just as the DATA step is not. Nothing is
"the best thing to use in all cases".
>Now if PROC SQL implements syntax features that extend and go beyond
>standard SQL, please let me know. An earlier reply to this post seems
>to indicate that (6th down, from Sigurd, when he called me a mosquito).
Well, the SQL standard is IMHO an international standard that no one
fully adheres to. :-) PROC SQL allows for a variety of DATA step functions
which are not in the official list of SQL functions. And it provides an
(undocumented, so use at your own risk) option for dealing with sequential
data as well: the MONOTONIC() function. This is not unusual: most SQL
purveyors have found ways of shoving this type of tool into their variant
of SQL.
And I'm pretty sure Sig did not call *you* a mosquito.
>There has been a consensus (formal or informal) that the SQL
>programming language should be restrictive, and even so, other data
>manipulation programming languages should be avoided - just do what you
>can with SQL. This groupthink, now 20-30 years old, is simply wrong.
>SAS and SPSS were right to go their own ways. You can't do all the
>needed data manipulation with the SQL programming language, because one
>needs more flexibility for unexpected data situations. I'm going to do
>a post on my blog on the problems caused by this SQL-groupthink. (I've
>been thinking of doing it for months, and I might have already hinted
>at it on my www.xanga.com/datahelper blog, I don't remember for sure.)
No, you cannot do everything in PROC SQL, much less SQL. But
I did *not* say that I thought anyone should. I merely said that
the assumed differences in under-the-hood code tweaks between
the DATA step and PROC SQL have become vanishingly small, due
to sharing between the different SAS workgroups. When people
tell me "the DATA step is always faster" or "PROC SQL is always faster"
I usually find that they can code well in one but not the other.
>****************************************
>
>
>A less important point, regarding an earlier reply you posted:
>There are a number of criticisms of my data "imputation".
>Yours is the most serious, since you are a statistician.
Well, I'm the loudest, but that doesn't make my criticism the
most serious.
>Well, you're wrong. A patient has two baseline visits, May 14 and May
>20 (treatment starts on, say, May 22). To calculate change from
>baseline, first you need to choose a baseline. Select the average of
>the values recorded on May 20. There's nothing wrong with that. I was a
>statistician for several years working with clinical trials.
>
>****************************************
But that's not standard 'imputation' for general data problems.
You didn't explain any of that beforehand.
And I still don't like the use of a single mean for imputation. Just
ask Rod Little what he thinks of it.
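One commonly cited reason for disliking single-mean imputation is that it leaves the mean alone while shrinking the variance, so anything computed from the filled-in data looks more precise than it is. A minimal sketch in Python (not SAS), with made-up numbers and an assumed three missing values:

```python
import statistics

# Hypothetical observed values; assume three more values are missing.
observed = [4.0, 6.0, 8.0, 10.0, 12.0]
n_missing = 3

# Single-mean imputation: fill every missing value with the observed mean.
mean_obs = statistics.mean(observed)        # 8.0
imputed = observed + [mean_obs] * n_missing

# The mean is unchanged, but the sample variance drops, because the
# filled-in values add no spread while increasing the sample size.
var_obs = statistics.variance(observed)     # 10.0
var_imp = statistics.variance(imputed)      # about 5.71
```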
>I do agree with you about what you said about Vilno and S-Plus/R,
>though.
>
>
>Robert
Okay, so we're probably 2-out-of-3 here. I'm okay with that. :-)
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330