Date: Mon, 15 Jan 2007 09:35:31 -0700
Reply-To: "Barz, Ken" <Ken.Barz@INTRADO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Barz, Ken" <Ken.Barz@INTRADO.COM>
Subject: Re: A new statistical programming language
Content-Type: text/plain; charset="us-ascii"
Alan wrote:
"SAS is used, from my experience, as an ETL language and statistics is its
secondary use. It is in ETL that SAS rules and why it is dominant."
One of our SAS-based ETL processes typically takes 4+ hours to complete.
As part of a BI evaluation that we're doing, we compared our existing SAS
job with MS SQL Server Integration Services (SSIS), using the same (proc)
SQL code and the same ODBC driver connection. Performance-wise, SSIS
kicked SAS all over the place.
To be fair though, we haven't yet determined whether SSIS is better or
whether there's something wrong with our SAS.
Ken
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Alan Churchill
Sent: Saturday, January 13, 2007 8:02 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A new statistical programming language
Robert,
You imply that SAS is only used for statistical programming. Perhaps I
have it wrong on what you are broadcasting here.
SAS is used, from my experience, as an ETL language and statistics is its
secondary use. It is in ETL that SAS rules and why it is dominant.
This isn't a "hostile resistance to change" but a question of why we
should.
Cost? Sure, SAS is a freakin' fortune but you get what you pay for. Lack
of OOP capability? OK, I would like SAS to allow for more user-written
methods.
Other than that, the language works well for me and better than anything
else. I have nitpicks like the inability to have any consistency in the
naming of options (libref, libname, library, lib, etc.) but that is all
minor stuff.
You have to find a reason for people to switch the ETL portion of SAS and
that is a mighty tall order. They have a 30 year head start and it simply
works.
Alan
Alan Churchill
Savian "Bridging SAS and Microsoft Technologies"
www.savian.net
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Robert
Sent: Saturday, January 13, 2007 7:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A new statistical programming language
I suppose this was to be expected: many SAS programmers have been using
the same language for 10 years or more, so a hostile resistance to
change is no surprise. To each his own taste.
Let me address Sigurd Hermansen's message first:
The problem is not contrived. Similar problems occur all the time in
pharmaceutical programming. With clinical trials, always count on
Murphy's law (if something can go wrong with the data, it will).
I'm glad you said straightforwardly that your code example does not do
the same thing. That makes your code-to-code example completely
INVALID, of course. I still believe my six-line example is the most
elegant solution.
You justify it by saying that my data preparation choice is a
"poor imputation method".
I disagree. I've worked as both a statistician and SAS programmer. It's
a reasonable calculation choice for this messy data situation. Of
course there are other options.
In any case, Sigurd, if you are the lead statistician on a given
protocol (clinical trial), well, you are entitled to your opinion. If
you are the SAS programmer, just say that you would rather not do
what's being asked of you, because it's a "poor imputation method". You
won't last very long like that, though. SAS programming is a service
profession in the business I come from. You cannot predict how messy
the data will be, nor can you predict the specifications that will be
handed to you. If you are both the statistician and SAS programmer on
the protocol, then you have more leeway, but that's unusual.
Your use of PROC SQL is most unusual. This is not standard SQL
SELECT. So you are saying that SAS has extended it beyond the SQL
standard? My syntax in the GRIDFUNC statement is a lot easier to
understand than that, in my opinion. The documentation explains what the
GRIDFUNC transform does.
If adding SQL SELECT syntax to later versions of Vilno helps
productivity, it certainly can be done in later versions. The current
version already does cartesian products, and as I said, because of the
internal architecture, it has a lot of room to grow.
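For readers without SAS or Vilno at hand, here is a minimal Python sketch of the calculation under dispute, as read from the GRIDFUNC line and the SAS data steps quoted below: the baseline for each (labtest, patid) is the average of the non-missing values on the most recent visit == -1 date, and change is value minus that baseline. The function name add_change_from_baseline, the dict-based row layout, the decision to skip records with a missing date, and patient 8888 are assumptions made for illustration; they are not part of either program.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def add_change_from_baseline(rows):
    """Return copies of rows with 'baseval' and 'change' added.

    Baseline per (labtest, patid): mean of the non-missing values among
    visit == -1 records that fall on the latest such date.
    Assumption: baseline candidates with a missing date are skipped.
    """
    candidates = defaultdict(list)
    for r in rows:
        if r["visit"] == -1 and r["value"] is not None and r["date"] is not None:
            candidates[(r["labtest"], r["patid"])].append(r)

    baseline = {}
    for key, recs in candidates.items():
        latest = max(r["date"] for r in recs)
        baseline[key] = mean(r["value"] for r in recs if r["date"] == latest)

    out = []
    for r in rows:
        base = baseline.get((r["labtest"], r["patid"]))
        # change stays missing when either the value or the baseline is missing
        change = None if base is None or r["value"] is None else r["value"] - base
        out.append({**r, "baseval": base, "change": change})
    return out


rows = [
    # three rows taken from the posted test data
    {"patid": 6666, "visit": -1, "date": date(2006, 12, 23), "labtest": "a", "value": 2},
    {"patid": 4444, "visit": -1, "date": None, "labtest": None, "value": None},
    {"patid": 5555, "visit": 1, "date": date(2006, 12, 18), "labtest": "x", "value": 2},
    # hypothetical patient with two baseline records on the same latest date
    {"patid": 8888, "visit": -1, "date": date(2006, 12, 1), "labtest": "y", "value": 4},
    {"patid": 8888, "visit": -1, "date": date(2006, 12, 5), "labtest": "y", "value": 6},
    {"patid": 8888, "visit": -1, "date": date(2006, 12, 5), "labtest": "y", "value": 8},
    {"patid": 8888, "visit": 1, "date": date(2006, 12, 20), "labtest": "y", "value": 10},
]

for r in add_change_from_baseline(rows):
    print(r["patid"], r["visit"], r["labtest"], r["value"], r["baseval"], r["change"])
```

Patients with no usable visit == -1 record get a missing baseline and a missing change, which matches how the merge in the SAS version would leave meanbase missing for them.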
On Jan 12, 6:29 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote:
> Robert:
> I'd bet your new statistical programming language shakes things up in
> the statistical programming field the way a gnat shakes up an
> eighteen-wheeler when it smashes into its windshield. I don't wish to
> discourage the idea. Innovation in statistical programming languages
> seems to me a worthy goal. I just don't see anything particularly
> innovative in what you are attempting to do. Besides, the examples of
> how your programming language beats up on SAS don't seem convincing on
> two grounds: 1) the examples you show of SAS programs wouldn't qualify
> as lame Trojan horses. Almost any SAS programmer could write cleaner
> and more concise SAS Data steps. The problem looks contrived anyway;
> 2) a comparison of your program segment to a comparable SAS SQL query
> doesn't come off that well. Anyone who would prefer your program to
> your example of a SAS program could easily prefer a SQL query to your
> program.
>
> OK, I didn't make the SQL query replicate exactly the results of your
> program, though I could. The way that you are fixing messy data
> appears to be a poor imputation method.
>
> If you look closely at the questions that statistical programmers are
> asking on the 'L, you may be able to develop a more intuitive and
> robust language for them to use. I doubt that you will revolutionize
> the field and earn millions to boot with statements such as 'gridfunc
> baseval=avg(value) by labtest patid'. I suspect that you have a ways
> to go. Best wishes just the same.
> Sigurd
>
> data labdata;
> input patid visit date : mmddyy10. labtest $1. value;
> if labtest='.' then labtest='';
> cards;
> 1111 1 12/20/2006 x 3
> 2222 1 12/20/2006 y 1
> 2222 2 12/23/2006 y 7
> 2222 3 01/04/2007 y 2
> 3333 1 12/22/2006 a 4
> 3333 2 01/14/2007 z 6
> 4444 -1 . . .
> 5555 1 12/18/2006 x 2
> 5555 2 12/23/2006 a 1
> 6666 -1 12/23/2006 a 2
> 7777 1 12/21/2006 z 7
> ;
> run;
>
> proc sort data=labdata ;
> by labtest patid visit date ;
>
> data base1 ;
> set labdata ;
> where visit=-1 and value^=. ;
>
> data bestdate1 ;
> set base1 (rename=( date=recentdate));
> by labtest patid ;
> if last.patid ;
> keep labtest patid recentdate ;
>
> data base2 ;
> merge base1 bestdate1 ;
> by labtest patid ;
> if date=recentdate ;
>
> proc means data=base2 ;
> by labtest patid ;
> var value ;
> output out=base3 mean=meanbase;
>
> data labdata2 ;
> merge labdata base3 ;
> by labtest patid ;
> change = value - meanbase ;
> run;
>
> proc sql;
> create table labdata22 as
> select t1.*, case when visit=-1 and value^=.
> then (select mean(value)
> from labdata as t2
> where t1.patid=t2.patid
> group by patid,labtest having
> date=max(date)
> )
> else value
> end as value
> from labdata as t1
> ;
> quit;
>
> inlist labdata ;
> addgridvars float: change ;
> gridfunc baseval=avg(value) by labtest patid
> where (visit==-1 and value is not null) and highest date ;
> change = value - baseval ;
> sendoff(labdata2) labtest patid visit date value change baseval ;
>
> -----Original Message-----
> From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu]
> On Behalf Of Robert
> Sent: Thursday, January 11, 2007 8:11 PM
> To: s...@uga.edu
> Subject: A new statistical programming language
>
> Vilno is a new data crunching programming language. It's available as
> a file attachment at the August 31 blog-cast at
> www.my.opera.com/datahelper. More information is at
> www.xanga.com/datahelper and datahelper.blogspot.com.
>
> The positive: The syntax of Vilno is a lot more innovative than that
> of SAS or SPSS, which allows one to achieve more data crunching with
> less code. This productivity gap between the Vilno data processing
> function and the SAS datastep (or SPSS data crunching) will only get
> bigger over time because the internal architecture of Vilno gives it a
> lot of room to grow (for versions 2.0, 3.0, etc.). Also, the source
> code for Vilno is probably tiny compared to the accumulated source at
> SAS or SPSS, which makes Vilno much easier to enhance and extend.
>
> The negative: Not yet ported to Apple/Windows. Still needs a library
> of mathematical functions and date/time functions (particularly
> important for data crunching). Not yet extended and integrated with a
> library of statistical functions (regression, ANOVA, etc.).
>
> DATA ANALYSIS = DATA CRUNCHING + STATISTICAL ANALYSIS
>
> Data crunching has many names: data cleansing, data preparation, data
> munging. It is the less glamorous of the two halves, but far more
> time-consuming. You cannot do proper data analysis without it.
>
> Statistical analysis is the application of mathematical procedures to
> produce analysis statistics and p-values. The choice and
> interpretation of these statistical procedures requires some knowledge
> of applied mathematics (i.e., statistics). Many people find this
> activity to be far more interesting than data crunching (I, however,
> find data crunching to be a fascinating challenge).
>
> S-Plus (or R) is good at statistical analysis, but not data crunching.
> Vilno is excellent at data crunching (date/time functions aside), but
> does not yet do statistical analysis.
>
> In data crunching/preparation, there has been a dramatic slowdown in
> productivity growth over the last 20 years. This is because a software
> monopoly causes a lack of competition, hence a slowdown in creativity
> and innovation.
>
> All three major statistical programming languages (S, SPSS, SAS) are
> at least three decades old. It's time to shake things up a bit.