Date: Sun, 14 Jan 2007 23:54:33 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: A new statistical programming language
Content-Type: text/plain; format=flowed
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Robert
>Sent: Saturday, January 13, 2007 7:32 PM
>Subject: Re: A new statistical programming language
>I suppose this was to be expected, many SAS programmers have been using
>the same language for 10 years or more, so a hostile resistance to
>change was to be expected. To each his own taste.
>Let me address Sigurd Hermansen's message first:
>The problem is not contrived. Similar problems occur all the time in
>pharmaceutical programming. With clinical trials, always count on
>Murphy's law (if something can go wrong with the data, it will).
>I'm glad you said straightforward that your code example does not do
>the same thing. That makes your code-to-code example completely
>INVALID, of course. I still believe my six-line example is the most
>You justify it by saying that my data preparation choice is a
>"poor imputation method".
>I disagree. I've worked as both a statistician and SAS programmer. It's
>a reasonable calculation choice for this messy data situation. Of
>course there are other options.
>In any case, Sigurd, if you are the lead statistician on a given
>protocol (clinical trial), well you are entitled to your opinion. If
>you are the SAS programmer, just say that you would rather not do
>what's being asked of you, because it's a "poor imputation method". You
>won't last very long like that, though. SAS programming is a service
>profession in the business I come from. You cannot predict how messy
>the data will be, nor can you predict the specifications that will be
>handed to you. If you are both the statistician and SAS programmer on
>the protocol, then you have more leeway, but that's unusual.
>Your use of the PROC SQL is most unusual. This is not standard SQL
>SELECT. So you are saying that SAS has extended it to beyond the SQL
>standard? My syntax in the GRIDFUNC statement is a lot easier to
>understand than that, in my opinion. The documentation explains what the
>GRIDFUNC transform does.
>If adding SQL SELECT syntax to later versions of Vilno helps
>productivity, it certainly can be done in later versions. The current
>version already does cartesian products, and as I said, because of the
>internal architecture, it has a lot of room to grow.
>On Jan 12, 6:29 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote:
> > Robert:
> > I'd bet your new statistical programming language shakes things up in
> > the statistical programming field the way a gnat shakes up an
> > eighteen-wheeler when it smashes into its windshield. I don't wish to
> > discourage the idea. Innovation in statistical programming languages
> > seems to me a worthy goal. I just don't see anything particularly
> > innovative in what you are attempting to do. Besides, the examples of
> > how your programming language beats up on SAS don't seem convincing on
> > two grounds: 1) the examples you show of SAS programs wouldn't qualify
> > as lame Trojan horses. Almost any SAS programmer could write cleaner and
> > more concise SAS Data steps. The problem looks contrived anyway; 2) A
> > comparison of your program segment to a comparable SAS SQL query doesn't
> > come off that well. Anyone who would prefer your program to your example
> > of a SAS program could easily prefer a SQL query to your program.
> > OK, I didn't make the SQL query replicate exactly the results of your
> > program, though I could. The way that you are fixing messy data appears
> > to be a poor imputation method.
> > If you look closely at the questions that statistical programmers are
> > asking on the 'L, you may be able to develop a more intuitive and robust
> > language for them to use. I doubt that you will revolutionize the field
> > and earn millions to boot with statements such as 'gridfunc
> > baseval=avg(value) by labtest patid'. I suspect that you have a ways to
> > go. Best wishes just the same.
> > Sigurd
> > data labdata;
> > input patid visit date : mmddyy10. labtest $1. value;
> > if labtest='.' then labtest='';
> > cards;
> > 1111 1 12/20/2006 x 3
> > 2222 1 12/20/2006 y 1
> > 2222 2 12/23/2006 y 7
> > 2222 3 01/04/2007 y 2
> > 3333 1 12/22/2006 a 4
> > 3333 2 01/14/2007 z 6
> > 4444 -1 . . .
> > 5555 1 12/18/2006 x 2
> > 5555 2 12/23/2006 a 1
> > 6666 -1 12/23/2006 a 2
> > 7777 1 12/21/2006 z 7
> > ;
> > run;
> > proc sort data=labdata ;
> > by labtest patid visit date ;
> > data base1 ;
> > set labdata ;
> > where visit=-1 and value^=. ;
> > data bestdate1 ;
> > set base1 (rename=( date=recentdate));
> > by labtest patid ;
> > if last.patid ;
> > keep labtest patid recentdate ;
> > data base2 ;
> > merge base1 bestdate1 ;
> > by labtest patid ;
> > if date=recentdate ;
> > proc means data=base2 ;
> > by labtest patid ;
> > var value ;
> > output out=base3 mean=meanbase;
> > data labdata2 ;
> > merge labdata base3 ;
> > by labtest patid ;
> > change = value - meanbase ;
> > run;
> > proc sql;
> > create table labdata22 as
> > select t1.*, case when visit=-1 and value^=.
> > then (select mean(value)
> > from labdata as t2
> > where t1.patid=t2.patid
> > group by patid,labtest having
> > date=max(date)
> > )
> > else value
> > end as value
> > from labdata as t1
> > ;
> > quit;
> > inlist labdata ;
> > addgridvars float: change ;
> > gridfunc baseval=avg(value) by labtest patid
> > where (visit==-1 and value is not null) and highest date ;
> > change = value - baseval ;
> > sendoff(labdata2) labtest patid visit date value change baseval ;
> > -----Original Message-----
> > From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu]
> > On Behalf Of Robert
> > Sent: Thursday, January 11, 2007 8:11 PM
> > To: s...@uga.edu
> > Subject: A new statistical programming language
> > Vilno is a new data crunching programming language. It's available as a
> > file attachment at the August 31 blog-cast
>More information is at www.xanga.com/datahelper and datahelper.blogspot.com.
> > The positive: The syntax of Vilno is a lot more innovative than that of
> > SAS or SPSS, which allows one to achieve more data crunching with less
> > code. This productivity gap between the Vilno data processing function
> > and the SAS datastep ( or SPSS data crunching ) will only get bigger
> > over time because the internal architecture of Vilno gives it a lot of
> > room to grow ( for versions 2.0, 3.0, etc.). Also, the source code for
> > Vilno is probably tiny compared to the accumulated source at SAS or
> > SPSS, which makes Vilno much easier to enhance and extend.
> > The negative: Not yet ported to Apple/Windows. Still needs a library of
> > mathematical functions and date/time functions (particularly important
> > for data crunching). Not yet extended and integrated with a library of
> > statistical functions (regression, ANOVA, etc.).
> > DATA ANALYSIS = DATA CRUNCHING + STATISTICAL ANALYSIS
> > Data crunching has many names: data cleansing, data preparation, data
> > munging. It is the least glamorous of the two halves, but far more time
> > consuming. You cannot do proper data analysis without it.
> > Statistical analysis is the application of mathematical procedures to
> > produce analysis statistics and p-values. The choice and interpretation
> > of these statistical procedures requires some knowledge of applied
> > mathematics ( i.e. statistics ). Many people find this activity to be
> > far more interesting than data crunching ( I however find data crunching
> > to be a fascinating challenge ).
> > S-Plus ( or R ) is good at statistical analysis, but not data crunching.
> > Vilno is excellent at data crunching (date/time functions aside), but
> > does not yet do statistical analysis.
> > In data crunching/preparation, there has been a dramatic slowdown in
> > productivity growth over the last 20 years. This is because a software
> > monopoly causes a lack of competition, hence a slowdown in creativity
> > and innovation.
> > All three major statistical programming languages ( S, SPSS, SAS ) are
> > at least three decades old. It's time to shake things up a bit.
>You imply that SAS is only used for statistical programming. Perhaps I have
>it wrong on what you are broadcasting here.
>SAS is used, from my experience, as an ETL language and statistics is its
>secondary use. It is in ETL that SAS rules and why it is dominant.
>This isn't a "hostile resistance to change" but a question of why we
>Cost? Sure, SAS is a freakin' fortune but you get what you pay for. Lack of
>OOP capability? Ok, I would like SAS to allow for more user written
>Other than that, the language works well for me and better than anything
>else. I have nitpicks like the inability to have any consistency in the
>naming of options (libref, libname, library, lib, etc) but that is all
>You have to find a reason for people to switch the ETL portion of SAS and
>that is a mighty tall order. They have a 30 year head start and it simply
>Savian "Bridging SAS and Microsoft Technologies"
It seems to me that R and S-Plus, which are weak on the ETL issues,
can benefit the most from a tool like Vilno.
As for SAS, I would want to see how Vilno performs on typical ETL
processes, for really large data sets. For small data sets, I doubt I
would be able to wean users off of Excel.
And as for the issue of the example code, I'll just say that I would
not let any programmer working for me use *that* method for
imputation of data, so it's a moot point. We'll just treat it as
an example case.
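
For anyone who wants the disputed calculation spelled out without
reading three dialects, here is a minimal sketch of how I read the
DATA step version quoted above: take the baseline records (visit = -1
with a non-missing value), keep the most recent date per labtest and
patid, average the values on that date, and subtract that average from
every value in the same group. It is written in plain Python only as
neutral ground. The function name add_change_from_baseline and the
tuple layout are my own inventions for illustration, and sorting a
missing date below every real date is an assumption meant to mimic how
SAS orders missing values.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the quoted DATA step:
# (patid, visit, date, labtest, value); None marks a missing value.
LABDATA = [
    (1111, 1, date(2006, 12, 20), "x", 3.0),
    (2222, 1, date(2006, 12, 20), "y", 1.0),
    (2222, 2, date(2006, 12, 23), "y", 7.0),
    (2222, 3, date(2007, 1, 4), "y", 2.0),
    (3333, 1, date(2006, 12, 22), "a", 4.0),
    (3333, 2, date(2007, 1, 14), "z", 6.0),
    (4444, -1, None, None, None),
    (5555, 1, date(2006, 12, 18), "x", 2.0),
    (5555, 2, date(2006, 12, 23), "a", 1.0),
    (6666, -1, date(2006, 12, 23), "a", 2.0),
    (7777, 1, date(2006, 12, 21), "z", 7.0),
]


def _date_key(day):
    # Assumption: a missing date sorts below every real date.
    return (day is not None, day or date.min)


def add_change_from_baseline(rows):
    """Hypothetical helper: returns rows extended with (change, baseval)."""
    # Step 1: collect baseline records (visit == -1, value not missing)
    # for each (labtest, patid) group.
    baseline = defaultdict(list)
    for patid, visit, day, labtest, value in rows:
        if visit == -1 and value is not None:
            baseline[(labtest, patid)].append((day, value))

    # Step 2: within each group, keep only the most recent date and
    # average the values recorded on that date.
    baseval = {}
    for key, recs in baseline.items():
        latest = max((d for d, _ in recs), key=_date_key)
        vals = [v for d, v in recs if d == latest]
        baseval[key] = sum(vals) / len(vals)

    # Step 3: attach the baseline to every row of the same group and
    # compute change = value - baseval (missing if either is missing).
    out = []
    for patid, visit, day, labtest, value in rows:
        base = baseval.get((labtest, patid))
        if value is None or base is None:
            change = None
        else:
            change = value - base
        out.append((labtest, patid, visit, day, value, change, base))
    return out


if __name__ == "__main__":
    for row in add_change_from_baseline(LABDATA):
        print(row)
```

On the sample data only patient 6666, lab test a, has a usable
baseline record, so that is the only row that ends up with a
non-missing change. Whether averaging the latest baseline values is a
sensible imputation is exactly the point in dispute; the sketch only
shows what is being computed.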
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330