LISTSERV at the University of Georgia
Date:         Mon, 15 Jan 2007 09:35:31 -0700
Reply-To:     "Barz, Ken" <Ken.Barz@INTRADO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Barz, Ken" <Ken.Barz@INTRADO.COM>
Subject:      Re: A new statistical programming language
Content-Type: text/plain; charset="us-ascii"

Alan wrote:

"SAS is used, from my experience, as an ETL language and statistics is its secondary use. It is in ETL that SAS rules and why it is dominant."

One of our SAS-based ETL processes typically takes 4+ hours to complete. As part of a BI evaluation that we're doing, we compared our existing SAS process with MS SQL Server Integration Services, using the same (PROC) SQL code and the same ODBC driver connection. Performance-wise, SSIS kicked SAS all over the place.

To be fair though, we haven't yet determined whether SSIS is better or whether there's something wrong with our SAS.

Ken

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Alan Churchill
Sent: Saturday, January 13, 2007 8:02 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A new statistical programming language

Robert,

You imply that SAS is only used for statistical programming. Perhaps I have it wrong on what you are broadcasting here.

SAS is used, from my experience, as an ETL language and statistics is its secondary use. It is in ETL that SAS rules and why it is dominant.

This isn't a "hostile resistance to change" but a question of why we should. Cost? Sure, SAS is a freakin' fortune but you get what you pay for. Lack of OOP capability? OK, I would like SAS to allow for more user-written methods. Other than that, the language works well for me and better than anything else. I have nitpicks like the lack of any consistency in the naming of options (libref, libname, library, lib, etc.) but that is all minor stuff.

You have to find a reason for people to switch the ETL portion of SAS and that is a mighty tall order. They have a 30 year head start and it simply works.

Alan

Alan Churchill
Savian
"Bridging SAS and Microsoft Technologies"
www.savian.net

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Robert
Sent: Saturday, January 13, 2007 7:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A new statistical programming language

I suppose this was to be expected: many SAS programmers have been using the same language for 10 years or more, so a hostile resistance to change is no surprise. To each his own taste.

Let me address Sigurd Hermansen's message first:

The problem is not contrived. Similar problems occur all the time in pharmaceutical programming. With clinical trials, always count on Murphy's law (if something can go wrong with the data, it will).

I'm glad you said straightforwardly that your code example does not do the same thing. That makes your code-to-code example completely INVALID, of course. I still believe my six-line example is the most elegant solution.

You justify it by saying that my data preparation choice is a "poor imputation method".

I disagree. I've worked as both a statistician and SAS programmer. It's a reasonable calculation choice for this messy data situation. Of course there are other options.

In any case, Sigurd, if you are the lead statistician on a given protocol (clinical trial), well, you are entitled to your opinion. If you are the SAS programmer, just say that you would rather not do what's being asked of you, because it's a "poor imputation method". You won't last very long like that, though. SAS programming is a service profession in the business I come from. You cannot predict how messy the data will be, nor can you predict the specifications that will be handed to you. If you are both the statistician and SAS programmer on the protocol, then you have more leeway, but that's unusual.

Your use of PROC SQL is most unusual. This is not standard SQL SELECT. So you are saying that SAS has extended it beyond the SQL standard? My syntax in the GRIDFUNC statement is a lot easier to understand than that, in my opinion. The documentation explains what the GRIDFUNC transform does.

If adding SQL SELECT syntax to later versions of Vilno helps productivity, it certainly can be done in later versions. The current version already does cartesian products, and as I said, because of the internal architecture, it has a lot of room to grow.

On Jan 12, 6:29 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote:
> Robert:
> I'd bet your new statistical programming language shakes things up in
> the statistical programming field the way a gnat shakes up an
> eighteen-wheeler when it smashes into its windshield. I don't wish to
> discourage the idea. Innovation in statistical programming languages
> seems to me a worthy goal. I just don't see anything particularly
> innovative in what you are attempting to do. Besides, the examples of
> how your programming language beats up on SAS don't seem convincing on
> two grounds: 1) the examples you show of SAS programs wouldn't qualify
> as lame Trojan horses. Almost any SAS programmer could write cleaner and
> more concise SAS Data steps. The problem looks contrived anyway; 2) A
> comparison of your program segment to a comparable SAS SQL query doesn't
> come off that well. Anyone who would prefer your program to your example
> of a SAS program could easily prefer a SQL query to your program.
>
> OK, I didn't make the SQL query replicate exactly the results of your
> program, though I could. The way that you are fixing messy data appears
> to be a poor imputation method.
>
> If you look closely at the questions that statistical programmers are
> asking on the 'L, you may be able to develop a more intuitive and robust
> language for them to use. I doubt that you will revolutionize the field
> and earn millions to boot with statements such as 'gridfunc
> baseval=avg(value) by labtest patid'. I suspect that you have a ways to
> go. Best wishes just the same.
> Siguru
>
> data labdata;
> input patid visit: date: mmddyy10. labtest $1. value;
> if labtest='.' then labtest='';
> cards;
> 1111 1 12/20/2006 x 3
> 2222 1 12/20/2006 y 1
> 2222 2 12/23/2006 y 7
> 2222 3 01/04/2007 y 2
> 3333 1 12/22/2006 a 4
> 3333 2 01/14/2007 z 6
> 4444 -1 . . .
> 5555 1 12/18/2006 x 2
> 5555 2 12/23/2006 a 1
> 6666 -1 12/23/2006 a 2
> 7777 1 12/21/2006 z 7
> ;
> run;
>
> proc sort data=labdata ;
> by labtest patid visit date ;
>
> data base1 ;
> set labdata ;
> where visit=-1 and value^=. ;
>
> data bestdate1 ;
> set base1 (rename=( date=recentdate));
> by labtest patid ;
> if last.patid ;
> keep labtest patid recentdate ;
>
> data base2 ;
> merge base1 bestdate1 ;
> by labtest patid ;
> if date=recentdate ;
>
> proc means data=base2 ;
> by labtest patid ;
> var value ;
> output out=base3 mean=meanbase;
>
> data labdata2 ;
> merge labdata base3 ;
> by labtest patid ;
> change = value - meanbase ;
> run;
>
> proc sql;
> create table labdata22 as
> select t1.*, case when visit=-1 and value^=.
> then (select mean(value)
> from labdata as t2
> where t1.patid=t2.patid
> group by patid,labtest having
> date=max(date)
> )
> else value
> end as value
> from labdata as t1
> ;
> quit;
>
> inlist labdata ;
> addgridvars float: change ;
> gridfunc baseval=avg(value) by labtest patid
> where (visit==-1 and value is not null) and highest date ;
> change = value - baseval ;
> sendoff(labdata2) labtest patid visit date value change baseval ;
>
> -----Original Message-----
> From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu]
> On Behalf Of Robert
> Sent: Thursday, January 11, 2007 8:11 PM
> To: s...@uga.edu
> Subject: A new statistical programming language
>
> Vilno is a new data crunching programming language. It's available as a
> file attachment at the August 31 blog-cast at www.my.opera.com/datahelper.
> More information is at www.xanga.com/datahelper and datahelper.blogspot.com .
>
> The positive: The syntax of Vilno is a lot more innovative than that of
> SAS or SPSS, which allows one to achieve more data crunching with less
> code. This productivity gap between the Vilno data processing function
> and the SAS datastep ( or SPSS data crunching ) will only get bigger
> over time because the internal architecture of Vilno gives it a lot of
> room to grow ( for versions 2.0, 3.0, etc.). Also, the source code for
> Vilno is probably tiny compared to the accumulated source at SAS or
> SPSS, which makes Vilno much easier to enhance and extend.
>
> The negative: Not yet ported to Apple/Windows. Still needs a library of
> mathematical functions and date/time functions(particularly important
> for data crunching). Not yet extended and integrated with a library of
> statistical functions( regression, ANOVA, etc.).
>
> DATA ANALYSIS = DATA CRUNCHING + STATISTICAL ANALYSIS
>
> Data crunching has many names: data cleansing, data preparation, data
> munging. It is the least glamorous of the two halves, but far more time
> consuming. You cannot do proper data analysis without it.
>
> Statistical analysis is the application of mathematical procedures to
> produce analysis statistics and p-values. The choice and interpretation
> of these statistical procedures requires some knowledge of applied
> mathematics ( i.e. statistics ). Many people find this activity to be
> far more interesting than data crunching ( I however find data crunching
> to be a fascinating challenge ).
>
> S-Plus ( or R ) is good at statistical analysis, but not data crunching.
> Vilno is excellent at data crunching (date/time functions aside), but
> does not yet do statistical analysis.
>
> In data crunching/preparation, there has been a dramatic slowdown in
> productivity growth over the last 20 years. This is because a software
> monopoly causes a lack of competition, hence a slowdown in creativity
> and innovation.
>
> All three major statistical programming languages ( S, SPSS, SAS ) are
> at least three decades old. It's time to shake things up a bit.
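
[Editor's sketch] For readers comparing the SAS data steps with the Vilno fragment quoted above, the calculation under discussion can be written out in plain Python. This is only an illustration of the logic as the SAS data-step version reads (baseline = mean of the non-missing visit -1 values on the most recent date, per labtest and patid; change = value minus baseline). It is not SAS or Vilno code. The function name add_change_from_baseline and the tuple layout are made up for this example, and skipping baseline rows that have no date is an assumption.

```python
from collections import defaultdict
from datetime import date

# Rows mirror the CARDS data in the quoted message:
# (patid, visit, date, labtest, value). None stands in for a SAS missing value.
LABDATA = [
    (1111, 1, date(2006, 12, 20), "x", 3.0),
    (2222, 1, date(2006, 12, 20), "y", 1.0),
    (2222, 2, date(2006, 12, 23), "y", 7.0),
    (2222, 3, date(2007, 1, 4), "y", 2.0),
    (3333, 1, date(2006, 12, 22), "a", 4.0),
    (3333, 2, date(2007, 1, 14), "z", 6.0),
    (4444, -1, None, None, None),
    (5555, 1, date(2006, 12, 18), "x", 2.0),
    (5555, 2, date(2006, 12, 23), "a", 1.0),
    (6666, -1, date(2006, 12, 23), "a", 2.0),
    (7777, 1, date(2006, 12, 21), "z", 7.0),
]


def add_change_from_baseline(rows):
    """Return rows extended with (baseval, change). Hypothetical helper."""
    # Step 1: collect baseline candidates (visit == -1, value present)
    # per (labtest, patid). Assumption: rows without a date are skipped.
    candidates = defaultdict(list)
    for patid, visit, day, labtest, value in rows:
        if visit == -1 and value is not None and day is not None:
            candidates[(labtest, patid)].append((day, value))

    # Step 2: keep only the most recent date in each group and average it.
    baseline = {}
    for key, pairs in candidates.items():
        latest = max(day for day, _ in pairs)
        values = [v for day, v in pairs if day == latest]
        baseline[key] = sum(values) / len(values)

    # Step 3: attach the baseline to every row of the group and compute change.
    result = []
    for patid, visit, day, labtest, value in rows:
        base = baseline.get((labtest, patid))
        if value is not None and base is not None:
            change = value - base
        else:
            change = None
        result.append((patid, visit, day, labtest, value, base, change))
    return result


if __name__ == "__main__":
    for row in add_change_from_baseline(LABDATA):
        print(row)
```

In the sample data only patient 6666 has a usable visit -1 record, so that is the only row that receives a baseline; every other row ends up with a missing baseline and a missing change.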

