LISTSERV at the University of Georgia
Date:         Sun, 14 Jan 2007 23:54:33 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: A new statistical programming language
In-Reply-To:  <001e01c73788$68029c70$3807d550$@net>
Content-Type: text/plain; format=flowed

SASL001@SAVIAN.NET replied:
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Robert
>Sent: Saturday, January 13, 2007 7:32 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: A new statistical programming language
>
>I suppose this was to be expected, many SAS programmers have been using
>the same language for 10 years or more, so a hostile resistance to
>change was to be expected. To each his own taste.
>
>Let me address Sigurd Hermansen's message first:
>
>The problem is not contrived. Similar problems occur all the time in
>pharmaceutical programming. With clinical trials, always count on
>Murphy's law( if something can go wrong with the data, it will).
>
>I'm glad you said straightforward that your code example does not do
>the same thing. That makes your code-to-code example completely
>INVALID, of course. I still believe my six-line example is the most
>elegant solution.
>
>You justify it by saying that my data preparation choice is a
>"poor imputation method".
>
>I disagree. I've worked as both a statistician and SAS programmer. It's
>a reasonable calculation choice for this messy data situation. Of
>course there are other options.
>
>In any case, Sigurd, if you are the lead statistician on a given
>protocol (clinical trial), well you are entitled to your opinion. If
>you are the SAS programmer, just say that you would rather not do
>what's being asked of you, because it's a "poor imputation method". You
>won't last very long like that , though. SAS programming is a service
>profession in the business I come from. You cannot predict how messy
>the data will be, nor can you predict the specifications that will be
>handed to you. If you are both the statistician and SAS programmer on
>the protocol, then you have more leeway, but that's unusual.
>
>Your use of the PROC SQL is most unusual. This is not standard SQL
>SELECT. So you are saying that SAS has extended it to beyond the SQL
>standard? My syntax in the GRIDFUNC statement is a lot easier to
>understand than that, in my opinion. The documention explains what the
>GRIDFUNC transform does.
>
>If adding SQL SELECT syntax to later versions of Vilno helps
>productivity, it certainly can be done in later versions. The current
>version already does cartesian products, and as I said, because of the
>internal architecture, it has a lot of room to grow.
>
>
>On Jan 12, 6:29 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote:
> > Robert:
> > I'd bet your new statistical programming language shakes things up in
> > the statistical programming field the way a gnat shakes up an
> > eighteen-wheeler when it smashes into its windshield. I don't wish to
> > discourage the idea. Innovation in statistical programming languages
> > seems to me a worthy goal. I just don't see anything particularly
> > innovative in what you are attempting to do. Besides, the examples of
> > how your programming language beats up on SAS don't seem convincing on
> > two grounds: 1) the examples you show of SAS programs wouldn't qualify
> > as lame Trojan horses. Almost anyone SAS program could write cleaner and
> > more concise SAS Data steps. The problem looks contrived anyway; 2) A
> > comparison of your program segment to a comparable SAS SQL query doesn't
> > come off that well. Anyone who would prefer your program to your example
> > of a SAS program could easily prefer a SQL query to your program.
> >
> > OK, I didn't make the SQL query replicate exactly the results of your
> > program, though I could. The way that you are fixing messy data appears
> > to be a poor imputation method.
> >
> > If you look closely at the questions that statistical programmers are
> > asking on the 'L, you may be able to develop a more intuitive and robust
> > language for them to use. I doubt that you will revolutionize the field
> > and earn millions to boot with statements such as 'gridfunc
> > baseval=avg(value) by labtest patid'. I suspect that you have a ways to
> > go. Best wishes just the same.
> > Siguru
> >
> > data labdata;
> > input patid visit: date: mmddyy10. labtest $1. value;
> > if labtest='.' then labtest='';
> > cards;
> > 1111 1 12/20/2006 x 3
> > 2222 1 12/20/2006 y 1
> > 2222 2 12/23/2006 y 7
> > 2222 3 01/04/2007 y 2
> > 3333 1 12/22/2006 a 4
> > 3333 2 01/14/2007 z 6
> > 4444 -1 . . .
> > 5555 1 12/18/2006 x 2
> > 5555 2 12/23/2006 a 1
> > 6666 -1 12/23/2006 a 2
> > 7777 1 12/21/2006 z 7
> > ;
> > run;
> >
> > proc sort data=labdata ;
> > by labtest patid visit date ;
> >
> > data base1 ;
> > set labdata ;
> > where visit=-1 and value^=. ;
> >
> > data bestdate1 ;
> > set base1 (rename=( date=recentdate));
> > by labtest patid ;
> > if last.patid ;
> > keep labtest patid recentdate ;
> >
> > data base2 ;
> > merge base1 bestdate1 ;
> > by labtest patid ;
> > if date=recentdate ;
> >
> > proc means data=base2 ;
> > by labtest patid ;
> > var value ;
> > output out=base3 mean=meanbase;
> >
> > data labdata2 ;
> > merge labdata base3 ;
> > by labtest patid ;
> > change = value - meanbase ;
> > run;
> >
> > proc sql;
> > create table labdata22 as
> > select t1.*, case when visit=-1 and value^=.
> > then (select mean(value)
> > from labdata as t2
> > where t1.patid=t2.patid
> > group by patid,labtest having
> > date=max(date)
> > )
> > else value
> > end as value
> > from labdata as t1
> > ;
> > quit;
> >
> > inlist labdata ;
> > addgridvars float: change ;
> > gridfunc baseval=avg(value) by labtest patid
> > where (visit==-1 and value is not null) and highest date ;
> > change = value - baseval ;
> > sendoff(labdata2) labtest patid visit date value change baseval ;
> >
> > -----Original Message-----
> > From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu]
> >
> > On Behalf Of Robert
> > Sent: Thursday, January 11, 2007 8:11 PM
> > To: s...@uga.edu
> > Subject: A new statistical programming language
> >
> > Vilno is a new data crunching programming language. It's available as a
> > file attachment at the August 31 blog-cast at www.my.opera.com/datahelper.
> > More information is at www.xanga.com/datahelper and datahelper.blogspot.com .
> >
> > The positive: The syntax of Vilno is a lot more innovative than that of
> > SAS or SPSS, which allows one to achieve more data crunching with less
> > code. This productivity gap between the Vilno data processing function
> > and the SAS datastep ( or SPSS data crunching ) will only get bigger
> > over time because the internal architecture of Vilno gives it a lot of
> > room to grow ( for versions 2.0, 3.0, etc.). Also, the source code for
> > Vilno is probably tiny compared to the accumulated source at SAS or
> > SPSS, which makes Vilno much easier to enhance and extend.
> >
> > The negative: Not yet ported to Apple/Windows. Still needs a library of
> > mathematical functions and date/time functions(particularly important
> > for data crunching). Not yet extended and integrated with a library of
> > statistical functions( regression, ANOVA, etc.).
> >
> > DATA ANALYSIS = DATA CRUNCHING + STATISTICAL ANALYSIS
> >
> > Data crunching has many names: data cleansing, data preparation, data
> > munging. It is the least glamorous of the two halves, but far more time
> > consuming. You cannot do proper data analysis without it.
> >
> > Statistical analysis is the application of mathematical procedures to
> > produce analysis statistics and p-values. The choice and interpretation
> > of these statistical procedures requires some knowledge of applied
> > mathematics ( i.e. statistics ). Many people find this activity to be
> > far more interesting than data crunching ( I however find data crunching
> > to be a fascinating challenge ).
> >
> > S-Plus ( or R ) is good at statistical analysis, but not data crunching.
> > Vilno is excellent at data crunching (date/time functions aside), but
> > does not yet do statistical analysis.
> >
> > In data crunching/preparation, there has been a dramatic slowdown in
> > productivity growth over the last 20 years. This is because a software
> > monopoly causes a lack of competition, hence a slowdown in creativity
> > and innovation.
> >
> > All three major statistical programming languages ( S, SPSS, SAS ) are
> > at least three decades old. It's time to shake things up a bit.

>
>Robert,
>
>You imply that SAS is only used for statistical programming. Perhaps I have
>it wrong on what you are broadcasting here.
>
>SAS is used, from my experience, as an ETL language and statistics is its
>secondary use. It is in ETL that SAS rules and why it is dominant.
>
>This isn't a "hostile resistance to change" but a question of why we should.
>Cost? Sure, SAS is a freakin' fortune but you get what you pay for. Lack of
>OOP capability? Ok, I would like SAS to allow for more user written methods.
>Other than that, the language works well for me and better than anything
>else. I have nitpicks like the inability to have any consistency in the
>naming of options (libref, libname, library, lib, etc) but that is all
>minor stuff.
>
>You have to find a reason for people to switch the ETL portion of SAS and
>that is a mighty tall order. They have a 30 year head start and it simply
>works.
>
>Alan
>
>Alan Churchill
>Savian "Bridging SAS and Microsoft Technologies"
>www.savian.net
>

It seems to me that R and S-Plus, which are weak on the ETL issues, can benefit the most from a tool like Vilno.

As for SAS, I would want to see how Vilno performs on typical ETL processes, for really large data sets. For small data sets, I doubt I would be able to wean users off of Excel.

And as for the issue of the example code, I'll just say that I would not let any programmer working for me use *that* method for imputation of data, so it's a moot point. We'll just treat it as an example case.
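
For anyone following the example case, here is a minimal Python sketch of the calculation the quoted programs are arguing over: per labtest and patid, average the non-missing values at visit -1 on the most recent such date, then subtract that baseline from every value. It is not Vilno or SAS code. The function name, the dict layout, the choice to skip baseline records with a missing date, and patient 9999 are all assumptions made up for illustration; patient 6666 comes from the data step quoted above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def add_change_from_baseline(rows):
    """Return copies of rows with 'baseval' and 'change' added.

    Baseline per (labtest, patid): mean of the non-missing values at
    visit -1, using only the most recent date among those records.
    """
    # Collect baseline-visit values, keyed by group and then by date.
    # Assumption: a baseline record with a missing date is ignored.
    by_group = defaultdict(lambda: defaultdict(list))
    for r in rows:
        if r["visit"] == -1 and r["value"] is not None and r["date"] is not None:
            by_group[(r["labtest"], r["patid"])][r["date"]].append(r["value"])

    # For each group, average only the values on the latest baseline date.
    baseline = {key: mean(dates[max(dates)]) for key, dates in by_group.items()}

    out = []
    for r in rows:
        base = baseline.get((r["labtest"], r["patid"]))
        # Change is missing when either the value or the baseline is missing.
        change = None if base is None or r["value"] is None else r["value"] - base
        out.append({**r, "baseval": base, "change": change})
    return out


example_rows = [
    # Taken from the quoted data step.
    {"patid": 6666, "visit": -1, "date": date(2006, 12, 23), "labtest": "a", "value": 2.0},
    {"patid": 5555, "visit": 1, "date": date(2006, 12, 18), "labtest": "x", "value": 2.0},
    # Hypothetical patient with repeated baseline records.
    {"patid": 9999, "visit": -1, "date": date(2006, 12, 1), "labtest": "a", "value": 10.0},
    {"patid": 9999, "visit": -1, "date": date(2006, 12, 5), "labtest": "a", "value": 2.0},
    {"patid": 9999, "visit": -1, "date": date(2006, 12, 5), "labtest": "a", "value": 4.0},
    {"patid": 9999, "visit": 1, "date": date(2006, 12, 20), "labtest": "a", "value": 5.0},
]

result = add_change_from_baseline(example_rows)
```

Note that with the posted data only patient 6666 has a usable visit -1 record, so every other patient ends up with a missing baseline and a missing change. That is worth keeping in mind when judging any of the three quoted versions against that data.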

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
