Date: Wed, 7 Jan 2009 12:14:43 -0500
Reply-To: "Hixon, John" <jhixon@AMGEN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Hixon, John" <jhixon@AMGEN.COM>
Subject: Re: R "Threatens" SAS, According to The New York Times
Hello Mike and all SAS-L'rs.
I saw the article about R in this morning's NYTimes.
I found it quite a coincidence that one of the people quoted was Max Kuhn at Pfizer,
since yesterday evening I loaded his R-package named 'caret' onto my "school" laptop.
I've enrolled (part-time) in an Interdisciplinary PhD program in Applied Mathematical Sciences
with a concentration in Biophysics/Bioinformatics at URI. For my first class, I signed up for
Bioinformatics I, but I withdrew after one class (and took a different required course).
Why did I withdraw? After scanning the Bioinformatics textbook, and doing some web research,
it became clear that I would need to learn R (at a minimum) and perhaps also Python, in
order to take advantage of the current tools available for Bioinformatics.
It is very clear that the latest statistical methods will always appear first as
R packages, since there is a multitude of PhD statisticians using and extending the software.
I am a member of ASA. I subscribe to the Journal of Computational and Graphical
Statistics. At the end of each issue there is a section titled "Recent Publications
in JSS" [Journal of Statistical Software]. 98% of the references are to new packages
written for R.
As part of a project in my most recent URI class, I needed to use Kernel-based Support Vector
Machines (KSVM) as a classification method. I e-mailed a few contacts I have at SAS to
inquire whether there were any IML subroutines that implemented SVM methods in SAS. The answer
was: if there are any SAS methods for SVMs they would probably be in Enterprise Miner (which
of course no one can afford).
But kernel SVMs are readily available in R (for example, see the R package 'kernlab').
I am in the very early days of gaining some R expertise, but I am excited by what
I find. It seems to be 'the place to be' if you want to be current with the
latest statistical methods.
That being said, I work for Amgen in a heavily regulated industry (FDA, EMEA). For Amgen work,
I use only SAS, since I am not sure whether we can consider R package "XYZ" from
Professor "ABC" to be 'validated' (as we can with the PROCs in SAS).
Still, for my schoolwork, I intend to do all required analyses "both ways": using both SAS and R,
so that I can become more proficient with R. Of course, that only applies to the analyses
that are *possible* in SAS (using SAS/STAT, SAS/IML, SAS/OR, etc., which are licensed by URI).
I have already run into the situation where I can do analyses in R that are not possible
for me in SAS.
If I worked at SAS, that would frighten me.
To get some notion of the depth of packages available in R, have a look at this recent JSS
paper written by Max Kuhn. You can view it here:
Notice, in particular, Table 1 on page 9. This shows only a subset of the model-building
methods available in R. Only some of these methods are available in SAS/STAT, SAS/IML, etc. I assume
that others are available in SAS Enterprise Miner, but I have never been exposed to that
SAS module, since neither of my previous employers could afford the immense cost.
So... my message to young statisticians is: learn *both* SAS and R.
I have always *loved* writing SAS code. I enjoy the creativity of using SAS to develop applications
that automate the analysis and graphical presentation of complex data streams. I will always love SAS.
But... a young lovely named R has caught my attention of late, and she's quite seductive with
her manifold charms (and she seems to grow even more seductive as she matures...).
For example: try googling: rggobi or bioconductor....
>Date: Wed, 7 Jan 2009 08:53:55 -0500
>From: Mike Zdeb <msz03@ALBANY.EDU>
>Subject: Re: R "Threatens" SAS, According to The New York Times
>hi ... neat read in that we have a bit of the same debate here (though it's heavily SAS-weighted)
>noticed the quote from the person at Pfizer re R in the Times article
>also noticed the job posting for Pfizer's "Associate Director/ Director, Quantitative Epidemiologist" at
>that includes no mention of proficiency with R, but does include ...
><yadda yadda yadda>
>"Hands on involvement in epidemiological projects including use of epidemiology databases, SAS programming, quality control checks for programs, and documentation"
>"Knowledge of SQL and automated programming including SAS macros and interface development"
><blah blah blah>
>"Proficiency in SAS programming, including statistical analysis procedures, and experience with other statistical software"
>I guess that R would fall into that large "OTHER STATISTICAL SOFTWARE" vat
>U@Albany School of Public Health
>One University Place
>Rensselaer, New York 12144-3456
> On Tue, 6 Jan 2009 20:41:40 -0800, Virtual SUG <sfbay0001@AOL.COM> wrote:
>>Thought you might be interested in reading this article, which appears
>>in the 1/6/9 online edition of The New York Times:
>>The headline is "Data Analysts Captivated by R's Power," and towards
>>the end of the story is the following paragraph:
>>"While it is difficult to calculate exactly how many people use R,
>>those most familiar with the software estimate that close to 250,000
>>people work with it regularly. The popularity of R at universities
>>could threaten SAS Institute, the privately held business software
>>company that specializes in data analysis software. SAS, with more
>>than $2 billion in annual revenue, has been the preferred tool of
>>scholars and corporate managers. "
>>Sierra Information Services
> Even David and Toby were implicitly mentioned in the article:
> "R has really become the second language for people coming out of grad
> school now, and there's an amazing amount of code being written for it,"
> said Max Kuhn, associate director of nonclinical statistics at Pfizer.
> "You can look on the SAS message boards and see there is a
> proportional downturn in traffic."
> Ken Borowiak