```
Date:     Sun, 8 May 2011 14:27:51 -0400
Reply-To: Dave Fournier
Sender:   "SAS(r) Discussion"
From:     Dave Fournier
Subject:  Re: Why still use SAS with a lot of open source applications?
Comments: To: Vincent Granville

On Sat, 7 May 2011 13:59:17 -0400, Vincent Granville wrote:

>This discussion was posted on our LinkedIn group. Here's my answer:
>
>SAS has some nice features, such as the SQL procedure or simple "group by"
>features. Try to compute correlations "by group" in R: say you have 2,000
>groups, 2 variables e.g. salary and education level, and 2 million
>observations - you want to compute correlation between salary and education
>within each group.
>
>It is not obvious, your best bet is to use some R package (see sample code on
>Analyticbridge to do it), and the solution is painful, you can not return both
>correlation and stdev "by group", as the function can return only one
>argument, not a vector. So if you want to return not just two, but say 100
>metrics, it becomes a nightmare.
>
>Read discussion at http://bit.ly/jRJQvj

This is a trivially small problem with a fast compiled language.
I used the open source C++ code in AD Model builder to create and
analyze data as you describe it. For 2 million records with
2000 groups and 2x2 matrix the code ran in about 1 second on my laptop.
Increasing the size to 10 million records 2000 groups and 10x10 matrix
took about 25 seconds.

Here is the code.

main()
{
  int nobs=10000000;
  int ngroups=2000;
  int ndim=10;
  dmatrix obs(1,nobs,1,ndim);
  dvector dgroups(1,nobs);
  dmatrix means(1,ngroups,1,ndim);
  ivector groups(1,nobs);
  random_number_generator rng(101);
  obs.fill_randn(rng);   // simulated data
  dgroups.fill_randu(rng);
  groups=ivector(dgroups*ngroups+1);  // randomly assign data to groups
  d3_array covar(1,ngroups,1,ndim,1,ndim);
  ivector gtot(1,ngroups);
  gtot.initialize();
  means.initialize();
  covar.initialize();
  for (int i=1;i<=nobs;i++)
  {
    means(groups(i))+=obs(i);
    covar(groups(i))+=outer_prod(obs(i),obs(i));
    gtot(groups(i))+=1;
  }
  for (int i=1;i<=ngroups;i++)
  {
    means(i)/=gtot(i);
    covar(i)/=gtot(i);
    covar(i)-=outer_prod(means(i),means(i));
  }
  ofstream ofs("report");
  for (int i=1;i<=ngroups;i++)
  {
    ofs << "group " << i << endl;
    ofs << covar(i) << endl << endl;
  }
}
```
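For readers without ADMB installed, here is a rough sketch of the same one-pass idea in plain standard C++, cut down to the two-variable case from the original question (salary and education within each group). It is not the code from the post: `GroupStats` and `group_stats` are hypothetical names made up for this illustration, and group ids are assumed to run from 0 to ngroups-1 rather than from 1 as in the ADMB arrays. It accumulates the sums and cross-products per group, then applies the same identity as above (mean of products minus product of means), so each group yields its standard deviations and its correlation together.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running sums for one group of (x, y) pairs.
// Hypothetical helper for illustration; not part of ADMB.
struct GroupStats {
    std::size_t n = 0;
    double sum_x = 0.0, sum_y = 0.0;
    double sum_xx = 0.0, sum_yy = 0.0, sum_xy = 0.0;

    void add(double x, double y) {
        ++n;
        sum_x += x;      sum_y += y;
        sum_xx += x * x; sum_yy += y * y; sum_xy += x * y;
    }

    // Population moments (divide by n), matching the posted code.
    // An empty group gives NaN; a constant variable gives a zero
    // standard deviation and hence an undefined correlation.
    double mean_x() const { return sum_x / n; }
    double mean_y() const { return sum_y / n; }
    double var_x()  const { return sum_xx / n - mean_x() * mean_x(); }
    double var_y()  const { return sum_yy / n - mean_y() * mean_y(); }
    double cov()    const { return sum_xy / n - mean_x() * mean_y(); }
    double sd_x()   const { return std::sqrt(var_x()); }
    double sd_y()   const { return std::sqrt(var_y()); }
    double corr()   const { return cov() / (sd_x() * sd_y()); }
};

// One pass over the data: group[i] is the 0-based group id of record i.
std::vector<GroupStats> group_stats(const std::vector<int>& group,
                                    const std::vector<double>& x,
                                    const std::vector<double>& y,
                                    int ngroups) {
    assert(group.size() == x.size() && x.size() == y.size());
    std::vector<GroupStats> stats(ngroups);
    for (std::size_t i = 0; i < group.size(); ++i) {
        stats[group[i]].add(x[i], y[i]);
    }
    return stats;
}
```

Calling `group_stats(group, x, y, 2000)` and then reading `stats[g].corr()`, `stats[g].sd_x()` and `stats[g].sd_y()` gives all of the per-group metrics from a single pass. One caveat that applies equally to the posted code: subtracting the product of means from the mean of products can lose precision when the means are large compared with the spread of the data, so centering the data first may be worth considering in that situation.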
