Date: Thu, 6 Dec 2007 15:54:07 -0600
Reply-To: "data _null_," <datanull@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "data _null_," <datanull@GMAIL.COM>
Subject: Re: IF vs WHERE discussion
In-Reply-To: <f3ed116f0712061350t79b74d9bk24715c818817477c@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
But we need a program that we can pass around to generate a BIGFILE,
so that we can all use the same data.
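
Something along these lines might do as a starting point. It is only a sketch: the row count and the seed come from the logs quoted below, but the split into 50 numeric and 24 character filler variables (and their names and lengths) is purely an assumption, since all we know about the real file is that it has 76 variables of mixed types.

```sas
/* Hypothetical shared generator for WORK.BIGFILE.                      */
/* sex + age + 50 numeric + 24 character fillers = 76 variables.        */
/* The filler mix, names, and lengths are assumptions, not the real     */
/* layout of SASLIB.BIGFILE.                                            */
%let nobs = 6331860;   /* row count reported in the logs in this thread */
%let seed = 12345;     /* fixed seed so every run builds the same data  */

data work.bigfile;
   array num{50} num1-num50;          /* numeric filler variables       */
   array chr{24} $ 20 chr1-chr24;     /* character filler variables     */
   do _n_ = 1 to &nobs;
      /* five probabilities of 1/6; RANTBL returns 6 for the remainder  */
      sex = rantbl(&seed, 1/6, 1/6, 1/6, 1/6, 1/6);
      age = floor(abs(rannor(&seed) * 20) + 30);
      do i = 1 to dim(num);
         num{i} = ranuni(&seed);
      end;
      do i = 1 to dim(chr);
         chr{i} = put(ceil(ranuni(&seed) * 1e6), z20.);
      end;
      output;
   end;
   drop i;                            /* loop index is not part of data */
run;
```

To repeat the earlier comparisons against this file, point the SET statements at WORK.BIGFILE instead of SASLIB.BIGFILE, and lower the nobs macro variable first if disk space is tight.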
On Dec 6, 2007 3:50 PM, bruce johnson <chimanbj@gmail.com> wrote:
> The WORK library is local. And this file is a datafile that I use for
> testing many different scenarios. It's filled with all types of
> fields, not just random number fields. This is probably the reason why
> you see such a difference in the times.
>
>
> On Dec 6, 2007 3:47 PM, data _null_, <datanull@gmail.com> wrote:
> > For example, creating a WORK.BIGFILE on my PC, where the WORK library is
> > not "out on the network", is much faster. Real time is very close to CPU
> > time. I realize my BIGFILE is not like yours. You have many more variables.
> >
> > 388 option fullstimer msglevel=i;
> > 389 data work.bigfile;
> > 390 do _n_ = 1 to 6331860;
> > 391 sex = rantbl(12345,1/6,1/6,1/6,1/6, 1/6,1/6,1/6);
> > 392 age = floor(abs(rannor(12345)*20) + 30);
> > 393 output;
> > 394 end;
> > 395 run;
> >
> > NOTE: The data set WORK.BIGFILE has 6331860 observations and 2 variables.
> > NOTE: DATA statement used (Total process time):
> > real time 5.71 seconds
> > user cpu time 4.96 seconds
> > system cpu time 0.71 seconds
> > Memory 152k
> >
> >
> >
> > On Dec 6, 2007 3:26 PM, bruce johnson <chimanbj@gmail.com> wrote:
> > > Whenever you're writing data to disk, the real and CPU time will
> > > differ because of the disk I/O.
> > >
> > > But since you requested it, here it is (putting the where clause in
> > > the SET statement is the clear winner):
> > >
> > > 312 options fullstimer;
> > > 313 data test;
> > > 314 set saslib.bigfile(where=(sex=6 and age<50));
> > > 315 run;
> > >
> > > NOTE: There were 1812750 observations read from the data set SASLIB.BIGFILE.
> > > WHERE (sex=6) and (age<50);
> > > NOTE: The data set WORK.TEST has 1812750 observations and 76 variables.
> > > NOTE: DATA statement used (Total process time):
> > > real time 3:39.14
> > > user cpu time 6.66 seconds
> > > system cpu time 10.01 seconds
> > > Memory 223k
> > >
> > >
> > > 316 data test;
> > > 317 set saslib.bigfile;
> > > 318 where sex=6 and age<50;
> > > 319 run;
> > >
> > > NOTE: There were 1812750 observations read from the data set SASLIB.BIGFILE.
> > > WHERE (sex=6) and (age<50);
> > > NOTE: The data set WORK.TEST has 1812750 observations and 76 variables.
> > > NOTE: DATA statement used (Total process time):
> > > real time 4:04.42
> > > user cpu time 7.52 seconds
> > > system cpu time 10.40 seconds
> > > Memory 222k
> > >
> > >
> > > 320 data test;
> > > 321 set saslib.bigfile;
> > > 322 if sex=6 and age<50;
> > > 323 run;
> > >
> > > NOTE: There were 6331860 observations read from the data set SASLIB.BIGFILE.
> > > NOTE: The data set WORK.TEST has 1812750 observations and 76 variables.
> > > NOTE: DATA statement used (Total process time):
> > > real time 3:48.71
> > > user cpu time 7.70 seconds
> > > system cpu time 10.43 seconds
> > > Memory 212k
> > >
> > >
> > >
> > > On Dec 6, 2007 1:48 PM, data _null_, <datanull@gmail.com> wrote:
> > > > I'm not sure your sample is large enough. Can you post the code that
> > > > makes BIGFILE and make it bigger? The OP mentioned 1-5M but you only
> > > > have 0.5M.
> > > >
> > > > Are you sharing your computer? It might be better to test when real
> > > > and CPU time are closer, that is, when your computer has less
> > > > contention for resources.
> > > >
> > > >
> > > > On Dec 6, 2007 1:10 PM, bruce johnson <chimanbj@gmail.com> wrote:
> > > >
> > > > > Chew on this...
> > > > > 10 options fullstimer;
> > > > > 11 data test;
> > > > > 12 set saslib.bigfile;
> > > > > 13 if sex=6 and age<50;
> > > > > 14 run;
> > > > >
> > > > > NOTE: There were 422124 observations read from the data set SASLIB.BIGFILE.
> > > > > NOTE: The data set WORK.TEST has 120850 observations and 76 variables.
> > > > > NOTE: DATA statement used (Total process time):
> > > > > real time 18.11 seconds
> > > > > user cpu time 0.37 seconds
> > > > > system cpu time 0.81 seconds
> > > > > Memory 212k
> > > > >
> > > > >
> > > > > 15 data test;
> > > > > 16 set saslib.bigfile;
> > > > > 17 where sex=6 and age<50;
> > > > > 18 run;
> > > > >
> > > > > NOTE: There were 120850 observations read from the data set SASLIB.BIGFILE.
> > > > > WHERE (sex=6) and (age<50);
> > > > > NOTE: The data set WORK.TEST has 120850 observations and 76 variables.
> > > > > NOTE: DATA statement used (Total process time):
> > > > > real time 23.56 seconds
> > > > > user cpu time 0.59 seconds
> > > > > system cpu time 1.15 seconds
> > > > > Memory 222k
> > > > >
> > > > >
> > > > >
> > > > > On Dec 6, 2007 12:46 PM, data _null_, <datanull@gmail.com> wrote:
> > > > > > > TEST, don't speculate. OPTIONS FULLSTIMER;
> > > > > >
> > > > > >
> > > > > > On Dec 6, 2007 12:41 PM, LWn <Lars.WahlgrenRemove@this.stat.lu.se> wrote:
> > > > > > > Is there any difference between IF and WHERE when used in a data step?
> > > > > > > Data set zero below has between 1 and 5 million records.
> > > > > > >
> > > > > > > data one ;
> > > > > > > set zero ;
> > > > > > > WHERE <condition1 AND condition2> ;
> > > > > > > or
> > > > > > > IF <condition1 AND condition2> ;
> > > > > > > run ;
> > > > > > >
> > > > > > > I say there is NO difference regarding efficiency but a friend says WHERE is
> > > > > > > more efficient.
> > > > > > > > I've made some simulations that support my opinion.
> > > > > > >
> > > > > > > Do you all support my opinion or am I missing something?
> > > > > > >
> > > > > > > /LarsW
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>