|
(And you should use quit not run to terminate proc datasets).
So which is faster?
On Wed, 25 Feb 2004 20:27:20 -0500, Howard Schreier
<Howard_Schreier@ITA.DOC.GOV> wrote:
>You have unintentionally stacked the deck.
>
>PROC SQL exploits the index and uses a direct access strategy. It only
>reads the rows it needs from A.
>
>The DATA step uses the index to retrieve the rows in order, but it still
>reads all 5 million of them.
>
>On Wed, 25 Feb 2004 18:57:21 -0300, rdporto1 <rdporto1@TERRA.COM.BR> wrote:
>
>>Ok. That's the code and some log.
>>Sincerely, I don't know why this is happening...
>>
>>data b (compress=yes reuse=yes);
>>do b1=1 to 5000000;
>>if b1 in (10 20 30) then x=4;
>>else x=5000000-b1;
>>b2=ranuni(13);
>>b3=ranuni(14);
>>b4=ranuni(15);
>>b5=ranuni(16);
>>b6=ranuni(17);
>>output;
>>end;
>>run;
>>
>>proc datasets library=work;
>>modify b;
>>index create x;
>>run;
>>
>>data a;
>>do a1=1 to 500;
>>x=500-a1;
>>a2=ranuni(87);
>>a3='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
>>output;
>>end;
>>run;
>>
>>proc sort data=a;
>>by x;
>>run;
>>
>>proc sql;
>>create table d as
>>select *
>>from a as a, b as b
>>where a.x=b.x;
>>quit;
>>
>>data c;
>>merge b a (in=k);
>>by x;
>>if k;
>>run;
>>quit;
>>
>>proc compare data=c compare=d;
>>run;
>>
>>Some log...
>>
>>359 proc sql;
>>360 create table d as
>>361 select *
>>362 from a as a, b as b
>>363 where a.x=b.x;
>>WARNING: Variable x already exists on file WORK.D.
>>NOTE: Table WORK.D created, with 503 rows and 10 columns.
>>
>>364 quit;
>>NOTE: PROCEDURE SQL used:
>> real time 0.01 seconds
>> cpu time 0.01 seconds
>>
>>
>>365 data c;
>>366 merge b a (in=k);
>>367 by x;
>>368 if k;
>>369 run;
>>
>>NOTE: There were 5000000 observations read from the data set WORK.B.
>>NOTE: There were 500 observations read from the data set WORK.A.
>>NOTE: The data set WORK.C has 503 observations and 10 variables.
>>NOTE: DATA statement used:
>> real time 9.71 seconds
>> cpu time 9.65 seconds
|