| Date: | Fri, 27 Aug 2010 15:23:31 -0400 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: SQL performance |
|
| In-Reply-To: | <OF576B7DC9.40582DBC-ON8525778C.0065C582-8525778C.00664224@notes.health.state.ny.us> |
| Content-Type: | text/plain; charset="us-ascii" |
Frank:
While your mileage may vary from platform to platform, your query if written in correct syntax takes a disproportionately longer time to run on my little desktop machine given 5 million tuples as opposed to 500K. I suspect buffering and the ORDER BY clause adds to the time required. Still, may test takes a fraction of the time that yours did (though perhaps your data contain many more targets of interest than does my test data:
70 proc sql;
71 create table test as
72 select l.i,v2,v3
73 from testDS l
74 where lab_id in ('8888-8','9999-1')
75 order by i, v2;
NOTE: Table WORK.TEST created, with 114 rows and 3 columns.
76 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:06.50
cpu time 9.95 seconds
The program as written amounts to little more than a scan of the dataset. Here's the test program in case others want to play along at home:
data testDS;
do i=1 to 5*1E6;
if ranuni(113317)<.00001 then lab_ID="8888-8";
else if ranuni(113317)<.00001 then lab_ID="9999-1";
else lab_ID="0000-0";
v2=repeat('X',49);
v3=repeat('Y',49);
output;
end;
run;
proc sql;
create table test as
select l.i,v2,v3
from testDS l
where lab_id in ('8888-8','9999-1')
order by i, v2;
quit;
If you have many more target lab_id's in a column in a dataset, SAS SQL will optimize this program with a hash index:
proc sql;
create table test as
select l.i,v2,v3
from testDS l
where lab_id in (select lab_id from targets)
order by i, v2;
quit;
Might even work faster than your query, but not likely to improve matters much. The real savings would come in not having to write a long IN list.
S
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Bin Zhu
Sent: Friday, August 27, 2010 2:37 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SQL performance
Hi,
I am querying a table with 5 millions records and 5 fields using the
following SQL code:
proc sql;
create table test as
select l.id,l.sample_dt
from temp.lab l
where =lab_id in ('8888-8','9999-1')
order by id, sample_dt;
quit;
It takes very long time to run (6 minutes). Is a way to make it run faster?
Thanks in advance.
Frank
IMPORTANT NOTICE: This e-mail and any attachments may contain confidential or sensitive information which is, or may be, legally privileged or otherwise protected by law from further disclosure. It is intended only for the addressee. If you received this in error or from someone who was not authorized to send it to you, please do not distribute, copy or use it or any attachments. Please notify the sender immediately by reply e-mail and delete this from your system. Thank you for your cooperation.
|