Date: Thu, 19 Jan 2006 07:32:53 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Proc SQL to Data Step
Content-Type: text/plain; charset="iso-8859-1"
Please read David Cassell's sage commentary on the SQL performance vs. Data step performance question. SAS-L archives include many postings that show just the opposite.
SQL solutions typically take slightly longer to execute on smaller datasets because the SQL compiler constructs an execution plan before it begins processing rows of data. Differences between approriate SQL queries and Data step processing tend to diminish as numbers of rows increase. Notable exceptions include merges of more than two datasets on the same BY group, merges controlled by pointers that take advantage of dataset direct access, and left joins, and special uses of hash objects. (Breaking a LEFT join down into an inner join and a union actually brings SQL performance in line with Data step performance.)
Because Data step MERGE's assume ordering of rows of datasets in BY group order (and thus risk failing if that assumption fails to hold), while SQL queries order datasets unless SORTED BY values in metadata indicate proper sort order, MERGE's often appear to perform better than SQL queries. Updating the SORTED BY variable to the join key value lets the SQL compiler bypass sorting of datasets.
Sig
________________________________
From: owner-sas-l@listserv.uga.edu on behalf of Chandak
Sent: Wed 1/18/2006 10:09 AM
To: sas-l@uga.edu
Subject: Re: Proc SQL to Data Step
The data set was already sorted (using it for some other purpose), time
doesn't include sorting time but for my program I have saved time a
lot.
I have seen many times data step is faster then SQL on many millions
observations. I am also SQL fan but time matters. SQL is always faster
on fewer observations, I can't say number but it is true.
Thanks,
Nitin
|