LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2004, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Fri, 30 Jan 2004 13:18:56 -0500
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: Proc Sql vs. Datastep
Comments:   To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Content-Type:   text/plain

Howard: I suspect that SQL does find a SORTEDBY value that confirms the sort order of the data, or the data are being read from sorted external files into a SAS dataset. This example shows that the distinct counts occur within groups:

data testSum; input Mid totdaysofwk Mweek totspendofwk ; cards; 1 2 3 4 1 6 7 8 2 5 4 3 ; run; proc sql; select Mid, sum(totdaysofwk)/count(distinct Mweek) as meandays, sum(totspendofwk) as totspend from testSum group by Mid; quit;

Since the DISTINCT counts occur within groups, the order of data rows should not matter that much unless the groups span very large numbers of rows. Sig -----Original Message----- From: Howard Schreier [mailto:Howard_Schreier@ITA.DOC.GOV] Sent: Thursday, January 29, 2004 10:16 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Proc Sql vs. Datastep

Two theories about this particular case:

(1) The input is sorted, but that fact is not recorded in the dataset header so SQL does not know about it and re-sorts.

(2) SQL does not exploit the sort sequence (or employ any other single-pass strategy) in accumulating the DISTINCT counts.

On Thu, 29 Jan 2004 14:15:00 -0500, Sigurd Hermansen <HERMANS1@WESTAT.COM> wrote:

>NetComm888: >A couple of comparisons of fast SAS Data step programs relative to slow >SAS SQL programs have appeared on the 'L recently. This third such >comparison has at last provoked my usual response. > >If you are trying to mimic the SQL GROUP BY, I don't believe that your >Data step will work properly unless you have data in a specific sort >order. SAS SQL automatically sorts a dataset as required unless the >compiler has an index or a SORTEDBY value that indicates required sort >order. While a relatively simple SAS SQL query will take a bit longer >to execute, sorting time probably accounts for much of the difference >that you are observing. > >SAS SQL does not optimize the LEFT JOIN, transitive INNER JOIN's, and >some JOIN's that, as Paul Dorfman has demonstrated repeatedly, could be >accelerated using hash indexes. Those exceptions aside, the time, >effort, and risk of specialized Data step development, in my >experience, generally off-sets the slightly longer waiting times and >slightly greater CPU usage required to execute SQL queries. Sig > >-----Original Message----- >From: NetComm888 [mailto:netcomm888@YAHOO.COM] >Sent: Thursday, January 29, 2004 1:23 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: Proc Sql vs. Datastep > > >I did the following comparison on a very large dataset, which is sorted >by MId and MWeek. The goal is to summarize the spend and visits by >MId. Proc Sql is much neater in terms of coding, very easy to >understand and follow. But the problem is sql runs much slower than the >datastep. The datastep achieves the same goal by using the retain >statement. The coding is kinda ugly, but runs super faster. Could >anybody explain why to me? I didn't expect Proc Sql could be this slow, >compared with the datastep. > > >proc sql; > create table test.Midstats_sql as > select Mid, sum(totdaysofwk)/count(distinct Mweek) as meandays, > sum(totspendofwk) as totspend > from test.weekstats_o > group by Mid; >quit; > > > >**************** The datastep is used to achieve the same goal as the >sql above ****************; data test.Midstats_dat; > set test.weekstats_o end=eof; > if _n_=1 then do; > retainMid=Mid; > totdays=0; > retainMwk=Mweek; > totwks=1; > totspend=0; >end; > >retain retainMid totdays retainMwk totwks totspend; > >if Mid=retainMid then do; > totspend=sum(totspend,totspendofwk); > totdays=sum(totdays,totdaysofwk); > if Mweek ^=retainMwk then do; > totwks=totwks+1; > retainMwk=Mweek; > end; >end; > >else if Mid^=retainMid then do; > output test.Midstats_dat; > retainMid=Mid; > totdays=totdaysofwk; > totspend=totspendofwk; > retainMwk=Mweek; > totwks=1; > >end; > >if eof then do; > output test.Midstats_dat;; >end; > >run;


Back to: Top of message | Previous page | Main SAS-L page