Date: Tue, 3 Feb 2004 14:42:18 -0600
Reply-To: Toby Dunn <tdunn@OAKHILLTECH.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Toby Dunn <tdunn@OAKHILLTECH.COM>
Subject: Re: SQL speed
In-Reply-To: <200402032032.i13KWaB30075@listserv.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
I remember my mainframe admins saying that on our system it takes sorting a
certain number of obs before our system would use syncsort, otherwise it
would just use sas's sort. Wonder if the same is true weather or not im
using a datastep or a sql statement. Hmmmmmm....yet another thind on a long
list to ponder..
Toby Dunn
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Don
Stanley
Sent: Tuesday, February 03, 2004 2:33 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SQL speed
Yes, as i recall when proc syncsort is installed any proc sort
automagically becomes a proc syncsort. This is very sensible, I wouldn't
want to be coding "proc syncsort" in my jobs and then have the company
dictate we move to a different sort engine. I'd have a massive overhead in
maintenance. I had neglected to mention this, and it does mean effectively
only two choices for Proc SORT, SAS sort or proc syncsort. for SQL it also
means two choices, SAS SORT or system syncsort.
I always understood proc syncsort to be a proc sort replacement, no
intention on the part of the developers to use it with SAS SQL (which
wasn't even around when I first used proc syncsort!). I wasn't clear about
that in my earlier post.
Crux of my message doesn't change. PROC SYNCSORT is very worthwhile. There
are situations where you cannot avoid sorting in SQL, otherwise with proc
syncsort installed it just gets used with proc sort.
btw, in my earlier I referred to "disk speed". I meant "disk space".
Don
On Tue, 3 Feb 2004 10:06:04 -0600, pudding man <pudding_man@MAIL.COM> wrote:
>Don,
>
>Given that, in Alex SAS log:
>
>1.) The PROC SORT code yields the explicit message:
> "NOTE: WER750I End PROC SYNCSORT. R2.2D+"
> and the PROC SQL code apparently does not.
>
>2.) The PROC SORT code uses about 15% less cpu cycles.
>
>Isn't it possible that PROC SORT is automagically
>linking to the PROC SYNCSORT engine and PROC SQL is
>not (is rather using maybe the standard interface to the
>standard SYNCSORT utility)?
>
>If this is the case, it could be due to
>the engineering of the PROC SYNCSORT engine or
>the engineering of the PROC SQL sort interface
>or both?
>
>There are SAS system options that could likely shed
>light on this issue ... SORTMSG and SORTMSG=ddname
>come to mind ...
>
>I have long been curious about this but have no
>suitable system on which to test.
>
> Zalut,
> Puddin'
>
>*******************************************************
>***** Puddin' Man **** Pudding_Man-at-mail.com ********
>*******************************************************;
>
>----- Original Message -----
>From: Don Stanley <don_stanley@PARADISE.NET.NZ>
>Date: Mon, 2 Feb 2004 22:33:54 -0500
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: SQL speed
>
>> Using PROC Syncsort, as opposed to Syncsort, should generally see sort
>> times decrease. However, this is really not noticeable with smallish
>> datasets like you have posted with. When I evaluated PROC SYNCSORT for
the
>> mainframe 2-3 years ago, I repeated success stories made earlier -- PROC
>> SYNCSORT could at least halve sort times for very large datasets.
>>
>> When I asked for an explantion of how this happened, Syncsort gave a very
>> simple and plausible answer which I assume to be correct. They said when
>> SAS uses a system sort, it must first get data into a format suitable for
>> that system sort. So there is an overhead on SAS to prepare data, and at
>> the other end to get data back. But using PROC SYNCSORT (note -- proc
>> syncsort, not the system syncsort product) they removed this step by
>> writing an engine that directly accesses the SAS dataset. In fact,
quoting
>> from the Syncsort website seems to confirm this .. "PROC SYNCSORT
achieves
>> such gains by providing a direct interface between SAS and SyncSort. This
>> frees SyncSort to use the high-performance techniques-sophisticated
access
>> methods, path length minimization, I/O optimization-that have made
SyncSort
>> the most frequently used sort product."
>>
>> There are effectively 3 sorts available to you (I'm referring to the
poster
>> here -- not the generic SAS user "you"). Using a system Syncsort, using
SAS
>> internal sort, and using PROC SYNCSORT. There is an internal option to
>> control whether SAS uses its own sort or system sort. I'm not sure that
the
>> question of using SQL, as opoosed to say, PROC SORT, is really relevant,
>> but I think you're getting at the difference between the "one step" proc
>> sql approach, and the dual step sql and proc sort/syncsort approach. In
>> this instance, I'd personally always consider the proc syncsort rather
than
>> automatically doing everything in sql.
>>
>> In my experience, the biggest gains were on datasets with lots of records
>> and lots of variables, or where there were lots of records with long
>> character variables. PROC SYNCSORT seemed to handle these with ease,
whilst
>> SAS sort seemed to perform best with small narrow datasets. I tended to
try
>> system sort and proc syncsort at first, but fairly soon I got to the
point
>> where I stuck with the proc routinely for large datasets (>500,000
records,
>> or very wide records).
>>
>> Proc Syncsort also had a positive impact on disk speed. It used much
less.
>>
>> If your SQL were a view ... and proc syncsort sorted the view output, you
>> have no extra disk overhead, and it should run extremely quick ... and
its
>> sort of a one step thing (excuse the pun) ...
>>
>> Don
>>
>>
>> On Mon, 2 Feb 2004 18:07:29 -0500, Alexander Cavallo
<acavallo@LEXECON.COM>
>> wrote:
>>
>> >Hi SAS-L,
>> >
>> >I am new to using SQL. From reading SAS-L, I thought that SQL sorts
faster
>> >than SYNCSORT. In the example below, SYNCSORT is faster. Could you
>> >explain to me how big a dataset needs to be before SQL is faster?
>> >
>> >Thanks,
>> >
>> >--Alex Cavallo
>> >Lexecon, Inc.
>> >
>> >
>> >48 proc sql noprint;
>> >49 create table temp.sql as (
>> >50 select distinct emp_id, date
>> >51 from hrdata.all )
>> >52 order by emp_id, date
>> >53 ;
>> >NOTE: Table TEMP.SQL created, with 179291 rows and 2 columns.
>> >
>> >54 quit;
>> >NOTE: The PROCEDURE SQL used 2.22 CPU seconds and 10730K.
>> >
>> >55
>> >56 proc sort nodupkey data=hrdata.all out=temp.test(keep=emp_id
>> >date);
>> >57 by emp_id date;
>> >58 run;
>> >
>> >NOTE: WER750I End PROC SYNCSORT. R2.2D+
>> >NOTE: There were 179291 observations read from the data set HRDATA.ALL.
>> >NOTE: The data set TEMP.TEST has 179291 observations and 2 variables.
>> >NOTE: The PROCEDURE SORT used 1.88 CPU seconds and 10907K.
>>
>
>--
>_______________________________________________
>Get your free email from http://www.mail.com
Yes, as i recall when proc syncsort is installed any proc sort
automagically becomes a proc syncsort. This is very sensible, I wouldn't
want to be coding "proc syncsort" in my jobs and then have the company
dictate we move to a different sort engine. I'd have a massive overhead in
maintenance. I had neglected to mention this, and it does mean effectively
only two choices, SAS sort or syncsort.
I always understood proc syncsort to be a proc sort replacement, no
intention on the part of the developers to use it with SAS SQL (which
wasn't even around when I first used proc syncsort!). I wasn't about that
in my earlier post.
Crux of my message doesn't change. PROC SYNCSORT is very worthwhile. There
are situations where you cannot avoid sorting in SQL, otherwise with proc
syncsort installed it just gets used.
|