Date: Wed, 27 Apr 2011 08:01:10 -0400
Reply-To: Ben Powell <bpowell555@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ben Powell <bpowell555@GMAIL.COM>
Subject: Re: Tips for optimizing sql data pull from SYBASE database
This was my thread on trying to find out why the local machine is so slow -
apparently the remote machine is set to pass thru on the datastep and the
local is not? When I run a direct connect performance is excellent.
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1104A&L=sas-l&P=R10203&X=5202EA008A814AD5F0&Y=bpowell555@gmail.com
On Wed, 27 Apr 2011 04:20:54 -0400, Ben Powell <bpowell555@GMAIL.COM> wrote:
>Fantastic, this shows my local is not using pass thru whereas my remote is
>hence the performance difference I mentioned in an earlier post. So how do I
>enable pass-thru on the sybase libname when just using datastep?
>
>Regards
>
>Ben
>
>On Fri, 15 Apr 2011 06:54:45 -0700, Terjeson, Mark <Mterjeson@RUSSELL.COM>
>wrote:
>
>>Hi Sterling,
>>
>>As you know there are two locations involved
>>here. Let's call the Sybase the "remote" box
>>and your SAS session as the "local" box just
>>for discussion sake. In your query the only
>>piece that involves the remote box is the
>>libref "SYBASE." of the SYBASE.SYBASE_TABLE.
>>
>>If we think about the order that the SQL is
>>going to process things, in SQL theory before
>>you can chop out rows and chop out columns
>>you have to actually "have" the stuff to work
>>with. i.e. we can picture the FROM clause
>>fetching all of the tables first and then the
>>WHERE clause can chop out rows and then the
>>SELECT clause can chop or create columns. In
>>this ordering the SYBASE.SYBASE_TABLE is going
>>to fetch the entire SYBASE_TABLE over the wire
>>(which is the slowest link in the chain) over
>>to you SAS session. Then all of the rest of
>>your query is on the SAS "local" side now that
>>is has the whole table to work with.
>>
>>You can have total control over at the "remote"
>>location and total control back on the "local"
>>side when you use PROC SQL in SAS with the
>>CONNECTION TO (pass-through) syntax. example below
>>
>>
>>(Discussion after)
>>
>>
>>In your SAS program:
>>
>>
>> * if returning a result ;
>>proc sql;
>>
>> connect to sqlsvr(uid=myuserid pwd=mypwd dsn=myodbctoken);
>>
>> create table myresult as
>> select *
>> from connection to sqlsvr
>> (
>> ----my SQLSVR passthrough query here----
>> )
>>
>>quit;
>>
>>
>>
>> * if not returning a result ;
>>proc sql;
>>
>> connect to sqlsvr(uid=myuserid pwd=mypwd dsn=myodbctoken);
>>
>> execute(
>> ----my SQLSVR passthrough statements here----
>> ) by sqlsvr;
>>
>>quit;
>>
>>
>>Where the pass-through text is marked (inside the
>>parentheses) ALL of the text/code you put inside
>>the parentheses is passed over the wire to the
>>"remote" box and run there on the "remote" box
>>just as if you ran it on the remote box. Your
>>pass-through query itself can subset, shrink
>>your rows and columns down as much as you can
>>and then only that smaller resultant data is
>>streamed over the wire back to your SAS PROC
>>SQL FROM clause. Then the rest of your SAS
>>query for the local side has a much smaller table
>>to work with. Thus, better performance.
>>
>>Using the libref approach can even get drastically
>>worse if certain syntax combination are involved
>>such as row-based functions applied to the WHERE
>>clause, etc., or when a correlated query syntax
>>is used which means every row has to be compared
>>or fetched for every row back and forth across
>>the wire. A 20 second query can turn in 20 hours
>>in the blink of an eye.
>>
>>The pass-through approach avoids all of these
>>pitfalls so that's what most people use when
>>working remote&local. Of course, if the tables
>>are very small you can use the libref stuff just
>>fine, but as you now can see, if your small table
>>grows at all your performance will go down hill fast.
>>
>>
>>The first example above is when you are wanting a
>>resultant data table returned back to your query.
>>The second EXECUTE() example is for when you want
>>to build a temp table during your session or create
>>or drop or modify something via SQL code on the
>>remote box without returning a resultant data table.
>>So you can intermix returned queries and execute()
>>as long as you remain in the same PROC SQL;QUIT;
>>session. i.e. the connection string made to the
>>remote box inside the PROC SQL remains the same
>>connection sessions until you either have a
>>disconnect statement or the QUIT occurs. That way
>>you can built a temp table and then use the temp
>>table in a subsequent query that returns a table.
>>Said differently if you break those pieces up in
>>separate PROC SQL-QUITs then you have to remember
>>that if the temp table over on the remote box is
>>built in one PROC SQL-QUIT and then you have your
>>query in a subsequent PROC SQL-QUIT to use that
>>temp table it won't work because when the temp
>>table got built, as soon at its QUIT occurs
>>everything in that PROC SQL-QUIT session goes poof!
>>So is not still there when the following PROC SQL
>>executes. So remember that you can have multiple
>>queries and/or execute()s in one single PROC SQL-QUIT
>>session.
>>
>>Another note is that you need to remember that the
>>pass-through code inside the paretheses is actually
>>text that is sent over to the remote box, so the
>>pass-through code MUST be syntax that the remote box
>>likes to digest. i.e. if sybase, then it must be
>>valid sybase syntax. In fact, the quickest way to
>>troubleshoot/test/build your pass-through code syntax
>>is that you can actually just copy&paste the pass-
>>through code over to the sybase edit/query window
>>and test run it on the sybase box. If it works there
>>that same resultant table is what your SAS query
>>will receive back.
>>
>>
>>
>>Hope this is helpful.
>>
>>
>>Mark Terjeson
>>Investment Business Intelligence
>>Investment Management & Research
>>Russell Investments
>>206-505-2367
>>
>>
>>Russell
>>Global Leaders in Multi-Manager Investing
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>-----Original Message-----
>>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>>Sterling Paramore
>>Sent: Thursday, April 14, 2011 1:25 PM
>>To: SAS-L@LISTSERV.UGA.EDU
>>Subject: Tips for optimizing sql data pull from SYBASE database
>>
>>Dear SAS-L,
>>
>>I'm using proc sql to extract some data from a SYBASE database. I'm
>>having trouble optimizing the process, because I can't tell what parts
>>of the process are being done on the database server and what parts
>>are first being pulled into SAS before processing.
>>
>>For example, suppose I have a SAS dataset with a single record holding
>>a datetime variable. I want to pull records from the database that
>>have a datetime that is larger than the one in my SAS dataset. My
>>first thought was:
>>
>>proc sql;
>> create table NEW_DATA as
>> select
>> *
>> from
>> SYBASE.SYBASE_TABLE as t1
>> where
>> t1.LOAD_DATETIME > (select LOAD_DATETIME from
>>WORK.SAS_TABLE_MAXDATETIME);
>>quit;
>>
>>However, I can't tell if the whole SYBASE_TABLE is first being pulled
>>into SAS and then SAS is applying the where clause, or if only those
>>records that are satisfying this criteria are being pulled.
>>
>>Any thoughts?
>>
>>Thanks,
>>Sterling
|