LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 2011, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 27 Apr 2011 08:01:10 -0400
Reply-To:     Ben Powell <bpowell555@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ben Powell <bpowell555@GMAIL.COM>
Subject:      Re: Tips for optimizing sql data pull from SYBASE database

This was my thread on trying to find out why the local machine is so slow - apparently the remote machine is set to pass thru on the datastep and the local is not? When I run a direct connect performance is excellent.

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1104A&L=sas-l&P=R10203&X=5202EA008A814AD5F0&Y=bpowell555@gmail.com

On Wed, 27 Apr 2011 04:20:54 -0400, Ben Powell <bpowell555@GMAIL.COM> wrote:

>Fantastic, this shows my local is not using pass thru whereas my remote is >hence the performance difference I mentioned in an earlier post. So how do I >enable pass-thru on the sybase libname when just using datastep? > >Regards > >Ben > >On Fri, 15 Apr 2011 06:54:45 -0700, Terjeson, Mark <Mterjeson@RUSSELL.COM> >wrote: > >>Hi Sterling, >> >>As you know there are two locations involved >>here. Let's call the Sybase the "remote" box >>and your SAS session as the "local" box just >>for discussion sake. In your query the only >>piece that involves the remote box is the >>libref "SYBASE." of the SYBASE.SYBASE_TABLE. >> >>If we think about the order that the SQL is >>going to process things, in SQL theory before >>you can chop out rows and chop out columns >>you have to actually "have" the stuff to work >>with. i.e. we can picture the FROM clause >>fetching all of the tables first and then the >>WHERE clause can chop out rows and then the >>SELECT clause can chop or create columns. In >>this ordering the SYBASE.SYBASE_TABLE is going >>to fetch the entire SYBASE_TABLE over the wire >>(which is the slowest link in the chain) over >>to you SAS session. Then all of the rest of >>your query is on the SAS "local" side now that >>is has the whole table to work with. >> >>You can have total control over at the "remote" >>location and total control back on the "local" >>side when you use PROC SQL in SAS with the >>CONNECTION TO (pass-through) syntax. example below >> >> >>(Discussion after) >> >> >>In your SAS program: >> >> >> * if returning a result ; >>proc sql; >> >> connect to sqlsvr(uid=myuserid pwd=mypwd dsn=myodbctoken); >> >> create table myresult as >> select * >> from connection to sqlsvr >> ( >> ----my SQLSVR passthrough query here---- >> ) >> >>quit; >> >> >> >> * if not returning a result ; >>proc sql; >> >> connect to sqlsvr(uid=myuserid pwd=mypwd dsn=myodbctoken); >> >> execute( >> ----my SQLSVR passthrough statements here---- >> ) by sqlsvr; >> >>quit; >> >> >>Where the pass-through text is marked (inside the >>parentheses) ALL of the text/code you put inside >>the parentheses is passed over the wire to the >>"remote" box and run there on the "remote" box >>just as if you ran it on the remote box. Your >>pass-through query itself can subset, shrink >>your rows and columns down as much as you can >>and then only that smaller resultant data is >>streamed over the wire back to your SAS PROC >>SQL FROM clause. Then the rest of your SAS >>query for the local side has a much smaller table >>to work with. Thus, better performance. >> >>Using the libref approach can even get drastically >>worse if certain syntax combination are involved >>such as row-based functions applied to the WHERE >>clause, etc., or when a correlated query syntax >>is used which means every row has to be compared >>or fetched for every row back and forth across >>the wire. A 20 second query can turn in 20 hours >>in the blink of an eye. >> >>The pass-through approach avoids all of these >>pitfalls so that's what most people use when >>working remote&local. Of course, if the tables >>are very small you can use the libref stuff just >>fine, but as you now can see, if your small table >>grows at all your performance will go down hill fast. >> >> >>The first example above is when you are wanting a >>resultant data table returned back to your query. >>The second EXECUTE() example is for when you want >>to build a temp table during your session or create >>or drop or modify something via SQL code on the >>remote box without returning a resultant data table. >>So you can intermix returned queries and execute() >>as long as you remain in the same PROC SQL;QUIT; >>session. i.e. the connection string made to the >>remote box inside the PROC SQL remains the same >>connection sessions until you either have a >>disconnect statement or the QUIT occurs. That way >>you can built a temp table and then use the temp >>table in a subsequent query that returns a table. >>Said differently if you break those pieces up in >>separate PROC SQL-QUITs then you have to remember >>that if the temp table over on the remote box is >>built in one PROC SQL-QUIT and then you have your >>query in a subsequent PROC SQL-QUIT to use that >>temp table it won't work because when the temp >>table got built, as soon at its QUIT occurs >>everything in that PROC SQL-QUIT session goes poof! >>So is not still there when the following PROC SQL >>executes. So remember that you can have multiple >>queries and/or execute()s in one single PROC SQL-QUIT >>session. >> >>Another note is that you need to remember that the >>pass-through code inside the paretheses is actually >>text that is sent over to the remote box, so the >>pass-through code MUST be syntax that the remote box >>likes to digest. i.e. if sybase, then it must be >>valid sybase syntax. In fact, the quickest way to >>troubleshoot/test/build your pass-through code syntax >>is that you can actually just copy&paste the pass- >>through code over to the sybase edit/query window >>and test run it on the sybase box. If it works there >>that same resultant table is what your SAS query >>will receive back. >> >> >> >>Hope this is helpful. >> >> >>Mark Terjeson >>Investment Business Intelligence >>Investment Management & Research >>Russell Investments >>206-505-2367 >> >> >>Russell >>Global Leaders in Multi-Manager Investing >> >> >> >> >> >> >> >> >> >> >> >> >>-----Original Message----- >>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of >>Sterling Paramore >>Sent: Thursday, April 14, 2011 1:25 PM >>To: SAS-L@LISTSERV.UGA.EDU >>Subject: Tips for optimizing sql data pull from SYBASE database >> >>Dear SAS-L, >> >>I'm using proc sql to extract some data from a SYBASE database. I'm >>having trouble optimizing the process, because I can't tell what parts >>of the process are being done on the database server and what parts >>are first being pulled into SAS before processing. >> >>For example, suppose I have a SAS dataset with a single record holding >>a datetime variable. I want to pull records from the database that >>have a datetime that is larger than the one in my SAS dataset. My >>first thought was: >> >>proc sql; >> create table NEW_DATA as >> select >> * >> from >> SYBASE.SYBASE_TABLE as t1 >> where >> t1.LOAD_DATETIME > (select LOAD_DATETIME from >>WORK.SAS_TABLE_MAXDATETIME); >>quit; >> >>However, I can't tell if the whole SYBASE_TABLE is first being pulled >>into SAS and then SAS is applying the where clause, or if only those >>records that are satisfying this criteria are being pulled. >> >>Any thoughts? >> >>Thanks, >>Sterling


Back to: Top of message | Previous page | Main SAS-L page