Date: Fri, 9 Aug 2002 11:53:00 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: FW: Matching SAS dataset to MS-SQL Server data table
Content-Type: text/plain; charset="iso-8859-1"
I had sent this private reply earlier, but will forward it SAS-L now to
encourage those connecting SAS to Oracle, SQLServer, DB2, Sybase, MySQL, and
other databases to publish the techniques they are using. The ability to
send
lists of identifiers to servers in 'pass-thru' queries greatly enhances the
SAS
System. I recall hearing requests for this capability as far back as the San
Diego SUGI.
How about a SUGI session on this topic? A single comprehensive session or a
series of short sessions (say in the Coder's Corner) would, I believe, gain
a
substantial audience and prove valuable to the many SAS programmers who are
trying to find efficient methods for extracting data from large databases in
RDBMS's.
The reasons for sending 'pass thru' queries to servers include better use of
indexes on the servers, reductions in network traffic, enhanced ability to
distribute tasks to more powerful processors and data access devices, and
more
general query optimization plans. Paul Dorfman recently cited ability to
handle
massive amounts of data as one of the key skills that SAS programmers need
to
learn. That skill set must include the ability to link SAS datasets to
tables
and views on remote servers via networks or across the Internet. Users'
groups
have much more to gain from refined techniques for extracting data from
servers
than RDBMS or programming system vendors, but all in all, SI should benefit
as
well. Charles has done a good job of researching and documenting techniques
for
'passing thru' a query referencing data from SAS to SQLServer. I've seen
fragments of techniques doing the same in Oracle and DB2 in SI documentation
and
on the 'L. Now seems the time to document these techniques.
Sig
> -----Original Message-----
> From: Sigurd Hermansen
> Sent: Thursday, August 08, 2002 8:22 PM
> To: 'Charles'
> Subject: RE: Matching SAS dataset to MS-SQL Server data table
>
>
> Your Alka-Seltzer dilemma: those in charge of the PLOP server
> are blocking any reasonable methods of executing on the
> server a simple inner join between your list of identifiers
> and the PLOP database, and you will have to figure out how
> many identifiers SAS can 'pass thru' to the server at a time.
>
> You might want to try testing an inner join that will run on
> a SQLServer database system. A libname identifies a SAS
> native SQLServer or ODBC engine connected to the PLOP side of
> the network, and another libname identifies a SAS dataset on
> the THUD side of the network. Start with a short list of
> identifiers and test a couple of increasingly longer lists.
> If performance begins degrading badly with scale, you can
> substitute a list of identifiers in a macrovariable, as Peter
> suggests, for the SAS dataset.
>
> Many database developers, vendors, and administrators seem
> convinced that the best way to protect data is to isolate
> them from everyone but the clients of their database system.
> Others can still get to the data, but they make the process
> expensive and untimely. I wonder if the database owners and
> sponsors know that they are paying dearly for this 'service'.
>
> Sig
>
> -----Original Message-----
> From: Charles [mailto:cwentzel@PHC4.ORG]
> Sent: Thursday, August 08, 2002 3:12 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Matching SAS dataset to MS-SQL Server data table
>
>
> So, I have this SAS dataset with 5000 records (name it table
> THUD). I want
> to match it to a MS-SQL Server data table (call it table PLOP
> -- with a
> libname of SQLserv to point to the MS-SQL server) with billions and
> billions of records, some of which match my 5000 records.
>
> Oh, and I only have read privileges from the MS-SQL Server.
> Cannot write a
> table there. That would be verboten.
>
> Also, it's a one-to-many match. Each of my records in THUD
> may have 0, 1,
> 2, 3, 4, or any number of matching records in PLOP.
>
> It's tempting to just write a proc sql join:
>
> proc sql;
> create table SPLASH as
> select
> THUD.key, PLOP.foo
> from THUD, SQLserv.PLOP
> where THUD.key=PLOP.key
> order by THUD.key;
> quit;
>
> This would theoretically give me exactly what I'm looking
> for. But the
> internal workings of this query means that SAS sends a
> request to the MS-
> SQL Server to return all records in PLOP, and SAS then sifts
> through them
> to find matching records. Yuck.
>
> Now, I know, if it were a one-to-one match, I could use the
> DBKEY option:
>
> data SPLASH;
> set THUD;
> set SQLserv.PLOP
> (dbkey=(key)
> keep=key foo
> in=in_plop
> ) key=dbkey;
> if in_plop;
> run;
>
> But since it's a one-to-many match, it makes things more
> difficult. This
> is the best I could come up with:
>
> data SPLASH;
> set THUD;
> do until(in_plop=0);
> set SQLserv.PLOP
> (dbkey=(key)
> keep=key foo
> in=in_plop
> ) key=dbkey;
> if in_plop then output;
> else if _iorc_=1230015 then _error_=0;
> end;
> run;
>
> The annoying thing is, I have to keep asking for the next
> record until I
> get the error that no record was found. Then, SAS sees this as a data
> error (that there was no matching record when requested...
> _iorc_ 1230015)
> and sets the _error_=1, and prints the record to the log.
> Thus the line:
> else if _iorc_=1230015 then _error_=0;
>
> I don't want to print the record to the log. And I really don't like
> resetting the _error_. That seems... dangerous. But how
> else can I know
> if another record is available, other than requesting it and
> finding that
> there is none?
>
> Alternatively, if you can think of a better way to do a join
> between a SAS
> dataset and a large MS-SQL table, let me know.
>
|