|
Zhaoping, in essence, wrote:
>The following code is used to create a SAS dataset/table from
>our DB2 database:
>
> proc sql;
> connect to db2 (database=dbgood user= using= );
> create table sys.one as
> select * from connection to db2
> (select var1 from good.table1);
> disconnect from db2;
> quit;
>
> Note that the middle portion of the code is not SAS but standard
> SQL recognizable by DB2. You can insert "where" clause in that portion to pull a
> part of table1 on DB2 and to create a smaller SAS data set. Suppose you have an id
> list that is too long to be typed so you cannot use "where" clause. You would
> necessarily think of the following solution:
>
> proc format cntlin=namelist; run;
>
> where id=start, label='OK', and fmtname='$namex', and then try
>
> if put (id, $namex.) = 'OK' ;
>
> But how to fill this subsetting into the standard portion of the above SQL code
> stream ? Also assume that table1 on DB2 is huge, so it's not possible to create
> a new table that contains the same number of obs as table1 (and then try
> subsetting on your own platform).
Zhaoping,
I hope you follow Bob's advice and join the group, but even if you decide to
abstain, you can view the thread via deja.
Both of your problems can be addressed by creating a view against
good.table1 instead of a disk dataset:
proc sql;
connect to db2 (SSID=XXXX);
create VIEW sys.oneview as
select * from connection to db2
(select var1 from good.table1);
disconnect from db2;
quit;
Now, you can refer to the view SYS.ONEVIEW in a DATA step and apply either
WHERE or IF clause (your choice) using the format you have already defined,
as in
DATA SYS.ONE ;
SET SYS.ONEVIEW ;
WHERE PUT (ID, $NAMEX.) = OK ;
RUN;
Note that by doing this, you are basically reading good.table1 serially,
thus loosing the possibility of exploiting a DB2 index, should it happen to
have been defined on ID. If the latter is the case, and the number of keys
in the WHERE clause is _significantly_ less than the total number of keys in
the DB2 table you are reading (say, 5-10 percent), then it would make much
more sense to create an ordered, comma-delimited list of the names you are
searching and embed if straight into the DB2 query without even bothering
about the view:
proc sql;
SELECT DISTINCT ID INTO: NAMELIST
SEPARATED BY ','
FROM NAMELIST
ORDER BY ID
;
connect to db2 (SSID=XXXX);
create TABLE SYS.ONE as
select * from connection to db2
(select var1
from good.table1
WHERE ID IN (&NAMELIST)
);
disconnect from db2;
quit;
The caveat here is that the entire length of an SQL query may not exceed
32767 bytes. So, if the &NAMELIST happens to get so long that it precludes
the DB2 query from being bound, you will have to utilize the 'standard'
kludge of breaking the &NAMELIST in several pieces and macroizing the DB2
query above in such a way that, instead of creating a single dataset, it
would create a number of views, say, SYS.VONE1, SYS.VONE2, etc. The views
then get 'concatenated' in a subsequent DATA step to write a single dataset
SYS.ONE.
On a different note, I do not 'necessarily think' of a format with CNTLIN=
if I need to search a bunch of keys, even though under many circumstances
this solution could be right on the money and the least contrived.
Kind regards,
==================
Paul M. Dorfman
Jacksonville, FL
==================
|