| Date: | Tue, 28 Dec 2004 18:15:20 -0500 |
| Reply-To: | "James, Steve" <SPJ1@CDC.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "James, Steve" <SPJ1@CDC.GOV> |
| Subject: | Re: Big Table / Little Table Update |
|
| Content-Type: | text/plain; charset="us-ascii" |
Mike,
Thanks for the ideas. I tried a similar approach using a data step that
looked something like:
data big_table ;
set small ;
modify big_table key=key ;
if setdate < maxdate then flag = 0 ; else flag = 1 ;
if _iorc_=0 then replace;
run ;
However it didn't work. I read something in an old SAS-L message that
made me believe that MODIFY wouldn't work with multiple values of the
key value, and even if I removed them from the small table, they would
always exist in the big one. Perhaps that explains why it wouldn't work
for me.
Then I've tried your approach using SQL but I'm getting stuck at the
final step. I can't figure out how to code the SQL so that it will
update only those records in the big table that have matching records in
the transaction. The following is the code that I'm trying ( a
simplified version of your approach), but there are a couple problems
with it. The log leads me to believe that SQL is going through every
record in the big data set, which would eliminate any performance boost.
Additionally records that aren't in the small data set but in the big
one have their flag set to missing.
TIA,
Steve
Centers for Disease Control and Prevention
sjames@cdc.gov
* PROGRAM STARTS HERE ;
* big data set with small data appended to it by hand ;
data big (index=(id)) ;
input id setdate flag ;
datalines ;
1 1 0
1 2 0
2 2 1
3 3 1
4 4 1
5 5 1
1 4 .
2 3 .
3 3 .
;
run ;
data small (index=(id)) ;
input id setdate ;
datalines ;
1 4
2 3
3 3
;
run ;
proc sql;
update big as a
set flag = (select (a.setdate ge b.setdate)
from small as b where a.id = b.id)
;
quit ;
part of the log follows:
807 ;
808 run ;
809
810 proc sql;
811 update big as a
812 set flag = (select (a.setdate ge b.setdate)
813 from small as b where a.id = b.id)
814 ;
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
INFO: Use of index id for WHERE clause optimization cancelled.
INFO: Index id selected for WHERE clause optimization.
NOTE: 9 rows were updated in WORK.BIG.
815 quit ;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.06 seconds
cpu time 0.05 seconds
OUTPUT Follows:
Updated BIG Data Set
Obs id setdate flag
1 1 1 0
2 1 2 0
3 2 2 0
4 3 3 1
5 4 4 .
6 5 5 .
7 1 4 1
8 2 3 1
9 3 3 1
-----Original Message-----
From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com]
Sent: Tuesday, December 28, 2004 3:41 PM
To: James, Steve; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Big Table / Little Table Update
Steve,
It seems to me that this should be doable, assuming that your big table
is
(or can be) indexed by patient ID. As I am understanding your
situation,
the vast majority of patients in your big table are not in your little
table, so that their records would not have to be changed during the
update
process.
Conceptually, I'd do the following:
1. Add your records from Little to Big. (Don't worry about the flag
for
now.)
2. Create a table, IDsInLittle, with one column (ID) and one record for
each distinct ID in Little. This will drive the rest of the process.
3. Create a table, MostRecentVisit, with columns for ID and Date, to
get
the most recent date for each of the patients in IDsInLittle, by joining
IDsInLittle with Big. Hopefully the SQL optimizer will only process the
records from Big that match the IDs in IDsInLittle, so that this will
run
quickly.
4. Use an SQL update statement with a CASE construct to set Flag for
each
record in Big whose ID is in MostRecentVisit to 0 or 1, depending on
whether
the date in Big matches the one in MostRecentVisit.
HTH!
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
James,
Steve
Sent: Tuesday, December 28, 2004 11:09 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Big Table / Little Table Update
SAS-L
We have a problem that I'd like your help on. We have a huge data set
which needs to be updated daily with patient records. Each record has a
patient ID and date of visit. Thus, there are multiple occurrences of
both patient ID and date of visit. What we want to do is to add the
incoming records to the existing data and then set a flag to designate
which records are from the most recent visit. For example:
Big_Table
ID Date Flag
1 12/1 1
1 12/1 1
1 11/31 0
2 12/1 0
2 12/3 1
etc....
Incoming Table
ID Date
1 12/2
1 12/3
2 12/2
2 12/4
Will result in the following
1 12/1 0
1 12/1 0
1 11/31 0
1 12/2 0
1 12/3 1
2 12/1 0
2 12/3 0
2 12/2 0
2 12/4 1
etc...
The main idea is not to have to re-create the big table each time it's
updated since it takes hours to do. Any thoughts on how to do this will
be most appreciated.
Steve James
Centers for Disease Control and Prevention
sjames@cdc.gov
|