LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2004, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 28 Dec 2004 18:15:20 -0500
Reply-To:   "James, Steve" <SPJ1@CDC.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "James, Steve" <SPJ1@CDC.GOV>
Subject:   Re: Big Table / Little Table Update
Comments:   To: Mike Rhoads <RHOADSM1@WESTAT.com>
Content-Type:   text/plain; charset="us-ascii"

Mike,

Thanks for the ideas. I tried a similar approach using a data step that looked something like:

data big_table ; set small ; modify big_table key=key ; if setdate < maxdate then flag = 0 ; else flag = 1 ; if _iorc_=0 then replace; run ;

However it didn't work. I read something in an old SAS-L message that made me believe that MODIFY wouldn't work with multiple values of the key value, and even if I removed them from the small table, they would always exist in the big one. Perhaps that explains why it wouldn't work for me.

Then I've tried your approach using SQL but I'm getting stuck at the final step. I can't figure out how to code the SQL so that it will update only those records in the big table that have matching records in the transaction. The following is the code that I'm trying ( a simplified version of your approach), but there are a couple problems with it. The log leads me to believe that SQL is going through every record in the big data set, which would eliminate any performance boost. Additionally records that aren't in the small data set but in the big one have their flag set to missing.

TIA,

Steve

Centers for Disease Control and Prevention sjames@cdc.gov

* PROGRAM STARTS HERE ;

* big data set with small data appended to it by hand ; data big (index=(id)) ; input id setdate flag ; datalines ; 1 1 0 1 2 0 2 2 1 3 3 1 4 4 1 5 5 1 1 4 . 2 3 . 3 3 . ; run ;

data small (index=(id)) ; input id setdate ; datalines ; 1 4 2 3 3 3 ; run ;

proc sql; update big as a set flag = (select (a.setdate ge b.setdate) from small as b where a.id = b.id) ; quit ;

part of the log follows:

807 ; 808 run ; 809 810 proc sql; 811 update big as a 812 set flag = (select (a.setdate ge b.setdate) 813 from small as b where a.id = b.id) 814 ; INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. INFO: Use of index id for WHERE clause optimization cancelled. INFO: Index id selected for WHERE clause optimization. NOTE: 9 rows were updated in WORK.BIG.

815 quit ; NOTE: PROCEDURE SQL used (Total process time): real time 0.06 seconds cpu time 0.05 seconds

OUTPUT Follows:

Updated BIG Data Set

Obs id setdate flag

1 1 1 0 2 1 2 0 3 2 2 0 4 3 3 1 5 4 4 . 6 5 5 . 7 1 4 1 8 2 3 1 9 3 3 1

-----Original Message----- From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com] Sent: Tuesday, December 28, 2004 3:41 PM To: James, Steve; SAS-L@LISTSERV.UGA.EDU Subject: RE: Big Table / Little Table Update

Steve,

It seems to me that this should be doable, assuming that your big table is (or can be) indexed by patient ID. As I am understanding your situation, the vast majority of patients in your big table are not in your little table, so that their records would not have to be changed during the update process.

Conceptually, I'd do the following:

1. Add your records from Little to Big. (Don't worry about the flag for now.)

2. Create a table, IDsInLittle, with one column (ID) and one record for each distinct ID in Little. This will drive the rest of the process.

3. Create a table, MostRecentVisit, with columns for ID and Date, to get the most recent date for each of the patients in IDsInLittle, by joining IDsInLittle with Big. Hopefully the SQL optimizer will only process the records from Big that match the IDs in IDsInLittle, so that this will run quickly.

4. Use an SQL update statement with a CASE construct to set Flag for each record in Big whose ID is in MostRecentVisit to 0 or 1, depending on whether the date in Big matches the one in MostRecentVisit.

HTH!

Mike Rhoads Westat RhoadsM1@Westat.com

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of James, Steve Sent: Tuesday, December 28, 2004 11:09 AM To: SAS-L@LISTSERV.UGA.EDU Subject: Big Table / Little Table Update

SAS-L

We have a problem that I'd like your help on. We have a huge data set which needs to be updated daily with patient records. Each record has a patient ID and date of visit. Thus, there are multiple occurrences of both patient ID and date of visit. What we want to do is to add the incoming records to the existing data and then set a flag to designate which records are from the most recent visit. For example:

Big_Table

ID Date Flag

1 12/1 1

1 12/1 1

1 11/31 0

2 12/1 0

2 12/3 1

etc....

Incoming Table

ID Date

1 12/2

1 12/3

2 12/2

2 12/4

Will result in the following

1 12/1 0

1 12/1 0

1 11/31 0

1 12/2 0

1 12/3 1

2 12/1 0

2 12/3 0

2 12/2 0

2 12/4 1

etc...

The main idea is not to have to re-create the big table each time it's updated since it takes hours to do. Any thoughts on how to do this will be most appreciated.

Steve James

Centers for Disease Control and Prevention

sjames@cdc.gov


Back to: Top of message | Previous page | Main SAS-L page