LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2005, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 4 Jan 2005 15:10:05 -0500
Reply-To:     "James, Steve" <SPJ1@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "James, Steve" <SPJ1@CDC.GOV>
Subject:      Re: Big Table / Little Table Update
Comments: To: Mike Rhoads <RHOADSM1@WESTAT.com>
Content-Type: text/plain; charset="us-ascii"

Mike,

The master data set has about 42 million records in it. I don't know how representative the numbers from this run are since it's just test code and the transaction table is actually created from the master (see log below).

150 records per patient sounds high to me. As I mentioned before I don't know the data that well. I'll have to find out more. I'm trying to run the program again and make some changes to bufsize and ibufsize that Michael Raithel suggested. It'll be interesting to see if it makes a difference.

Steve

8 data Master (index=(key)) Trans ; 9 set j.vaDetailsPre04; 10 if datepart(storedate) ge mdy(12,31,2004) then output Trans ; 11 else output Master ; 12 run;

NOTE: There were 42179057 observations read from the data set J.VADETAILSPRE04. NOTE: The data set WORK.MASTER has 42156526 observations and 29 variables. NOTE: The data set WORK.TRANS has 22531 observations and 29 variables. NOTE: DATA statement used (Total process time): real time 1:06:05.36 cpu time 17:24.85

-----Original Message----- From: Mike Rhoads [mailto:RHOADSM1@WESTAT.com] Sent: Monday, January 03, 2005 5:39 PM To: James, Steve; SAS-L@LISTSERV.UGA.EDU Subject: RE: Big Table / Little Table Update

Steve,

Glad you're making (some) progress! A few thoughts / suggestions / questions follow.

While you're still testing things out, I might use a separate PROC SQL step for each PROC SQL statement. The extra overhead should be minimal, and that would give you a better idea of how each statement is performing time-wise. You can remove the extra PROC SQLs when / if you put the code into production.

If I'm understanding the log, you have 22531 transactions for 5323 patients. Your final DATA step reports that 780,472 records were updated, which implies that your master file averages around 150 records per patient -- does that sound correct?

Do you know about how many patients, and records, are in your master? It would be interesting to know approximately what percentage get updated each day.

Mike Rhoads Westat RhoadsM1@Westat.com

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of James, Steve Sent: Monday, January 03, 2005 5:00 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Big Table / Little Table Update

Thanks for the helpful tips last week about my problem concerning updating a big table with duplicate key values from a much smaller transaction table. Several people gave me some good advice as well as code. I'm testing a solution from Mike Rhoads (see log below) and I was able to get the CPU time down to under 4 minutes (from 42 minutes). However the clock time remains quite long. For the run today it was over 2 hours just for the update. When I ran it last week (after most people had left for the holiday), it ran in about 45 minutes of clock time with about the same CPU time as before. I can't understand why there's such a discrepancy: 4 minutes vs 2 hours? Is it I/O?

How can I convince my co-worker that while his solution takes 42 minutes of CPU time but roughly the same amount of real time that he should switch methods to the one taking less CPU time?

We're running on a Sunfire 6800 w/ 8 processors running SAS 9.1 under Solaris 64 bit.

Steve James

Centers for Disease Control and Prevention

sjames@cdc.gov

21 /*---------------------------------------------------------------------- ---

22 Append Trans data to Master.

23 ------------------------------------------------------------------------ -*/

24 proc sql;

25 insert into Master select * from Trans

26 ;

NOTE: 22531 rows were inserted into WORK.MASTER.

27

28 /*---------------------------------------------------------------------- ---

29 Create a table containing the Keys that are in Master that also are in

30 Trans. Since you have already appended Trans to Master, there should be

31 no IDs in Trans that are not in Master.

32 ------------------------------------------------------------------------ -*/

33 create table KeysInTrans as

34 select key, max(storedate) as date

35 from Trans

36 group by key

37 ;

NOTE: Table WORK.KEYSINTRANS created, with 5323 rows and 2 columns.

38

39 /*---------------------------------------------------------------------- --

40 Get the most recent visit date for each Key.

41 ------------------------------------------------------------------------ --*/

42 CREATE TABLE MostRecentVisit AS

43 SELECT T.Key, MAX(StoreDate) AS Date

44 FROM KeysInTrans AS T, Master as M

45 where T.Key = M.Key

46 GROUP BY T.Key ;

NOTE: Table WORK.MOSTRECENTVISIT created, with 5323 rows and 2 columns.

47

48 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 1:39:35.63

cpu time 4:29.72

49

50 /*---------------------------------------------------------------------- ---

51 Now read in record in Trans and update the corresponding records in Master.

52 ------------------------------------------------------------------------ -*/

3 The SAS System 11:10 Monday, January 3, 2005

53

54 data Master ;

55 set MostRecentVisit end=lastob ;

56 do until (_iorc_ = %sysrc(_dsenom));

57 modify Master key=key ;

58 select (_iorc_);

59 when (%sysrc(_sok)) do;

60 if storedate eq date

61 then cdc_avrflag=1 ;

62 else cdc_avrflag=0 ;

63 replace Master ;

64 records_updated + 1 ;

65 end; /* end of where sysrc=_sok */

66 when (%sysrc(_dsenom)) do;

67 _error_ = 0;

68 end ;

69 otherwise;

70 end; /* end of select (_iorc_) statement */

71 end; /* end of do until () statement */

72 if lastob then put 'the number of records updated were: ' records_updated comma9. ;

73 run;

the number of records updated were: 780,472

NOTE: There were 5323 observations read from the data set WORK.MOSTRECENTVISIT.

NOTE: The data set WORK.MASTER has been updated. There were 780472 observations rewritten, 0 observations added and 0 observations

deleted.

NOTE: DATA statement used (Total process time):

real time 2:21:58.26

cpu time 3:52.51


Back to: Top of message | Previous page | Main SAS-L page