LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2007, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 31 Jan 2007 21:56:53 -0500
Reply-To:     "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject:      Re: Number of Observations Read By MODIFY Statement

On Wed, 31 Jan 2007 02:20:18 -0500, Peter Crawford <peter.crawford@BLUEYONDER.CO.UK> wrote:

>in the final points made, Pudding Man says >>Likely goes without saying: for this type of app., I 'spect UPDATE >>or MERGE would be much better choices (disk space permitting). >> >> Prost, > >well, Prost to you too, Pudding but >if you need modify it's probably because you need update-in-place >(with the audit trails, perhaps) >So sometimes it's neccessary to be "slower"

Moreover, the performance of MODIFY with BY can be very *good* if an appropriate index on the master data set is in place.

> >regards >Peter Crawford > > >On Tue, 30 Jan 2007 18:52:44 -0600, Pudding Man <pudding.man@GMAIL.COM> >wrote: > >>On 1/30/07, Howard Schreier <hs AT dc-sug DOT org> <nospam@howles.com> >wrote: >>> On Sun, 28 Jan 2007 23:46:36 GMT, Peetie Wheatstraw >>> <peetie.wheatstraw@GMAIL.COM> wrote: >> >> ... >> >>> >On Sun, 28 Jan 2007 08:55:00 -0500, nospam@HOWLES.COM ("Howard >Schreier <hs >>> AT dc-sug DOT >>> >org>") wrote: >>> > >>> >>Start with a table of 10 million rows >>> >> >>> >> data big; >>> >> do key = 1e7 to 1 by -1; >>> >> output; >>> >> end; >>> >> run; >>> >> >>> >>and a table of 3 rows >>> >> >>> >> data little; >>> >> do key = 616, 61616, 6161616; >>> >> output; >>> >> end; >>> >> run; >>> >> >>> >>Now make a DATA step delete the 3 matching rows from the big table >>> >> >>> >> data big; >>> >> modify big little; >>> >> by key; >>> >> remove; >>> >> run; >>> >> >>> >>The log reports >>> >> >>> >>NOTE: There were 1 observations read from the data set WORK.BIG. >>> >> >>> >>NOTE: The data set WORK.BIG has been updated. There were 0 >observations >>> >>rewritten, 0 observations added and 3 observations deleted. >>> >> >>> >>NOTE: There were 3 observations read from the data set WORK.LITTLE. >>> >> >>> >>NOTE: DATA statement used (Total process time): >>> >> real time 31.26 seconds >>> >> cpu time 31.12 seconds >>> >> >>> >>Obviously it had to read a lot of observations from BIG. So why does >it say >>> >>it read only one? >>> >>> Yes. The form of MODIFY which accepts two data sets and a BY statement >is >>> extremely inefficient in the absence of a suitable index on the master >data set. >> >>So it would appear. This is "Matching Access" per the doc. >> >>> I ran this variation on my test: >>> >>> data big; >>> do key = 1e6 to 1 by -1; >>> output; >>> end; >>> run; >>> >>> data little; >>> do key = 1 to 10; >>> output; >>> end; >>> run; >>> >>> data big; >>> modify big little; >>> by key; >>> remove; >>> run; >>> >>> My understanding is that for each observation in LITTLE, SAS had to >read BIG >>> sequentially, from the beginning, until it finds the match. >>> >>>So, it read >>> >>> 1M + 999,999 + 999,998 . . . + 999,991 >>> >>> or almost 10 million observations. The step took 13.5 seconds. Then I >>> changed the 10 to 20 and ran the whole thing again. So it had to read >>> >>> 1M + 999,999 + 999,998 . . . + 999,981 >>> >>> or almost 20 million. Indeed, the time went up to 27 seconds. >>> >>> Of course in both cases it reported >>> >>> NOTE: There were 1 observations read from the data set WORK.BIG. >> >>My conjecture is that it: >> >>a.) Completely buffers BIG (memory allowing) >>b.) Reads a key from LITTLE >>c.) Searches the buffer until the key if found >>d.) Go to a.) until LITTLE is fully read >> >>"Task Mngr." wouldn't confirm full buffering, but the timings >>were the same with/without SASFILE BIG. >> >>If LITTLE stores the *first* rather than the *last* few recs, it >whizzbangs >>(W2k/sas 913): >> >>NOTE: The data set WORK.BIG1 has 1000000 observations and 1 variables. >> ... >>223 data little; >>224 >>225 do key = 999991 to 1000000; >>226 output; >>227 end; >>228 run; >> >>NOTE: The data set WORK.LITTLE has 10 observations and 1 variables. >>NOTE: DATA statement used (Total process time): >> real time 0.00 seconds >> user cpu time 0.00 seconds >> system cpu time 0.01 seconds >> Memory 144k >> >> >>229 >>230 >>231 data big1; >>232 modify big1 little end = end1; >>233 by key; >>234 _ct + 1; >>235 if end1 then put _ct=; >>236 remove; >>237 run; >> >>_ct=10 >>NOTE: There were 1 observations read from the data set WORK.BIG1. >>NOTE: The data set WORK.BIG1 has been updated. There were 0 >>observations rewritten, 0 >> observations added and 10 observations deleted. >>NOTE: There were 10 observations read from the data set WORK.LITTLE. >>NOTE: DATA statement used (Total process time): >> real time 0.00 seconds >> user cpu time 0.01 seconds >> system cpu time 0.00 seconds >> Memory 276k >> >>> It seems to me that SAS shjould do one of two things: (1) maintain an >actual >>> count and report that (I can't see why that's difficult) >> >>Concur. >> >>or (2) suppress the >>> note altogether. >> >>Wouldn't be compatible with SET, MERGE, UPDATE and >>MODIFY("Sequential Access"). One can't even discern the final obs >>count from the step note. >> >>The doc for "Matching Access" could also use a good overhaul. >> >>Compared to a sequential MERGE, Howards MODIFY code took about 5 >>times more cpu cycles. >> >>Likely goes without saying: for this type of app., I 'spect UPDATE >>or MERGE would be much better choices (disk space permitting). >> >> Prost, >> Puddin' >> >>***************************************************************** >>*** Puddin' Man PuddingDotMan at GmailDotCom ** >>*****************************************************************; >> >>"Law without Equity is no law at all: 'tis a form of jungle rule, >> and is always supported primarily by force of arms."


Back to: Top of message | Previous page | Main SAS-L page