Date: Fri, 7 Mar 1997 22:33:51 GMT
Reply-To: "Christopher W. Donald" <donaldc@VOICENET.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Christopher W. Donald" <donaldc@VOICENET.COM>
Organization: Voicenet - Internet Access - (215)674-9290
Subject: Re: Time taken for merging large files
Melvin Klassen (KLASSEN@UVVM.UVIC.CA) wrote:
: Libby Starling & Mustapha Hammida <ESTARLIN@NGWMAIL.DES.STATE.MN.US> write:
: >We're using Windows 3.1 on a Pentium 166 MHz processor with SAS 6.11,
: >and before we occupy our computer for the entire day, we'd like a time
: >estimate of how long it takes to merge 2 files of 2.6 million records with
: >about 10 variables each. The output file is to have 5 variables at most.
: >Can anyone offer an estimate of how long this will take?
: >Please reply directly rather than to the list. Thanks!
: Just to tell the other readers of SAS-L that at least one reply to your
: question has been given, I'll do both.
: What's the slowest part of your computer?
: Probably, it's the disk-drive.
: So, the slowest part becomes the "bottle-neck"
: which determines the rate at which the job runs.
: How much data do you have to read:
: 2 (files) * 2,600,000 (records) * 10 (variables) * 8 (bytes per variable)
: == 396 MB
: My guess is that your disk is capable of "sustained throughput"
: of 2MB to 3MB per second.
: So, in a "perfect" world, to read the files, it will take about 200 seconds,
: and writing a file about 25% as big will take another 50 seconds.
: However, it has not been a perfect world since the time when Adam & Eve
: were banished from the Garden of Eden, so this estimate is too optimistic.
: Because you are "merging" the two files, SAS will (generally)
: "read-one-record-from-file-one" and then "read-one-record-from-file-two".
: It's quite possible that each "read" will move the disk's I/O head
: to the "least-optimal" position for reading one record from the "other" file,
: especially if the two files were written at different times,
: rather than written in an "interleaved" manner by a previous SAS program.
: This means that reading the dataset could take much longer, say 10 times,
: because it takes a measurable amount of time, e.g., 12 milli-seconds,
: to make the disk's I/O head "seek" to the right location.
: Also, given that SAS does "read-a-record" then "write-a-record",
: it's quite possible that each "write" will also move the R/W head of the disk
: into the "least-optimal" position for the next "read",
: even if the alternating "read" commands haven't already had this effect.
: So, multiply the '250' seconds by 10 (or 20), to get a better estimate.
: P.S. Given that you're running Windows, it should be possible to
: use your computer for some other "light" duties while the SAS program
: is running.
Actually, with Windows 3.1, running another programme will slow things
down because it will take away "time slices" from the processor. You
really would only be able to continue to do meaningful tasks under a
system like UNIX or NT.
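
For anyone who wants to redo the quoted back-of-envelope estimate with their own numbers, here is a minimal sketch in Python. It only reproduces the arithmetic from the quoted message: the 2 MB/s sustained throughput and the 10x to 20x seek penalty are guesses taken from that message, not measurements, and the function name and parameters are made up for illustration. It assumes 1 MB = 1024 * 1024 bytes, which is what gives the quoted 396 MB figure.

```python
def merge_time_estimate(files=2, records=2_600_000, variables=10,
                        bytes_per_var=8, throughput_mb_s=2.0,
                        output_fraction=0.25, seek_penalties=(10, 20)):
    """Rough I/O-bound estimate for merging files (hypothetical helper).

    All defaults come from the quoted message; throughput and the
    seek penalties are guesses, not measured values.
    """
    mb = 1024 * 1024  # assumption: binary megabytes

    # Total input volume: files * records * variables * bytes per variable.
    input_mb = files * records * variables * bytes_per_var / mb

    # "Perfect world": purely sequential reads and writes.
    read_s = input_mb / throughput_mb_s
    write_s = read_s * output_fraction  # output is about 25% of the input
    ideal_s = read_s + write_s

    # Head seeks between the two inputs and the output slow things down;
    # apply the guessed multipliers to the ideal time.
    pessimistic_s = tuple(ideal_s * p for p in seek_penalties)
    return input_mb, ideal_s, pessimistic_s


if __name__ == "__main__":
    input_mb, ideal_s, pessimistic_s = merge_time_estimate()
    print(f"input volume : {input_mb:.0f} MB")
    print(f"ideal time   : {ideal_s:.0f} s")
    for s in pessimistic_s:
        print(f"with seeks   : {s / 60:.0f} min")
```

With the defaults this gives roughly 250 seconds in the ideal case and somewhere between about 40 and 85 minutes once the seek penalty is applied, so well short of "the entire day" if the guesses are anywhere near right. Change throughput_mb_s to match your own disk before trusting the result.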