Date: Sat, 26 Jul 1997 17:36:48 +1000
Reply-To: Tim CHURCHES <TCHUR@DOH.HEALTH.NSW.GOV.AU>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Tim CHURCHES <TCHUR@DOH.HEALTH.NSW.GOV.AU>
Subject: Re: PC vs. Mainframe (Performance)
Content-Type: text/plain
Michelle L Oyen wrote:
>
> I've searched with a variety of Search Engines, DejaNews, and SAS sites but
> couldn't find any references to how a PC performs compared to a Mainframe.
> I realize the answer may be very dependent upon the types of processes that
> are being run and the types of hardware, etc. but are their any general rules
> (such as the PC will never perform as well as a mainframe for large sets of
> data (define large), etc? or that the PC can perform as well as the
> mainframe, etc....)
>
> Assuming you could set up an intel PC with any Windows OS and whatever
> hardware is required would it be possible to obtain similar* performance?
>
> 'Similar' in that the programs take no more than 10 times as long (i.e. a
> 3 minute mainframe time vs. a max of 30 min PC time).
>
> FYI: The datasets the SAS program works on range in size from a few MB to
> ~600 MB (and maybe very rarely over a GB).

Historically, mainframes have had superior performance because of much
faster disc input/output (I/O), lots of memory (so that data could be
cached or manipulated without having to access any disc drives) and faster
CPUs (albeit shared with many users). Whether these differences still apply
depends on what sort of mainframe you have available. If it is a modern one
(no more than a few years old) and has been upgraded, then the CPU may be a
few times faster than a Pentium Pro PC and it may have many gigabytes of
memory. However, the big difference will be in the disc drives, of which
there will be many, each with its own disc controller and I/O path. Most
mainframes come with a full-time administrator who spends quite a lot of
time working out how to distribute data and workspaces across the disc
drives to maximise I/O performance.

However, PCs are beginning to rival these features. The CPU speed is not
far behind most mainframes and you can have two or four CPUs in your PC if
you wish. Server-class PCs can be fitted with lots of memory (1, 2 or even
4 gigabytes is possible, and 256 or 512 megabytes is now quite affordable
even for a single-user workstation) and operating systems such as Windows
NT or SCO UnixWare will use all that memory to good advantage. However,
care needs to be taken in configuring the PC to ensure that its disc I/O
performance is as good as possible, because this is where it will still
trail the mainframe. The secret is to use Fast-Wide SCSI disc drives which
run at 7200 or 10,000 rpm, with each disc attached to its own SCSI channel.
Arrange your SAS data libraries so that you are always reading data from
one disc and writing it to another. Avoid using the RAID-5 arrays available
on most PCs if you want maximum performance, since they tend to attach all
the discs in the array to a single SCSI channel (there are some high-end
exceptions which use multiple SCSI channels for each RAID array, but they
cost a lot more).

As an example, we have a dual Pentium Pro server with 512 megabytes of
memory configured as I have described. Creating a 20% subset of a 700
megabyte SAS dataset (with 2 million observations) takes no more than 60
seconds. Performing summarisation or other analysis on the data as it is
subsetted hardly slows things down, since the CPU is still spending a lot
of time waiting for data from the disc. Smaller datasets of about 25
megabytes and 100,000 observations take about 10 seconds to subset or
summarise the first time and typically 2 or 3 seconds thereafter, because
the entire dataset is cached in memory. Of course, it may be slower if the
machine is shared with other users, but since a machine like this can be
built for about $20,000, you don't need to share it with too many other
people to justify the cost (unlike a mainframe).
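
To put those timings in terms of disc throughput, here is a rough
back-of-the-envelope sketch in Python. Only the 700 megabyte size, the 20%
subset and the 60 second ceiling come from the figures above. The function
name is made up for illustration, and the sketch assumes that a 20% subset
of the observations is also about 20% of the bytes.

```python
# Figures quoted in the post; the rest is an illustrative assumption.
DATASET_MB = 700         # size of the source SAS dataset
SUBSET_FRACTION = 0.20   # share of the data kept (assumed to be 20% of bytes)
ELAPSED_S = 60           # upper bound on elapsed time ("no more than 60 seconds")


def throughput_mb_per_s(megabytes: float, seconds: float) -> float:
    """Average transfer rate in MB/s for a given volume and elapsed time."""
    return megabytes / seconds


# The whole dataset has to be read once to pick out the subset.
read_rate = throughput_mb_per_s(DATASET_MB, ELAPSED_S)

# Only the subset is written, to the second disc.
written_mb = DATASET_MB * SUBSET_FRACTION
write_rate = throughput_mb_per_s(written_mb, ELAPSED_S)

# 60 seconds is a ceiling on the time, so these rates are floors.
print(f"read:  at least {read_rate:.1f} MB/s sustained")
print(f"write: {written_mb:.0f} MB in total, at least {write_rate:.1f} MB/s")
```

Because the elapsed time is an upper bound, the rates it prints are lower
bounds on what the input and output discs are sustaining.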

Note that the same subsetting task takes about 2.5 to 3 times longer when
the data is being both read from and written to the same disc drive. This
is the bottleneck in most PCs, since they are configured with a single disc
drive. Note that adding a second IDE disc to a PC does not help much - you
really need multiple Ultra-Wide SCSI discs on independent SCSI channels
(although multiple Fast ATA interface discs might be nearly as fast and a
bit cheaper). The speed of the disc is important too: most PC disc drives
operate at 4500 or 5400 rpm and can't deliver sequentially read data (as is
typical when using SAS) as fast as the disc interface can handle. The more
expensive 7200 or 10,000 rpm discs are definitely worth it.
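
As a rough illustration of what the single-disc penalty means in practice,
the Python sketch below applies the 2.5 to 3 times slowdown to the 60 second
subsetting job. It assumes the penalty refers to the 700 megabyte example
and that 60 seconds is the actual time rather than just a ceiling. The
function name is hypothetical.

```python
# Figures quoted in the post; applying them to each other is an assumption.
SEPARATE_DISCS_S = 60        # time with input and output on different discs
PENALTY_RANGE = (2.5, 3.0)   # slowdown when one disc does both jobs


def same_disc_estimate(baseline_s, penalty_range):
    """Return the (low, high) elapsed-time estimate in seconds for one disc."""
    low, high = penalty_range
    return baseline_s * low, baseline_s * high


low_s, high_s = same_disc_estimate(SEPARATE_DISCS_S, PENALTY_RANGE)
print(f"same disc: roughly {low_s:.0f} to {high_s:.0f} seconds")
```

On those assumptions, a job that takes one minute across two discs takes
two and a half to three minutes on a single disc.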

An alternative to a fast PC would be a Unix workstation, but the consensus
seems to be that you have to pay about twice as much to get the same level
of performance as a PC, provided that you configure the PC correctly for
use with SAS (as described above). The cost of SAS licenses is also a
consideration: a single-user license for SAS running under Windows NT
Workstation on a machine with no more than two CPUs is almost affordable,
whereas the cost of SAS for most Unix workstations and mainframes always
makes me cringe...

Don't forget to consider the cost of setting up and maintaining a PC. With
a mainframe, someone else does this for you and all these costs are hidden.
It can take quite a lot of time to administer a PC server if there are more
than a few people using it. Security (both logical and physical) needs to
be considered - Windows NT can be made very secure but it takes effort. The
nice thing about a mainframe is that if something goes wrong it is someone
else's problem and you can just grumble, whereas with a PC, if something
goes wrong it is usually your problem...

Hope this helps,

Tim Churches