| Date: | Wed, 16 Aug 2006 16:38:18 -0400 |
| Reply-To: | Carl Kyonka <Carl.Kyonka@ENBRIDGE.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Carl Kyonka <Carl.Kyonka@ENBRIDGE.COM> |
| Subject: | Sum-able Summaries |
| Content-Type: | text/plain; charset="US-ASCII" |
|---|
I have some fairly large datasets of computer performance information
(20+GB). Much of the data is collected at 5 minute intervals. I think I
need this level of detail for one or two months into the past, but further
than that, it would be better to have a suitable summary of the data. But
how do I effectively (efficiently?) summarize this data? The goal here
would be to keep the long-term data small.
It seems to me that for each measure, a summary might include:
Sum of all observations
Count of all observations
Min
Max
Sum of squares
For example, if the C: drive of a Windows server is measured for its %
disk active time, and this is done every five minutes, one summary might
be over 8 hour intervals. So 96 observations (60 min/5 min * 8 hours = 96
observations) would be collapsed into one summary with five numerical
variables and some number of CLASS variables (server name, disk name,
datetime span, monitoring frequency, Windows object, Windows counter and
instance).
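As a rough illustration of the collapse described above, here is a minimal Python sketch (not SAS, and not taken from any existing tool). The names `Summary` and `summarize` are hypothetical, and the 96 sample values are made-up stand-ins for one 8-hour span of 5-minute readings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical summary record for one measure over one time span."""
    n: int            # count of observations
    total: float      # sum of observations
    minimum: float
    maximum: float
    sum_sq: float     # sum of squared observations


def summarize(values):
    """Collapse raw observations into a single Summary record."""
    values = list(values)
    if not values:
        raise ValueError("cannot summarize an empty interval")
    return Summary(
        n=len(values),
        total=math.fsum(values),
        minimum=min(values),
        maximum=max(values),
        sum_sq=math.fsum(v * v for v in values),
    )


# Made-up data: 96 readings standing in for 8 hours at 5-minute intervals.
samples = [float(i % 10) for i in range(96)]
bucket = summarize(samples)
```

The CLASS variables (server name, disk name, and so on) would be carried alongside each record as the grouping key; they are left out here to keep the sketch short.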
One other aim in this summarization is to be sum-able. That is, it should
be possible to further summarize the summary records into even longer
timespans or other aggregates based on the CLASS variables.
I'm sure this has been done in some contexts (MXG, MICS, cubes, etc.), but
I am not aware of any discussion of which statistics to use in the summary. Does
anyone know of such a discussion or have experience in generating them?
Carl Kyonka
Capacity & Performance
Enbridge
416 495 5076