Date: Thu, 6 Nov 2008 12:34:53 -0500
Reply-To: Peter Flom <email@example.com>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject: Re: temporal cluster
Tracy Clegg <tracy.clegg@UCD.IE> wrote
>I have hourly measurements of a continuous variable (concentration of a
>gas) measured over 5 years. Could anyone tell me how to identify
>clusters where the continuous variable was unusually high over a certain
>period of time?
I think 'cluster' is the wrong term here; it may put people in mind of cluster analysis, which isn't what you want.
What do you mean by "unusually high"?
Does the value change over time, other than randomly, and if so, how?
How are the values distributed over time?
Is there autocorrelation?
Some ways that you *might* want to do this:
Find the mean value over the 5 years. Define 'unusual' as more than 2 SD above that mean.
Fit a loess curve and define 'unusual' as some distance above the predicted value.
Fit some other curve and define 'unusual' as some distance above that.
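A minimal Python sketch of the first and third options, assuming the readings are already in a plain list of floats. The function names and the `k`, `half_window` and `margin` parameters are hypothetical. A centered moving average stands in for 'some other curve'. It is not loess, which you would get from a dedicated smoother such as PROC LOESS in SAS.

```python
import statistics

def flag_above_mean(values, k=2.0):
    """Flag points more than k sample SDs above the overall mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [v > mu + k * sd for v in values]

def moving_average(values, half_window):
    """Centered moving average; the window is truncated at the ends of the series."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(statistics.fmean(values[lo:hi]))
    return out

def flag_above_curve(values, half_window, margin):
    """Flag points more than `margin` above the moving-average baseline."""
    baseline = moving_average(values, half_window)
    return [v - b > margin for v, b in zip(values, baseline)]

# Toy series with one spike at index 4 (made-up numbers, for illustration only)
values = [1.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0]
global_flags = flag_above_mean(values, k=2.0)
curve_flags = flag_above_curve(values, half_window=2, margin=3.0)
```

A big spike pulls a moving average up toward itself, so a running median may be a more robust baseline if your high periods are long.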
But you also want 'clusters'. If there is a lot of autocorrelation, then identifying one very high point will likely identify others nearby that are almost as high.
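To check how much autocorrelation there is, here is a rough sketch of the sample autocorrelation at a given lag, using the usual estimator that divides by the full sum of squares. The function name is hypothetical. In SAS, the IDENTIFY statement of PROC ARIMA reports the autocorrelation function directly.

```python
import statistics

def autocorrelation(values, lag=1):
    """Sample autocorrelation at `lag`.

    Undefined (division by zero) for a constant series.
    """
    n = len(values)
    mu = statistics.fmean(values)
    denom = sum((v - mu) ** 2 for v in values)
    num = sum((values[t] - mu) * (values[t + lag] - mu) for t in range(n - lag))
    return num / denom

# A steadily rising series is strongly positively autocorrelated,
# while an alternating one is strongly negatively autocorrelated.
r_trend = autocorrelation([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
r_alternating = autocorrelation([1, -1] * 5)
```

Values of lag-1 autocorrelation near 1 mean that neighbouring hours move together, which is exactly the situation where one high reading drags its neighbours along.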
Or you could say that a 'high cluster' is XXXX points in a row that are XXXX above a curve based on XXX.
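The 'XXXX points in a row' rule can be sketched as a scan for runs of consecutive exceedances. This assumes you have already turned each reading into a True/False flag using whatever baseline and distance you chose; `find_runs` and `min_length` are hypothetical names.

```python
def find_runs(flags, min_length):
    """Return (start, end) index pairs, end inclusive, for every run of
    consecutive True flags that is at least min_length long."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i                      # a run begins
        elif not flag and start is not None:
            if i - start >= min_length:    # the run just ended; keep it if long enough
                runs.append((start, i - 1))
            start = None
    # A run that is still open at the end of the series
    if start is not None and len(flags) - start >= min_length:
        runs.append((start, len(flags) - 1))
    return runs

# Toy example: flag readings above a fixed threshold of 4 (made-up numbers)
values = [1, 5, 6, 7, 1, 5, 1, 8, 9, 9, 9]
flags = [v > 4 for v in values]
runs = find_runs(flags, min_length=3)
```

Here the isolated high reading at index 5 is ignored, and only the two runs of three or more consecutive high hours are reported.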
If the values, over time, fit some distribution well, then you could find some definition of outlier based on the number of points and the nature of the distribution.
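As a worked example of why the number of points matters, assume (purely for illustration) that the hourly readings are roughly normal and independent. Five years of hourly data is about 43,800 points, so a 2 SD cutoff would flag roughly a thousand points by chance alone, and a cutoff of about 4 SD is needed before only one chance exceedance is expected.

```python
from statistics import NormalDist

n_points = 5 * 365 * 24   # about 5 years of hourly readings (ignores leap days)
std_normal = NormalDist()  # assumption: standardized readings are N(0, 1)

# Probability and expected count of points above mean + 2 SD purely by chance
p_above_2sd = 1 - std_normal.cdf(2.0)
expected_false_flags = n_points * p_above_2sd

# z such that about one point in n_points is expected to exceed it by chance
z_one_expected = std_normal.inv_cdf(1 - 1 / n_points)

print(f"P(Z > 2) = {p_above_2sd:.4f}")
print(f"expected chance exceedances of 2 SD: {expected_false_flags:.0f}")
print(f"threshold for about 1 chance exceedance: {z_one_expected:.2f} SD")
```

With strong autocorrelation the exceedances arrive in clumps rather than independently, so these counts are only a rough guide.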
HTH, or at least gives you some ideas.
Peter L. Flom, PhD
www DOT peterflom DOT com