On Mon, 10 Nov 2008 10:48:25 +0000, Tracy Clegg <tracy.clegg@UCD.IE> wrote:
>Thanks to Peter and Bruce for your thoughts on this.
>
>To shed a little more light on what we are trying to do: The measurements
>are hourly concentration of a gas detected on a farm. The only thing it
>would correlate with would be weather patterns which we don't have
>information on. As Peter suggested identifying a 'high cluster' could be
>done by identifying where a number of points occurred in a row above/below a
>given value (e.g. 2 standard deviations above the mean). Something like this
>might work, but it would have to somehow allow for low values occurring
>within the cluster. Some periods of time show very high values interspersed
>with very low values. That's the whole problem. We'd like to know whether a
>(e.g.) 7-day period had significantly high overall values even though it
>contained low values. We thought of using a temporal sliding window that
>would move through time, summing the values over a given window length, and
>then identifying the windows with a high total concentration of the gas. The
>problem is how to do this with varying window sizes. Does anyone know of any
>software that we can use to do this?
Do you have SAS/ETS? If so, look at PROC EXPAND. It's good for computing
moving averages and the like.
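
To make the sliding-window idea concrete, here is a minimal sketch in plain
Python (not SAS, and not what PROC EXPAND does internally). The function
names, the made-up hourly series, and the cutoff rule (a window is "high" if
its mean is more than 2 standard deviations above the average of all window
means of that length) are my own assumptions for illustration, not an
established test.

```python
from statistics import mean, stdev


def rolling_means(values, window):
    """Mean of every run of `window` consecutive values (running-sum update)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide one step: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def high_windows(values, window, n_sd=2.0):
    """Start indices of windows whose mean exceeds an assumed cutoff.

    Assumption: cutoff = mean of all window means + n_sd * their SD.
    This ignores autocorrelation and is only a rough screening rule.
    """
    means = rolling_means(values, window)
    if len(means) < 2:
        return []
    cutoff = mean(means) + n_sd * stdev(means)
    return [i for i, m in enumerate(means) if m > cutoff]


# Made-up hourly series: flat background with one 48-hour episode in which
# very high readings alternate with very low ones.
hourly = [1.0] * 2000
for h in range(1000, 1048):
    hourly[h] = 9.0 if h % 2 == 0 else 0.5

# Try several window lengths: 1 day, 2 days, 7 days.
for window in (24, 48, 168):
    flagged = high_windows(hourly, window)
    if flagged:
        print(window, len(flagged), flagged[0], flagged[-1])
    else:
        print(window, "no windows flagged")
```

Because each window is judged on its mean, a few low readings inside a high
period do not stop the window from being flagged, which is the behaviour you
described wanting. The cutoff would need more thought for real data,
especially if the series is strongly autocorrelated.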
>
>Thanks again for your help
>
>Tracy
>
>-----Original Message-----
>From: Peter Flom [mailto:peterflomconsulting@mindspring.com]
>Sent: 06 November 2008 17:35
>To: Tracy Clegg; SAS-L@LISTSERV.UGA.EDU
>Subject: Re: temporal cluster
>
>Tracy Clegg <tracy.clegg@UCD.IE> wrote
>>
>>I have hourly measurements of a continuous variable (concentration of a
>>gas) measured over 5 years. Could anyone tell me how to identify
>>clusters where the continuous variable was unusually high over a certain
>>period of time?
>
>
>I think cluster is the wrong term here, and may put people in mind of
>cluster analysis, which isn't what you want.
>
>Some questions
>
>What do you mean by "unusually high"?
>Does the value change over time, other than randomly, and if so, how?
>How are the values distributed over time?
>Is there autocorrelation?
>
>Some ways that you *might* want to do this:
> Find the mean value over the 5 years. Define 'unusual' as more than 2 sd
>above that.
> Fit a loess curve, define 'unusual' as some distance above the predicted
>value
> Fit some other curve, define 'unusual' as some distance above that.
>
>etc.
>
>but you also want 'clusters'. If there is a lot of autocorrelation, then
>identifying one very high point will likely identify others that are almost
>as high.
>
>Or you could say that a 'high cluster' is XXXX points in a row that are XXXX
>above a curve based on XXX
>
>
>If the values, over time, fit some distribution well, then you could find
>some definition of outlier based on the number of points and the nature of
>the distribution.
>
>HTH or at least gives you some ideas
>
>Peter
>
>
>
>Peter L. Flom, PhD
>Statistical Consultant
>www DOT peterflom DOT com