|
SQL self join to create a data (actually a view to save some resource),
that for each time point, the new data will have a group of records that
include all records 30 second around it. After that, it's
just a simple proc univariate for trimmed mean:
proc sql;
create view xx as
select a.*, b.time as bt, b.price as bprice
from test a, test b
where a.time -30 < b.time <= a.time +30
order by a.time
;
proc univariate data=xx trimmed=0.1;
by time;
var bprice;
output out=yy mean=mean std=sd;
run;
proc print data=yy;
format time datetime.;
run;
time mean sd
02DEC02:08:10:04 416.250 1.33631
02DEC02:08:10:06 416.300 1.15166
02DEC02:08:10:15 416.364 1.14217
02DEC02:08:10:20 416.292 1.11719
02DEC02:08:10:23 416.346 1.08752
02DEC02:08:10:24 416.346 1.08752
02DEC02:08:10:31 416.346 1.08752
02DEC02:08:10:35 416.192 1.10523
02DEC02:08:10:45 416.550 0.83166
02DEC02:08:10:46 416.455 0.85013
02DEC02:08:10:51 416.350 0.81820
02DEC02:08:11:02 416.188 0.75297
02DEC02:08:11:16 417.800 4.10183
02DEC02:08:11:29 417.333 3.79034
02DEC02:08:11:33 417.417 3.73385
02DEC02:08:11:50 417.800 4.04042
02DEC02:08:11:51 417.800 4.04042
02DEC02:08:12:01 418.125 4.58939
I would be curious to know if the above match what you 'manually'
calculated.
Ya
On Tue, 23 May 2006 13:40:16 -0700, Dorian <d.noel@ISMACENTRE.RDG.AC.UK>
wrote:
>Good evening,
>
>I have a very large dataset and I want to compute a 10% trimmed sample
>mean and standard deviation of a neigbourhood of k observations around
>i. Consider the following:
>
>Consider the following:
>
> data test;
> input time datetime18. price;
>cards;
>02DEC2002:08:10:04 417.0
>02DEC2002:08:10:06 414.0
>02DEC2002:08:10:06 414.5
>02DEC2002:08:10:15 416.5
>02DEC2002:08:10:20 417.5
>02DEC2002:08:10:23 417.5
>02DEC2002:08:10:24 416.0
>02DEC2002:08:10:31 417.0
>02DEC2002:08:10:35 416.5
>02DEC2002:08:10:35 416.5
>02DEC2002:08:10:45 417.0
>02DEC2002:08:10:46 415.5
>02DEC2002:08:10:51 417.0
>02DEC2002:08:11:02 415.0
>02DEC2002:08:11:16 415.5
>02DEC2002:08:11:29 416.5
>02DEC2002:08:11:33 425.0
>02DEC2002:08:11:50 416.0
>02DEC2002:08:11:51 416.0
>02DEC2002:08:12:01 415.5
>;
>run;
>
>Then for each observation in the dateset, I want to compute a 10%
>trimmed mean and std dev of observations that are within 30 seconds
>(preceding and subsequent observations).
>
>Thanks for your assistance.
>
>Dorian
|