Date: Tue, 23 May 2006 20:11:18 -0400
Reply-To: Ya Huang <ya.huang@AMYLIN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ya Huang <ya.huang@AMYLIN.COM>
Subject: Re: Trimmed Sample Mean and Std Dev
Agree! That's why I use view instead of creating a new table. I know
it won't help a lot, I was hoping the 'Where' clause could help to
speed up some.
On Tue, 23 May 2006 20:00:50 -0400, Howard Schreier <hs AT dc-sug DOT org>
<nospam@HOWLES.COM> wrote:
>On Tue, 23 May 2006 17:45:57 -0400, Ya Huang <ya.huang@AMYLIN.COM> wrote:
>
>>SQL self join to create a data (actually a view to save some resource),
>>that for each time point, the new data will have a group of records that
>>include all records 30 second around it. After that, it's
>>just a simple proc univariate for trimmed mean:
>>
>>proc sql;
>>create view xx as
>>select a.*, b.time as bt, b.price as bprice
>>from test a, test b
>>where a.time -30 < b.time <= a.time +30
>>order by a.time
>>;
>
>But that does a full cartesian and Dorian said it was a "very large
>dataset", so maybe it requires something more.
>
>>
>>proc univariate data=xx trimmed=0.1;
>>by time;
>>var bprice;
>>output out=yy mean=mean std=sd;
>>run;
>>
>>proc print data=yy;
>>format time datetime.;
>>run;
>>
>> time mean sd
>>
>> 02DEC02:08:10:04 416.250 1.33631
>> 02DEC02:08:10:06 416.300 1.15166
>> 02DEC02:08:10:15 416.364 1.14217
>> 02DEC02:08:10:20 416.292 1.11719
>> 02DEC02:08:10:23 416.346 1.08752
>> 02DEC02:08:10:24 416.346 1.08752
>> 02DEC02:08:10:31 416.346 1.08752
>> 02DEC02:08:10:35 416.192 1.10523
>> 02DEC02:08:10:45 416.550 0.83166
>> 02DEC02:08:10:46 416.455 0.85013
>> 02DEC02:08:10:51 416.350 0.81820
>> 02DEC02:08:11:02 416.188 0.75297
>> 02DEC02:08:11:16 417.800 4.10183
>> 02DEC02:08:11:29 417.333 3.79034
>> 02DEC02:08:11:33 417.417 3.73385
>> 02DEC02:08:11:50 417.800 4.04042
>> 02DEC02:08:11:51 417.800 4.04042
>> 02DEC02:08:12:01 418.125 4.58939
>>
>>I would be curious to know if the above match what you 'manually'
>>calculated.
>>
>>Ya
>>
>>
>>On Tue, 23 May 2006 13:40:16 -0700, Dorian <d.noel@ISMACENTRE.RDG.AC.UK>
>>wrote:
>>
>>>Good evening,
>>>
>>>I have a very large dataset and I want to compute a 10% trimmed sample
>>>mean and standard deviation of a neigbourhood of k observations around
>>>i. Consider the following:
>>>
>>>Consider the following:
>>>
>>> data test;
>>> input time datetime18. price;
>>>cards;
>>>02DEC2002:08:10:04 417.0
>>>02DEC2002:08:10:06 414.0
>>>02DEC2002:08:10:06 414.5
>>>02DEC2002:08:10:15 416.5
>>>02DEC2002:08:10:20 417.5
>>>02DEC2002:08:10:23 417.5
>>>02DEC2002:08:10:24 416.0
>>>02DEC2002:08:10:31 417.0
>>>02DEC2002:08:10:35 416.5
>>>02DEC2002:08:10:35 416.5
>>>02DEC2002:08:10:45 417.0
>>>02DEC2002:08:10:46 415.5
>>>02DEC2002:08:10:51 417.0
>>>02DEC2002:08:11:02 415.0
>>>02DEC2002:08:11:16 415.5
>>>02DEC2002:08:11:29 416.5
>>>02DEC2002:08:11:33 425.0
>>>02DEC2002:08:11:50 416.0
>>>02DEC2002:08:11:51 416.0
>>>02DEC2002:08:12:01 415.5
>>>;
>>>run;
>>>
>>>Then for each observation in the dateset, I want to compute a 10%
>>>trimmed mean and std dev of observations that are within 30 seconds
>>>(preceding and subsequent observations).
>>>
>>>Thanks for your assistance.
>>>
>>>Dorian
|