Date: Thu, 26 Jan 2006 16:05:20 -0600
Reply-To: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
Subject: Re: Straightforward way to combine multiple values of stdev?
Content-Type: text/plain; charset=US-ASCII
Roy:
Use the following code and your data to calculate mean and standard
deviation. Your date should be input as sample size, sample mean and
sample standard deviation in that order. There is no limit about how
many samples. The result will be shown in log window.
data temp;
input Size Mean StdDev;
cards;
10 100 30
20 95 26
50 97 32
;
run;
data _null_;
set temp end=lastobs;
N+Size;
Sum+Mean*Size;
SS+(Size-1)*StdDev**2+Size*Mean**2;
if lastobs then
do;
GrandMean=Sum/N;
put GrandMean=;
GrandStdDev=((SS-Sum**2/N)/(N-1))**(0.5);
put GrandStdDev=;
end;
run;
J S Huang
1-515-557-3987
fax 1-515-557-2422
>>> "Pardee, Roy" <pardee.r@GHC.ORG> 1/26/2006 3:51:39 PM >>>
Wow. Okay--thanks! So at least I know it's mathematically possible.
So can anybody save me from all this arithmatic & passes through the
dset with a nice "just use PROC UMPTYSQUAT--that does just what you
need"? ;-)
Thanks!
-Roy
-----Original Message-----
From: Jiann-Shiun Huang [mailto:Jiann-Shiun.Huang@amerus.com]
Sent: Thursday, January 26, 2006 1:40 PM
To: Pardee, Roy; SAS-L@LISTSERV.UGA.EDU
Subject: Re: Straightforward way to combine multiple values of stdev?
Roy:
Let's assume that n_1, x_1 and s_1 denotes the sample size, the mean
and the sample standard deviation from the first sample. Similarly
n_2,
x_2 and s_2 for the second sample, ..., and n_i, x_i and s_i for the
ith
sample.
The sample mean of these i samples take three steps:
(1) find the sum: Sum = n_1 * x_1 + n_2 * x_2 + ... + n_i * x_i,
(2) find the sample size: N = n_1 + n_2 + ... + n_i,
(3) find the sample mean: Sum / N.
The sample standard deviation of these i samples take several steps:
(1) find Sum of Squares: SS = [(n_1 - 1) * (s_1^2) + n_1 * (x_1^ 2)] +
[(n_2 - 1) * (s_2^2) + n_2 * (x_2^ 2)] + ... + [(n_i - 1) * (s_i^2) +
n_i * (x_i^ 2)],
(2) find the sample standard deviation of the i samples:
{[SS - (Sum^2)/N]/(N - 1)}^(0.5)
, where Sum and N are from (1) and (2) in calculating the sample mean.
HTH
J S Huang
1-515-557-3987
fax 1-515-557-2422
>>> "Pardee, Roy" <pardee.r@GHC.ORG> 1/26/2006 3:14:31 PM >>>
Hey All,
I'm getting some proc tabulate output from various sites, which will
contain the statistics N Min Max Mean & Std. I'm tasked with
collating
these stats into a single uber-table, which must show what we would
get
if I had access to all the raw data available at the various sites.
Those first four stats are reasonably easy to combine (add the N's,
take
min() of the Min's, use the N's to calculate a weighted mean, etc.)
but
the Std has me stumped.
Is there some proc out there that will take something like this for an
input dset:
Obs MAX MEAN MIN N STD site
1 5 4 3 4 0.8165 A
2 65 34 3 2 43.8406 B
And give me what the STD of the union of the samples would be?
Thanks!
-Roy