| Date: | Sun, 30 Aug 2009 21:19:19 -0700 |
| Reply-To: | Dale McLerran <stringplayer_2@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Dale McLerran <stringplayer_2@YAHOO.COM> |
| Subject: | Re: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn |
| In-Reply-To: | <6eca73440908302014p298b9358l52ef2e829617d858@mail.gmail.com> |
| Content-Type: | text/plain; charset=iso-8859-1 |
One million times? Why? I really think that is overkill. If it were
me, I would spend that computing time covering more parameter
combinations instead.
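A rough way to see this, assuming the quantity of interest is an upper-tail probability p near 0.05 (or a critical value in that region): with R replicates, the Monte Carlo estimate of p is a binomial proportion, so its standard error is

$$\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{R}}$$

With p = 0.05, R = 10,000 gives SE ≈ 0.0022, while R = 1,000,000 gives SE ≈ 0.00022. Ten thousand replicates already pin a 5% tail probability down to about ±0.004 (two standard errors).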
But you should be able to use a single data step to generate
A-D statistics for all of your parameter combinations. The
code below should be pretty efficient.
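For reference, the macro computes the usual Anderson-Darling statistic A², where x_(1) ≤ ... ≤ x_(n) is the sorted sample and F is the hypothesized CDF (in the code, the normal CDF with mean and standard deviation estimated from the sample). The sum inside the macro accumulates the (2i-1)/n-weighted terms, and AD = -n minus that sum:

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F\!\left(x_{(i)}\right) + \ln\!\left(1 - F\!\left(x_{(n+1-i)}\right)\right)\right]$$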
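%macro AD(N=);
do i=1 to &N;
/* The next line needs completion with the appropriate G, */
/* where G(u, S) is the inverse CDF of the target distn.  */
/* RANUNI takes only a seed; S is passed to G, not RANUNI. */
_x&N{i} = G(ranuni(6923479), S);
end;
call sortn(of _x&N(*));
mu = mean(of x1-x&N);
var = var(of x1-x&N);
sd = sqrt(var);
/* Accumulate the A-D sum in its own variable: reusing S here */
/* would overwrite the parameter value being looped over.     */
sumAD = 0;
do i=1 to &N;
/* Index the sorted array _x&N (there is no array named x) */
sumAD + ((2*i - 1)/&N) * (log(cdf('normal',_x&N{i},mu,sd)) +
log(1 - cdf('normal',_x&N{&N+1-i},mu,sd)));
end;
AD = -&N - sumAD;
output AD_&N;
%mend;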
/* For each value of S, generate 10000 samples of each size      */
/* (N=50, 100, 200, 300) from the distribution defined by G, and */
/* compute the AD statistic for each sample.                     */
data AD_50
AD_100
AD_200
AD_300;
array _x50 {50} x1-x50;
array _x100 {100} x1-x100;
array _x200 {200} x1-x200;
array _X300 {300} x1-x300;
do S = s1, s2, s3; /* Placeholder: replace s1, s2, s3 with the numeric parameter values */
do rep=1 to 10000;
%AD(N=50)
%AD(N=100)
%AD(N=200)
%AD(N=300)
end;
end;
keep S AD;
run;
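The G in the macro is left for you to supply. One way to make a user-defined G callable from the DATA step (SAS 9.2 or later) is PROC FCMP. The sketch below is only a placeholder: it uses a Weibull(shape = s, scale = 1) inverse CDF as a hypothetical stand-in for your distribution, and it has to run before the DATA step above.

```sas
/* Hypothetical G(u, s): Weibull inverse CDF, shape s, scale 1. */
/* Replace the RETURN expression with your own inverse CDF.     */
proc fcmp outlib=work.funcs.sim;
  function G(u, s);
    return ((-log(1 - u))**(1/s));
  endsub;
run;

/* Tell the DATA step where to find the compiled function */
options cmplib=work.funcs;
```

With this in place, `G(ranuni(6923479), S)` in the macro returns a Weibull draw with shape S.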
/* Determine probability of observed data */
/* using simulated data AD distribution. */
proc sort data=AD_50;
by S AD;
run;
proc sort data=AD_100;
by S AD;
run;
proc sort data=AD_200;
by S AD;
run;
proc sort data=AD_300;
by S AD;
run;
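The sorts above set things up for judging the observed data against the simulated AD distribution. A minimal sketch of that step, assuming the AD_50 data set from the code above and a hypothetical macro variable AD_obs holding the A-D statistic of your real sample (computed the same way). The empirical p-value is the share of simulated statistics at least as large as the observed one, within each value of S.

```sas
%let AD_obs = 0.75;  /* hypothetical observed A-D value; replace with yours */

data pval_50;
  set AD_50;         /* already sorted by S AD */
  by S;
  if first.S then do;
    n_sim = 0;
    n_ge  = 0;
  end;
  n_sim + 1;                  /* simulated statistics in this S group  */
  n_ge  + (AD >= &AD_obs);    /* those at least as large as observed   */
  if last.S then do;
    p_value = n_ge / n_sim;   /* empirical upper-tail probability      */
    output;
  end;
  keep S n_sim n_ge p_value;
run;
```

The same step applies to AD_100, AD_200 and AD_300 by changing the data set names.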
The above is untested code and should be tested with a
small number of replicates before using it for a final
simulation. Also, there will obviously need to be some
final step where you determine the quantiles of the AD
statistics.
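For that final step, PROC UNIVARIATE can write upper percentiles of each simulated distribution to a data set. A sketch for the N=50 results, assuming AD_50 is already sorted by S as above; the percentile points (90, 95, 99) and the output name AD_50_crit are arbitrary choices.

```sas
/* Upper quantiles (critical values) of the simulated A-D */
/* distribution, one row per value of S, for N=50.         */
proc univariate data=AD_50 noprint;
  by S;
  var AD;
  output out=AD_50_crit pctlpts=90 95 99 pctlpre=AD_P;
run;

/* Creates variables AD_P90, AD_P95 and AD_P99 */
proc print data=AD_50_crit;
run;
```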
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
--- On Sun, 8/30/09, OR Stats <stats112@GMAIL.COM> wrote:
> From: OR Stats <stats112@GMAIL.COM>
> Subject: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Sunday, August 30, 2009, 8:14 PM
> This is good. I am now ready to run a large-scale simulation. That is, I
> want to compute the goodness-of-fit statistic for each of the (M x S)
> groups, n times per group.
>
> Each group is defined by (m,s), with S = s1 s2 s3 and M = 50 100 200 300.
> M holds the different sample sizes at which I am testing fit to the
> function G(random#,s) (i.e., the inverse distribution function). I would
> like to run each group 1 million times. For each s group, generating
> 300 x 1 million random numbers should give me enough simulated data y(s)
> for both the largest and the smaller sample sizes.
>
> My final column space would look like
> i ranuni y_s1=G(ranuni,s1) y_s2=G(ranuni,s2) y_s3=G(ranuni,s3)
> 1
> .
> .
> .
> m
> All rows in the above table would be used to calculate the functions f_s1,
> f_s2, f_s3 (i.e., the AD statistics). This last step is repeated 1 million times.
>
> Can we do this in one or two DATA steps? Which syntax would be fastest,
> given that we have to generate 300 million random numbers, split them into
> 1 million disjoint sets, and compute the statistic on each set using 50,
> 100, 200, and 300 rows of data for each of three values of s (s1, s2, s3)?
>
> Thank Q!