LISTSERV at the University of Georgia
Date:   Sun, 6 Sep 2009 03:43:33 -0500
Reply-To:   OR Stats <stats112@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   OR Stats <stats112@GMAIL.COM>
Subject:   Re: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
Comments:   To: Dale McLerran <stringplayer_2@yahoo.com>
In-Reply-To:   <6eca73440909050418g10431f71g3541716ddd32a4bf@mail.gmail.com>
Content-Type:   text/plain; charset=ISO-8859-1

To be more specific, the log message

NOTE: Argument 4 to function CDF at line 3 column 5 is invalid.

refers to the function call in the replacement code:

log(cdf('normal',_x&N{i},mu,sd))

The output tables are incorrect. The AD column holds -1 * sample size, which is what

AD = -&N - S;

returns when S is zero. S is probably left at zero because argument 4 of CDF (sd) is invalid.
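
One way to check that guess is to look at mu and sd before CDF is called. The step below is only a sketch: it draws a single sample of 50 from a stand-in normal distribution (an assumption, since the actual G is not shown here), and the names ad_check and sum_ad are made up for illustration. If sd comes out missing or not positive, CDF rejects argument 4 and the sum stays at zero.

```sas
/* Sketch only: one sample of size 50 from a stand-in distribution.   */
/* rand('normal',10,2) is an assumption standing in for G(u,s).       */
data ad_check;
   array _x50 {50} x1-x50;
   call streaminit(6923479);
   do i = 1 to 50;
      _x50{i} = rand('normal', 10, 2);
   end;

   call sortn(of _x50(*));          /* order statistics, ascending */
   mu = mean(of x1-x50);
   sd = std(of x1-x50);

   /* cdf('normal',x,mu,sd) needs a nonmissing, positive sd */
   if missing(sd) or sd <= 0 then do;
      put 'Check the generated x values: sd is unusable. ' mu= sd=;
      AD = .;
   end;
   else do;
      sum_ad = 0;                   /* hypothetical accumulator name */
      do i = 1 to 50;
         sum_ad + ((2*i - 1)/50) * (log(cdf('normal', _x50{i}, mu, sd)) +
                  log(1 - cdf('normal', _x50{51-i}, mu, sd)));
      end;
      AD = -50 - sum_ad;
   end;

   put mu= sd= AD=;                 /* write the values to the log */
   keep mu sd AD;
run;
```

If mu and sd print as missing here, the problem is in how the x values are generated rather than in the CDF call itself.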

On Sat, Sep 5, 2009 at 6:18 AM, OR Stats <stats112@gmail.com> wrote:

> It now creates the datasets. But the S column is just all zeros and AD
> column is all -samplesize (i.e., -50, -100, -200 etc.)
>
> the error log now is
>
> NOTE: Argument 4 to function CDF at line 3 column 5 is invalid.
>
> NOTE: Argument 4 to function CDF at line 3 column 52 is invalid.
>
> On Sat, Sep 5, 2009 at 12:39 AM, Dale McLerran <stringplayer_2@yahoo.com> wrote:
>
>> My mistake. There was a legacy reference to array X
>> from when you had asked first asked how to compute the
>> A-D test for a distribution which you wish to specify.
>> We now have four different arrays of various lengths.
>> The macro should reference the array of the length
>> currently being simulated. In order to reference the
>> correct array, replace the code
>>
>> S + ((2*i - 1)/&N) * (log(cdf('normal',x{i},mu,sd)) +
>>      log(1 - cdf('normal',x{&N+1-i},mu,sd)));
>>
>> with
>>
>> S + ((2*i - 1)/&N) * (log(cdf('normal',_x&N{i},mu,sd)) +
>>      log(1 - cdf('normal',_x&N{&N+1-i},mu,sd)));
>>
>> Dale
>>
>> ---------------------------------------
>> Dale McLerran
>> Fred Hutchinson Cancer Research Center
>> mailto: dmclerra@NO_SPAMfhcrc.org
>> Ph: (206) 667-2926
>> Fax: (206) 667-5977
>> ---------------------------------------
>>
>> --- On Fri, 9/4/09, OR Stats <stats112@GMAIL.COM> wrote:
>>
>> > From: OR Stats <stats112@GMAIL.COM>
>> > Subject: Re: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
>> > To: SAS-L@LISTSERV.UGA.EDU
>> > Date: Friday, September 4, 2009, 7:34 PM
>> >
>> > cool, good. The undeclared array is still giving problems
>> >
>> > ERROR: Undeclared array referenced: x.
>> >
>> > ERROR: Variable x has not been declared as an array.
>> >
>> > ERROR: Undeclared array referenced: x.
>> >
>> > ERROR: Variable x has not been declared as an array.
>> >
>> > 1218 %AD(N=100)
>> >
>> > On Fri, Sep 4, 2009 at 9:28 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> >
>> > > That is incorrect syntax for an iterative DO. You need.
>> > >
>> > > do s=5,5.2,5.4;
>> > >
>> > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > Hmm... still same error
>> > > > 1124 do S=[5 5.2 5.4]; /* This line needs correct specification */
>> > > > -
>> > > > 386
>> > > > -
>> > > > 200
>> > > >
>> > > > ERROR 386-185: Expecting an arithmetic expression.
>> > > >
>> > > > ERROR 200-322: The symbol is not recognized and will be ignored.
>> > > >
>> > > > On Fri, Sep 4, 2009 at 9:15 PM, OR Stats <stats112@gmail.com> wrote:
>> > > >
>> > > > > Ok. Too much coding on a Friday! Thx!!
>> > > > >
>> > > > > On Fri, Sep 4, 2009 at 9:13 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> > > > >
>> > > > > > From your original post....
>> > > > > >
>> > > > > > > 1 Million times using 50, 100, 200, and 300 rows of data at each
>> > > > > > > iteration for three different values of s (s1, s2, s3)?
>> > > > > >
>> > > > > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > > > > Not sure what S1 S2 and S3 are referring to?
>> > > > > > >
>> > > > > > > On Fri, Sep 4, 2009 at 8:56 PM, Data _null_; <iebupdte@gmail.com> wrote:
>> > > > > > > > Did you notice this comment...
>> > > > > > > >
>> > > > > > > > /* This line needs correct specification */
>> > > > > > > >
>> > > > > > > > On 9/4/09, OR Stats <stats112@gmail.com> wrote:
>> > > > > > > > > I am getting the following error msg's
>> > > > > > > > >
>> > > > > > > > > do S={S1 S2 S3}; /* This line needs correct specification */
>> > > > > > > > > -
>> > > > > > > > > 386
>> > > > > > > > > 76
>> > > > > > > > > --
>> > > > > > > > > 202
>> > > > > > > > >
>> > > > > > > > > ERROR 386-185: Expecting an arithmetic expression.
>> > > > > > > > >
>> > > > > > > > > ERROR 76-322: Syntax error, statement will be ignored.
>> > > > > > > > >
>> > > > > > > > > ERROR 202-322: The option or parameter is not recognized and will be
>> > > > > > > > > ignored.
>> > > > > > > > > ERROR: Undeclared array referenced: x.
>> > > > > > > > >
>> > > > > > > > > ERROR: Variable x has not been declared as an array.
>> > > > > > > > >
>> > > > > > > > > And what is S for as the 2nd statement of ranuni(p,S)?
>> > > > > > > > >
>> > > > > > > > > On Sun, Aug 30, 2009 at 11:19 PM, Dale McLerran <stringplayer_2@yahoo.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > One million times? Why? I really think that is overkill.
>> > > > > > > > > > I would try to cover more parameter combinations if it were
>> > > > > > > > > > me.
>> > > > > > > > > >
>> > > > > > > > > > But you should be able to use a single data step to generate
>> > > > > > > > > > A-D statistics for all of your parameter combinations. The
>> > > > > > > > > > code below should be pretty efficient.
>> > > > > > > > > >
>> > > > > > > > > > %macro AD(N=);
>> > > > > > > > > >   do i=1 to &N;
>> > > > > > > > > >     /* The next line needs completion with the appropriate G */
>> > > > > > > > > >     _x&N{i} = G(ranuni(6923479,S));
>> > > > > > > > > >   end;
>> > > > > > > > > >
>> > > > > > > > > >   call sortn(of _X&N(*));
>> > > > > > > > > >   mu = mean(of x1-x&N);
>> > > > > > > > > >   var = var(of x1-x&N);
>> > > > > > > > > >   sd = sqrt(var);
>> > > > > > > > > >   S=0;
>> > > > > > > > > >   do i=1 to &N;
>> > > > > > > > > >     S + ((2*i - 1)/&N) * (log(cdf('normal',x{i},mu,sd)) +
>> > > > > > > > > >          log(1 - cdf('normal',x{&N+1-i},mu,sd)));
>> > > > > > > > > >   end;
>> > > > > > > > > >   AD = -&N - S;
>> > > > > > > > > >   output AD_&N;
>> > > > > > > > > > %mend;
>> > > > > > > > > >
>> > > > > > > > > > /* Generate 10000 samples of same size (N=9 in this case) following */
>> > > > > > > > > > /* a normal distribution and compute AD statistic for each sample.  */
>> > > > > > > > > > data AD_50
>> > > > > > > > > >      AD_100
>> > > > > > > > > >      AD_200
>> > > > > > > > > >      AD_300;
>> > > > > > > > > >   array _x50 {50} x1-x50;
>> > > > > > > > > >   array _x100 {100} x1-x100;
>> > > > > > > > > >   array _x200 {200} x1-x200;
>> > > > > > > > > >   array _X300 {300} x1-x300;
>> > > > > > > > > >   do S={S1 S2 S3}; /* This line needs correct specification */
>> > > > > > > > > >     do rep=1 to 10000;
>> > > > > > > > > >       %AD(N=50)
>> > > > > > > > > >       %AD(N=100)
>> > > > > > > > > >       %AD(N=200)
>> > > > > > > > > >       %AD(N=300)
>> > > > > > > > > >     end;
>> > > > > > > > > >   end;
>> > > > > > > > > >   keep S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > /* Determine probability of observed data */
>> > > > > > > > > > /* using simulated data AD distribution.  */
>> > > > > > > > > > proc sort data=AD_50;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_100;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_200;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > proc sort data=AD_300;
>> > > > > > > > > >   by S AD;
>> > > > > > > > > > run;
>> > > > > > > > > >
>> > > > > > > > > > The above is untested code and should be tested with a
>> > > > > > > > > > small number of replicates before using it for a final
>> > > > > > > > > > simulation. Also, there will obviously need to be some
>> > > > > > > > > > final step where you determine the quantiles of the AD
>> > > > > > > > > > statistics.
>> > > > > > > > > >
>> > > > > > > > > > Dale
>> > > > > > > > > >
>> > > > > > > > > > ---------------------------------------
>> > > > > > > > > > Dale McLerran
>> > > > > > > > > > Fred Hutchinson Cancer Research Center
>> > > > > > > > > > mailto: dmclerra@NO_SPAMfhcrc.org
>> > > > > > > > > > Ph: (206) 667-2926
>> > > > > > > > > > Fax: (206) 667-5977
>> > > > > > > > > > ---------------------------------------
>> > > > > > > > > >
>> > > > > > > > > > --- On Sun, 8/30/09, OR Stats <stats112@GMAIL.COM> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > From: OR Stats <stats112@GMAIL.COM>
>> > > > > > > > > > > Subject: Fastest Steps for Simulating: Anderson-Darling Goodness of Fit test for Non-typical distn
>> > > > > > > > > > > To: SAS-L@LISTSERV.UGA.EDU
>> > > > > > > > > > > Date: Sunday, August 30, 2009, 8:14 PM
>> > > > > > > > > > >
>> > > > > > > > > > > This is good. I am ready now to run a large scale simulation. What that
>> > > > > > > > > > > means is that I want to compute the goodness of fit statistic for (M x
>> > > > > > > > > > > S) groups and n times each group.
>> > > > > > > > > > >
>> > > > > > > > > > > Group defined by (m,s); S = s1 s2 s3 and M = 50 100 200 300. Basically,
>> > > > > > > > > > > M is my different sample sizes for which I am testing their fit to
>> > > > > > > > > > > function G(random#,s) (i.e., inverse distribution). I would like to run
>> > > > > > > > > > > each group 1 million times. For each s group, by generating random
>> > > > > > > > > > > numbers just by 300 x 1million times, I'll have enough simulated data
>> > > > > > > > > > > y(s) to use for the largest and smaller sample sizes.
>> > > > > > > > > > >
>> > > > > > > > > > > My final column space would look like
>> > > > > > > > > > >
>> > > > > > > > > > > i   ranuni   y_s1=G(ranuni,s1)   y_s2=G(ranuni,s2)   y_s3=G(ranuni,s3)
>> > > > > > > > > > > 1
>> > > > > > > > > > > .
>> > > > > > > > > > > .
>> > > > > > > > > > > .
>> > > > > > > > > > > m
>> > > > > > > > > > >
>> > > > > > > > > > > All rows in the above table would be used to caculate function f_s1,
>> > > > > > > > > > > f_s2, f_s3 (i.e., AD). This last step is repeated 1 Million times.
>> > > > > > > > > > >
>> > > > > > > > > > > Can we do this in one to two DATA STEPS? Which syntax would be fastest
>> > > > > > > > > > > since we have to generate 300 Million random numbers, from which we would
>> > > > > > > > > > > split the sample by 1 Million disjoint sets that we would then compute a
>> > > > > > > > > > > statistic 1 Million times using 50, 100, 200, and 300 rows of data at
>> > > > > > > > > > > each iteration for three different values of s (s1, s2, s3)?
>> > > > > > > > > > >
>> > > > > > > > > > > Thank Q!
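
For reference, here is a sketch of the quoted macro with the changes discussed in this thread applied: the list form of the iterative DO, the _x&N array references, and RANUNI called with a single seed argument, with s passed to G instead. It also accumulates into a separate variable, because the quoted macro uses S both as the DO index and as the running sum, so the S column kept in the output cannot hold the parameter value. The stand-in G (a normal quantile with scale s), the name sum_ad, and the 10 replicates are assumptions for a small trial run, not the original specification, and the code is untested.

```sas
/* Sketch only. Assumptions: quantile('normal',u,0,s) stands in for   */
/* the unspecified G(u,s); sum_ad is a hypothetical accumulator name. */
%macro AD(N=);
   do i = 1 to &N;
      _x&N{i} = quantile('normal', ranuni(6923479), 0, s);
   end;

   call sortn(of _x&N(*));
   mu = mean(of x1-x&N);
   sd = std(of x1-x&N);

   sum_ad = 0;                      /* leaves the DO index s untouched */
   do i = 1 to &N;
      sum_ad + ((2*i - 1)/&N) * (log(cdf('normal', _x&N{i}, mu, sd)) +
               log(1 - cdf('normal', _x&N{&N+1-i}, mu, sd)));
   end;
   AD = -&N - sum_ad;
   output AD_&N;
%mend AD;

data AD_50 AD_100 AD_200 AD_300;
   array _x50  {50}  x1-x50;
   array _x100 {100} x1-x100;
   array _x200 {200} x1-x200;
   array _x300 {300} x1-x300;
   do s = 5, 5.2, 5.4;              /* list form of the iterative DO */
      do rep = 1 to 10;             /* small trial run, not 1 million */
         %AD(N=50)
         %AD(N=100)
         %AD(N=200)
         %AD(N=300)
      end;
   end;
   keep s rep AD;
run;
```

With this layout each output data set should have one row per (s, rep) pair, and the s column should show 5, 5.2 and 5.4 rather than zeros.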

