| Date: | Thu, 8 Sep 2005 13:49:40 -0700 |
| Reply-To: | David L Cassell <davidlcassell@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | David L Cassell <davidlcassell@MSN.COM> |
| Subject: | Re: Test of Independence |
| In-Reply-To: | <200509081427.j88AjVuL013419@malibu.cc.uga.edu> |
| Content-Type: | text/plain; format=flowed |
|---|
montey_man11@HOTMAIL.COM wrote:
>I am dealing with a population with two variables. These two variables
>should represent two random digit sets (uniform distribution). What I
>need to do is to ensure that these two random digit sets are random and
>independent. I am looking for randomness and independence
>in order to conduct two simultaneous tests on this same population
>without one test biasing the other. Is this possible in SAS?
Sure. But some of it will have to be coded by you.
There's an immense literature on pseudo-random number generators (PRNGs)
and on testing their output. There's even more now that people want really,
really good PRNGs so they can feel better about their encryption algorithms.
You can find a ton of material on this just by searching Google for
'testing pseudo-random number generators'
Also look specifically for work by George Marsaglia on this subject.
>Since this is an inherited dataset, I am not sure how the two digits
>were generated.
Ooh. You have my deepest sympathies.
> I want to 1) make sure that they follow a uniform
>distribution, 2) that it is random, and 3) that the two variables are
>Independent.
>
>for #1, all i needed to do is to run proc freq on the variables and
>validate that it is distributed following a uniform distribution.
The standard chi-squared goodness-of-fit test is typically used for this.
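As a rough sketch of that check: on a one-way table, the CHISQ option in PROC FREQ tests for equal proportions across the observed levels, which is the uniform hypothesis here. The data set name YourData and the variable names X and Y are assumed from the PROC ARIMA example later in this message, and the digits are assumed to be stored one per observation.

```sas
/* One-way chi-squared test of equal proportions for each digit variable. */
/* YourData, X and Y are placeholder names, not the poster's real names.  */
proc freq data=YourData;
   tables X Y / chisq;
run;
```

Note that a digit which never occurs will not appear as a level at all, so check that all ten digits show up in each table before trusting the p-value.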
>for #2 I am stuck! I am not sure how I can determine the distribution
>of values is random (rather than, say, ordered by some value)
I would recommend using Marsaglia's overlapping M-tuple test. I don't
have a SAS implementation of it. Marsaglia's original paper on it was in
1985, I think.
Other people might recommend Leeb's test and some version of a runs
test. These are also good. Entacher has some papers on the runs-up
statistic for evaluating randomness of a PRNG stream.
>for #3 I am also stuck. I am not sure how to test the independence of
>the two groups. What I am thinking of doing is to get the value of the
>correlation coefficient. However, from my LIMITED knowledge of
>Statistics, I know that a correlation of zero does not mean
>independence. Also, the correlation only tests linear relationships.
>
>Any help is deeply appreciated.
At this point, you might want to think about time series analysis. This
approach won't be as robust or as reliable as, say, Marsaglia's overlapping
M-tuple test, but it can be implemented in SAS. Choose NLAG to be larger
than half your data set size, because the default is min(24, one-fourth of
the data set size), which is fine for ARIMA modeling but not for your
purposes.
proc arima data=YourData;
   /* replace ??? with the NLAG value you choose */
   identify var=X crosscorr=Y nlag=??? scan esacf p=(0:50) q=(0:50);
run;
quit;
Unfortunately, this will also give you a mammoth amount of output to sift
through. And it won't find long-term effects, where long-term is greater
than 50 lags. So you can also try a really simple technique. Plot Y vs. X
in PROC GPLOT. Then look for patterns. Just remember, this isn't a
Rorschach blot test.
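A minimal sketch of the plot, again using the placeholder names YourData, X and Y. Because both variables are single digits, the scatter plot has at most 100 distinct points and overplots heavily, so the sketch also adds a cross-tabulation with a chi-squared test of association. That second step is an extra check added here, not part of the suggestion above, and it only looks at X and Y paired on the same observation, not at lagged relationships.

```sas
/* Scatter plot of Y against X (requires SAS/GRAPH). */
proc gplot data=YourData;
   plot Y*X;
run;
quit;

/* Added check: 10 x 10 table of counts with a chi-squared test of */
/* association between X and Y on the same observation.            */
proc freq data=YourData;
   tables X*Y / chisq expected norow nocol nopercent;
run;
```

Under independence the observed cell counts should sit close to the expected counts, with no rows, columns, or diagonals standing out.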
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
|