You can do it, but you may want to apply a multiple-testing correction
to the results, such as a Bonferroni correction. For example:
Klein et al. Complement factor H polymorphism in age-related macular
degeneration. Science. 2005 Apr 15;308(5720):385-9.
http://www.sciencemag.org/cgi/content/full/308/5720/385
In this paper, they set a Bonferroni acceptance level of 4.8 x 10^-7:
"Single-marker associations. For each SNP, we tested for allelic association
with disease status. To account for multiple testing, we used the Bonferroni
correction and considered significant only those SNPs for which P <
0.05/103,611 = 4.8 x 10^-7. This correction is known to be conservative and
thus "over-corrected" the raw P values (14). Of the autosomal SNPs, only
two, rs380390 and rs10272438, are significantly associated with disease
status (Bonferroni-corrected P = 0.0043 and P = 0.0080, respectively) (Fig.
1A)."
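As a rough sketch of that arithmetic, and of applying the same correction to
your own p-values: the first step below reproduces the 0.05/103,611 threshold
from the quote, and the second uses PROC MULTTEST to Bonferroni-adjust a list
of raw p-values. The data set pvals_demo and its values are made up for
illustration. PROC MULTTEST looks for the raw p-values in a variable named
raw_p by default and writes the adjusted ones to bon_p.

```sas
/* Bonferroni threshold: overall alpha divided by the number of tests. */
data _null_;
   alpha     = 0.05;     /* family-wise significance level            */
   n_tests   = 103611;   /* number of SNPs tested in the quoted paper */
   threshold = alpha / n_tests;
   put "Per-test threshold: " threshold e12.;
run;

/* Made-up p-values, for illustration only. */
data pvals_demo;
   input iv $ raw_p;
   datalines;
iv1 0.00001
iv2 0.0004
iv3 0.03
iv4 0.20
;
run;

/* Multiply each raw p-value by the number of tests (capped at 1). */
/* The adjusted values are written to the variable BON_P.          */
proc multtest inpvalues=pvals_demo bonferroni out=adjusted;
run;
```

The threshold printed to the log should be roughly 4.8 x 10^-7, matching the
paper. Comparing bon_p to 0.05 is equivalent to comparing raw_p to 0.05
divided by the number of tests.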
Other methods of determining the "best" variables should also be run and
considered, such as odds ratios if you are running a logistic regression.
Gathering all the statistics for a run into one row and then appending the
rows together will allow you to pull the data into Excel. Then you can try
sorting by one column or another and see if the same variables consistently
rise to the top of the list. It also helps to have each statistic run on the
same observations, since missing values make the variables harder to compare.
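Code-wise, I tend to create one row by putting the results into macro
variables, such as:

data results;
   /* fix the length so every appended row has the same structure */
   length description $40;
   description="Description of this run";
   /* &statistic1 is a macro variable filled in earlier in the run */
   statistic1=&statistic1;
run;

then append that one row to an overall table, creating a row for each run:

proc append base=allresults data=results;
run;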
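
One possible way to fill &statistic1 for a t-test, as a minimal sketch: capture
the PROC TTEST results table with ODS OUTPUT, then copy the p-value into the
macro variable with CALL SYMPUTX. The choice of sashelp.class, sex, and height
is only a stand-in for your own data, and taking the pooled-variance p-value
is an assumption; the Satterthwaite row may suit your data better.

```sas
/* Capture the t-test table as a data set instead of only printing it. */
ods output ttests=tt;
proc ttest data=sashelp.class;
   class sex;      /* stand-in for one independent variable */
   var height;     /* stand-in for one dependent variable   */
run;

/* Store the pooled-variance p-value in the macro variable STATISTIC1. */
data _null_;
   set tt;
   where method = "Pooled";
   call symputx("statistic1", probt);
run;

/* Build the one-row table and append it to the overall table. */
data results;
   length description $40;
   description = "height by sex";
   statistic1  = &statistic1;
run;

proc append base=allresults data=results;
run;
```

Wrapping these steps in a macro and calling it once per variable pair would
give one row per test in allresults. Note that rerunning the block appends
another row each time, so delete allresults first if you start over.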
-Mary
----- Original Message -----
From: "Kevin Viel" <citam.sasl@GMAIL.COM>
To: <SAS-L@LISTSERV.UGA.EDU>
Sent: Monday, March 30, 2009 12:04 PM
Subject: Re: Running/Outputting t-test results repeatedly
> On Mon, 30 Mar 2009 11:13:29 -0400, Eduardo Galvan
> <EGalvan@SURVEYSCIENCES.COM> wrote:
>
>>Sorry for the repost, but I'm hoping somebody can help me with this. I
>>have 10 dependent variables (e.g., dv1, dv2,....dv10) that I want run
>>t-tests on and output the mean, standard deviation, and p-value to a
>>data file. The catch is that I have over 2,000 variables that I want to
>>use as the independent variable when I am running these tests. In
>>other words, I need to run 2,000 t-tests on dv1, 2,000 t-tests on dv2,
>>etc.
>
> I am almost sure that you do not want to do this. If you do, you will very
> likely need to adjust for multiple testing. If you simulated 2000
> outcomes for just one dependent variable such that the means were equal,
> then you would expect to have 0.05 * 2000 = 100 "significant" results at the
> alpha = 0.05 level....
>
> -Kevin
>