Date: Wed, 3 Feb 2010 11:25:46 -0500
Reply-To: Jim Groeneveld <jim.1stat@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jim Groeneveld <jim.1stat@YAHOO.COM>
Subject: Re: How to output number of missing character variables to a file
Hi Lance,
In the summer of 2004 I already was preparing a macro (MULTFREQ) that would
present the frequencies of many variables in one table. (Because of always
more utgent work that macro hasn't been further developed since.) Yet I want
to include it here as it shows the code to combine frequencies of multiple
variables, with roughly the same range of values (including missings) and
are of the same type. You can simplify it to show just the amounts and
percentages of missing and non-missing values per variable.
A hint: before going on with this example code recode all missings of each
variable, character variable in your case, to a common value (e.g. 'Missing'
or 0 if numeric) and the rest, the non-missing values to another common
value (e.g. 'Present' or 1 if numeric). PROC FREQ gives both the amounts and
the percentages.
Instead of applying PROC FREQ again on the generated dataset &Combined in
the macro below, you may process that dataset differently as desired, e.g.
using PROC REPORT.
Regards - Jim.
--
Jim Groeneveld, Netherlands
Statistician, SAS consultant
http://jim.groeneveld.eu.tf
~~~~~~~~~ START PROGRAM CODE ~~~~~~~~~
OPTIONS ls=80 ps=63 formchar = '|____|||___';
DATA TestData;
INPUT p1 p2 p3 p4 p5;
CARDS;
1 . 3 0 3
. 5 1 1 2
2 3 4 2 1
5 . . 3 0
3 4 . 4 5
5 4 . 5 4
. 2 3 0 3
3 . 5 1 2
;
RUN;
%MACRO MultFreq (DATA=, VarList=, Flipped=N);
%LOCAL I Variable Combined N VarName Value;
%LET VarName = ___VName;
%LET Value = ___Value;
%LET Combined = ___Combi;
* Save frequencies of each variable in datasets;
PROC FREQ DATA=&Data;
%LET I = 1;
%LET Variable = %SCAN (&VarList, &I, %STR( ));
%DO %WHILE (&Variable NE);
%LOCAL Dataset&I; %* as amny as are necessary;
%LET Dataset&I = __Freqs&I;
TABLES &Variable / NOPRINT OUT=&&Dataset&I (DROP=Percent);
%LET I = %EVAL(&I+1);;
%LET Variable = %SCAN (&VarList, &I, %STR( ));
%END;
RUN;
* Reshape each frequency listing;
%LET N = %EVAL(&I-1); %* total N of variables from previous loop;
%DO I = 1 %TO &N;
DATA &&Dataset&I (KEEP=&VarName &Value Count);
SET &&Dataset&I;
&VarName = "%SCAN (&VarList, &I, %STR( ))";
&Value = %SCAN (&VarList, &I, %STR( ));
LABEL &VarName='Variable name'
&Value='Variable value';
RUN;
%END;
* Combine all variable frequencies;
DATA &Combined;
SET
%DO I = 1 %TO &N;
&&Dataset&I
%END;
;
RUN;
/* PROC PRINT DATA=&Combined; RUN; */
PROC FREQ DATA=&Combined; * (or Tabulate or Report);
TABLES %IF (&Flipped EQ N) %THEN &VarName * &Value;
%ELSE &Value * &Varname;
/ MISSING;
WEIGHT Count;
RUN;
%MEND MultFreq;
%MultFreq (DATA=Testdata, VarList=P1 P2 P3 P4 P5);
%MultFreq (DATA=Testdata, VarList=P1 P2 P3 P4 P5, Flipped=Y);
~~~~~~~~~ END PROGRAM CODE ~~~~~~~~~
On Tue, 2 Feb 2010 21:08:46 -0800, Lance Smith <medicaltrial@GMAIL.COM> wrote:
>Hi
>
>I have a dataset with 50 character variables (SNP1 - SNP50), each of
>which have a certain amount of missing data. I want to create a table
>that will give me the percentage of missing data per variable. Maybe
>something like this:
>
>VARIABLE N %MISSING
>SNP1 2010 2.6%
>.
>.
>.
>.
>.
>SNP50 2010 5%
>
>Thank you for your help.