Date: Thu, 11 Nov 2010 18:40:38 -0800
Reply-To: "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Subject: Re: creating missing randomly
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Nordlund, Dan (DSHS/RDA)
> Sent: Thursday, November 11, 2010 5:22 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: creating missing randomly
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > yk2k
> > Sent: Thursday, November 11, 2010 4:13 PM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: creating missing randomly
> >
> > Hi, I'm trying to create new datasets that contain missing values
> > using existing data.
> >
> > Suppose I have data that contains 7 continuous variables, like below.
> >
> > A B C D E F G
> > 1 3 4 3 2 3 4
> > 2 4 3 4 5 6 2
> > 2 5 3 4 5 3 1
> > 1 2 3 3 2 1 3
> > 2 5 4 3 4 1 3
> >
> > I want to insert missing values at random, but with the same number of
> > missing values for each person, like below.
> >
> > data-1 (1 missing per person)
> >
> > A B C D E F G
> > 1 . 4 3 2 3 4
> > 2 4 3 . 5 6 2
> > 2 5 . 4 5 3 1
> > . 2 3 3 2 1 3
> > 2 5 4 3 4 . 3
> >
> > data-2 (2 missing per person)
> >
> > A B C D E F G
> > 1 . 4 3 . 3 4
> > 2 4 . . 5 6 2
> > 2 5 . 4 5 . 1
> > . 2 3 3 . 1 3
> > 2 5 . 3 4 . 3
> >
> > ...and so on, up to 6 missing values per person.
> >
> > Also, is there any way to replace the missing values with the mean of
> > the remaining values?
> >
> > Thanks.
>
> Here is one way to do it:
>
> data have;
>   input A B C D E F G;
>   cards;
> 1 3 4 3 2 3 4
> 2 4 3 4 5 6 2
> 2 5 3 4 5 3 1
> 1 2 3 3 2 1 3
> 2 5 4 3 4 1 3
> ;
> run;
>
> **----replace n_miss values with missing----**;
> %let n_miss = 2;
> data want1;
>   set have;
>   array x[7] A--G;
>   do _n_ = 1 to &n_miss;
>     ndx = ceil(7*uniform(123));
>     do while(x[ndx] EQ .);
>       ndx = ceil(7*uniform(123));
>     end;
>     x[ndx] = .;
>   end;
> run;
> proc print;
> run;
>
> **----replace n_miss values with mean of remaining values----**;
> data want2;
>   set have;
>   array x[7] A--G;
>   do _n_ = 1 to &n_miss;
>     ndx = ceil(7*uniform(123));
>     do while(x[ndx] EQ .);
>       ndx = ceil(7*uniform(123));
>     end;
>     x[ndx] = .;
>   end;
>   x_mean = mean(of A--G);
>   do _n_ = 1 to 7;
>     if missing(x[_n_]) then x[_n_] = x_mean;
>   end;
> run;
> proc print;
> run;
>
> Hope this is helpful,
>
> Dan
OK, this wasn't as helpful as I had planned. WANT1 was created as I expected, with missing values inserted at random. However, WANT2 is not correct: the wrong values are changed, and the mean isn't always inserted (although the missing values are initially inserted at random). If I create WANT3, where I replace the missing values with a constant (99, for example) instead of the mean, then the data step works as I expect. I am obviously brain dead at this point. Can someone point out the error of my ways? Thanks.

data have;
  input A B C D E F G;
  cards;
1 3 4 3 2 3 4
2 4 3 4 5 6 2
2 5 3 4 5 3 1
1 2 3 3 2 1 3
2 5 4 3 4 1 3
;
run;

**----replace n_miss values with missing----**;
%let n_miss = 2;
data want1;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;              /* _N_ doubles as the loop counter */
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);            /* redraw if this slot is already missing */
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
run;
proc print;
run;
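The original question asked for separate data sets with 1 through 6 missing values per person, while the WANT1 step handles one value of &n_miss at a time. A minimal sketch, assuming the HAVE data set above and using a hypothetical macro name (MAKE_MISSING) and hypothetical output names (DATA_1 to DATA_6), wraps the same redraw loop in a macro %DO loop:

```sas
/* Hypothetical macro: writes DATA_1 ... DATA_&MAX_MISS, where DATA_k has  */
/* exactly k values per row set to missing, using the same redraw loop.    */
%macro make_missing(max_miss=6, seed=123);
  %local k;
  %do k = 1 %to &max_miss;
    data data_&k;
      set have;
      array x[7] A--G;
      do i = 1 to &k;
        ndx = ceil(7*uniform(&seed));
        do while(x[ndx] EQ .);        /* redraw if this slot is already missing */
          ndx = ceil(7*uniform(&seed));
        end;
        x[ndx] = .;
      end;
      drop i ndx;
    run;
  %end;
%mend make_missing;

%make_missing(max_miss=6)
```

Because UNIFORM is reinitialized from the same seed in each DATA step, reruns give the same data sets. The DO WHILE loop only terminates if each row still has a free slot, so this assumes HAVE starts with no missing values and MAX_MISS is at most 7.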

**----replace n_miss values with mean of remaining values----**;
data want2;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
  x_mean = mean(of A--G);             /* mean of the non-missing values */
  do _n_ = 1 to 7;
    if missing(x[_n_]) then x[_n_] = x_mean;
  end;
run;
proc print;
run;
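One way to narrow this down is to compare WANT2 cell by cell against what mean imputation of WANT1 should produce. The sketch below assumes WANT1 and WANT2 put their missing values in the same positions, which should hold because both steps make the same sequence of UNIFORM(123) calls on the same input; CHECK, N_BAD, and the renamed A2 to G2 variables are hypothetical names.

```sas
/* Sketch: assumes WANT1 and WANT2 share missing-value positions.       */
/* N_BAD counts cells in WANT2 that differ from mean-imputing WANT1.    */
data check;
  set want1(keep=A B C D E F G);
  set want2(keep=A B C D E F G
            rename=(A=A2 B=B2 C=C2 D=D2 E=E2 F=F2 G=G2));
  array w1[7] A B C D E F G;
  array w2[7] A2 B2 C2 D2 E2 F2 G2;
  row_mean = mean(of w1[*]);          /* mean of the values WANT1 kept */
  n_bad = 0;
  do i = 1 to 7;
    expected = coalesce(w1[i], row_mean);   /* original value, or the mean if missing */
    if w2[i] ne expected then n_bad = n_bad + 1;
  end;
  drop i expected;
run;
proc print data=check;
  where n_bad > 0;
run;
```

Rows printed here are the ones where WANT2 departs from the expected result; an empty listing means the two steps agree under the stated assumption.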

**----replace n_miss values with a constant (99)----**;
data want3;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
  x_mean = mean(of A--G);             /* computed but not used in this version */
  do _n_ = 1 to 7;
    if missing(x[_n_]) then x[_n_] = 99;
  end;
run;
proc print;
run;

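As an alternative to redrawing until an unused slot turns up, the positions can be picked with a partial Fisher-Yates shuffle, which takes exactly &n_miss draws per row. This is a minimal sketch, not the approach used above: it swaps in RAND('UNIFORM') with CALL STREAMINIT for UNIFORM (so its positions will not match WANT1), assumes the HAVE data set above, and the names WANT_FY_MISS, WANT_FY_MEAN, POS, and ROW_MEAN are hypothetical. It writes the data with missing values and the mean-imputed version from the same draws, so the two can be compared row by row.

```sas
%let n_miss = 2;
data want_fy_miss want_fy_mean;
  set have;
  array x[7] A--G;
  array pos[7] _temporary_;           /* hypothetical helper: permutation of 1..7 */
  if _n_ = 1 then call streaminit(123);
  do i = 1 to 7;                      /* start each row from the order 1..7 */
    pos[i] = i;
  end;
  do i = 1 to &n_miss;                /* after step i, pos[1..i] is a random sample */
    j = i + floor((8 - i) * rand('uniform'));   /* j is uniform on i..7 */
    tmp = pos[i]; pos[i] = pos[j]; pos[j] = tmp;
    x[pos[i]] = .;
  end;
  row_mean = mean(of x[*]);           /* mean of the values still present */
  output want_fy_miss;                /* version with missing values */
  do i = 1 to 7;
    if missing(x[i]) then x[i] = row_mean;
  end;
  output want_fy_mean;                /* mean-imputed version */
  drop i j tmp;
run;
```

Because each swap draws only from slots not yet chosen, no position can be picked twice, and the step needs no DO WHILE loop.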
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204