LISTSERV at the University of Georgia
Date:         Thu, 11 Nov 2010 18:40:38 -0800
Reply-To:     "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Subject:      Re: creating missing randomly
In-Reply-To:  <941871A13165C2418EC144ACB212BDB001AD22F6@dshsmxoly1504g.dshs.wa.lcl>
Content-Type: text/plain; charset=utf-8

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Nordlund, Dan (DSHS/RDA)
> Sent: Thursday, November 11, 2010 5:22 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: creating missing randomly
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > yk2k
> > Sent: Thursday, November 11, 2010 4:13 PM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: creating missing randomly
> >
> > Hi, I'm trying to create new datasets that contain missing values
> > using existing data.
> >
> > If I have a dataset that contains 7 continuous variables like below.
> >
> > A B C D E F G
> > 1 3 4 3 2 3 4
> > 2 4 3 4 5 6 2
> > 2 5 3 4 5 3 1
> > 1 2 3 3 2 1 3
> > 2 5 4 3 4 1 3
> >
> > I want to make values missing at random, but with the same number of
> > missing values within each person, like below.
> >
> > data-1 (1 missing per person)
> >
> > A B C D E F G
> > 1 . 4 3 2 3 4
> > 2 4 3 . 5 6 2
> > 2 5 . 4 5 3 1
> > . 2 3 3 2 1 3
> > 2 5 4 3 4 . 3
> >
> > data-2 (2 missing per person)
> >
> > A B C D E F G
> > 1 . 4 3 . 3 4
> > 2 4 . . 5 6 2
> > 2 5 . 4 5 . 1
> > . 2 3 3 . 1 3
> > 2 5 . 3 4 . 3
> >
> > ...up to 6 missing values per person.
> >
> > Also, is there any way to replace each missing value with the mean of
> > the rest of the values?
> >
> > Thanks.
>
> Here is one way to do it:
>
> data have;
>   input A B C D E F G;
> cards;
> 1 3 4 3 2 3 4
> 2 4 3 4 5 6 2
> 2 5 3 4 5 3 1
> 1 2 3 3 2 1 3
> 2 5 4 3 4 1 3
> ;
> run;
>
> **----replace n_miss values with missing----**;
> %let n_miss = 2;
> data want1;
>   set have;
>   array x[7] A--G;
>   do _n_ = 1 to &n_miss;
>     ndx = ceil(7*uniform(123));
>     do while(x[ndx] EQ .);
>       ndx = ceil(7*uniform(123));
>     end;
>     x[ndx] = .;
>   end;
> run;
> proc print;
> run;
>
> **----replace n_miss values with mean of remaining values----**;
> data want2;
>   set have;
>   array x[7] A--G;
>   do _n_ = 1 to &n_miss;
>     ndx = ceil(7*uniform(123));
>     do while(x[ndx] EQ .);
>       ndx = ceil(7*uniform(123));
>     end;
>     x[ndx] = .;
>   end;
>   x_mean = mean(of A--G);
>   do _n_ = 1 to 7;
>     if missing(x[_n_]) then x[_n_] = x_mean;
>   end;
> run;
> proc print;
> run;
>
> Hope this is helpful,
>
> Dan

OK, this wasn't as helpful as I had planned. WANT1 was created as I expected, with missing values inserted at random. However, WANT2 is not correct: the wrong values are changed and the mean isn't always inserted (although the initial missing values are inserted randomly). If I create WANT3, where I replace with a constant (99, for example) instead of the mean, then the data step works as I expect. I am obviously brain dead at this point. Can someone point out the error of my ways? Thanks.

data have;
  input A B C D E F G;
cards;
1 3 4 3 2 3 4
2 4 3 4 5 6 2
2 5 3 4 5 3 1
1 2 3 3 2 1 3
2 5 4 3 4 1 3
;
run;

**----replace n_miss values with missing----**;
%let n_miss = 2;
data want1;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
run;
proc print;
run;

**----replace n_miss values with mean of remaining values----**;
data want2;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
  x_mean = mean(of A--G);
  do _n_ = 1 to 7;
    if missing(x[_n_]) then x[_n_] = x_mean;
  end;
run;
proc print;
run;

**----replace n_miss values with a constant (99)----**;
data want3;
  set have;
  array x[7] A--G;
  do _n_ = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
  end;
  x_mean = mean(of A--G);
  do _n_ = 1 to 7;
    if missing(x[_n_]) then x[_n_] = 99;
  end;
run;
proc print;
run;
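
One way to check which cells actually get replaced is to record the blanked positions in indicator variables before imputing, since an imputed row mean can coincide with an ordinary data value (a mean of exactly 3, say) and then look unchanged in the printout. The step below is only a diagnostic sketch, not a confirmed fix: the dataset name want_check, the indicators was_miss1-was_miss7, and the loop index i are hypothetical names introduced here, and it assumes HAVE has already been created as above.

```sas
**----diagnostic sketch: flag blanked positions, then impute the row mean----**;
%let n_miss = 2;
data want_check;                             /* hypothetical dataset name */
  set have;
  array x[7] A--G;
  array was_miss[7] was_miss1-was_miss7;     /* hypothetical indicator variables */
  do i = 1 to 7;                             /* start with no position flagged */
    was_miss[i] = 0;
  end;
  do i = 1 to &n_miss;
    ndx = ceil(7*uniform(123));
    do while(x[ndx] EQ .);                   /* redraw if already blanked */
      ndx = ceil(7*uniform(123));
    end;
    x[ndx] = .;
    was_miss[ndx] = 1;                       /* remember which cell was blanked */
  end;
  x_mean = mean(of x[*]);                    /* mean of the non-missing values */
  do i = 1 to 7;
    if was_miss[i] then x[i] = x_mean;
  end;
  drop i ndx;
run;
proc print data=want_check;
run;
```

Comparing was_miss1-was_miss7 against A--G and x_mean in the printed output shows, row by row, whether each flagged cell holds the row mean.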

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

