LISTSERV at the University of Georgia
Date:         Fri, 8 Oct 1999 16:09:02 -0700
Reply-To:     David Cassell <cassell@MERCURY.COR.EPA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David Cassell <cassell@MERCURY.COR.EPA.GOV>
Organization: OAO Corp.
Subject:      Re: Replacing Missing values with the Mean
Content-Type: text/plain; charset=us-ascii

JGerstle@SW.UA.EDU wrote:
>
> I haven't had any luck finding a solution to this in the manuals.
> Hopefully someone can help.
>
> Using SAS 6.12 on Win95, I need to run several PROC CORR
> ALPHA on a dataset with several missing values spread throughout
> it. According to the SAS Log, it's recommended to use NOMISS
> when calculating ALPHA. This is fine, except it leaves us with a
> low n. We would like to replace each missing value with the mean
> of that variable. Is there an option within PROC CORR or another
> PROC to do this automatically or will I have to calculate the means
> beforehand and run a data step to replace the missing values. I
> have around 40 odd variables so I'd rather do the former than the
> latter.

It's straightforward to do. One way is via PROC SQL. Another is to use PROC SUMMARY or PROC MEANS to get the means in a new dataset and combine the info. But I'm not writing any code here, because I want to say this:

This is a bad idea in many cases. Please do not impute data like this unless you can show that this is going to be valid, and will not drive the results [as appears likely given your concern about small n].

Try a few plots for yourself. Take a nice dataset you have already, say n=50, and plot X vs Y. Look at the statistics. Now replace half the X's with the mean of X, and redo the plot and the statistics. You may see an enormous difference. How big the difference is will depend on the data.
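If you have no suitable dataset handy, the same experiment can be sketched with simulated data. The following is only an illustration in Python, not SAS, and everything in it is an assumption made for the demo: the seed, the slope of 0.8, the noise level, and the helper names `pearson`, `sample_sd` and `impute_half_with_mean`. It compares the spread of X and the X-Y correlation before and after half the X's are replaced by the mean of X.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_sd(xs):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))


def impute_half_with_mean(xs, rng):
    """Return a copy of xs with a random half of the values set to mean(xs)."""
    mean_x = sum(xs) / len(xs)
    chosen = set(rng.sample(range(len(xs)), len(xs) // 2))
    return [mean_x if i in chosen else x for i, x in enumerate(xs)]


# Simulated data; all of these settings are arbitrary choices for the demo.
rng = random.Random(1999)
n = 50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]

x_imputed = impute_half_with_mean(x, rng)

print("sd of X, original:       ", round(sample_sd(x), 3))
print("sd of X, half imputed:   ", round(sample_sd(x_imputed), 3))
print("corr(X, Y), original:    ", round(pearson(x, y), 3))
print("corr(X, Y), half imputed:", round(pearson(x_imputed, y), 3))
```

The spread of X always shrinks under this kind of replacement, because every replaced value sits exactly at the mean. What happens to the correlation varies from one dataset to the next, so it is worth rerunning with different seeds and, better, with your own data.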

So don't do this if you can possibly avoid it.

David
--
David Cassell, OAO                     cassell@mail.cor.epa.gov
Senior computing specialist
mathematical statistician
