LISTSERV at the University of Georgia
Date:         Wed, 4 Feb 2009 15:39:01 -0500
Reply-To:     Paul St Louis <pstloui@DOT.STATE.TX.US>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul St Louis <pstloui@DOT.STATE.TX.US>
Subject:      Re: OT: Chance to Make SAS-L History: Did You Know That...
Comments: To: "Michael A. Raithel" <michaelraithel@WESTAT.COM>

A missing value can/will affect the accuracy of your computations? Yesterday I posted 'Avoiding Division by zero Err Msg Generates "I"'. Mary <mlhoward@avalon.net> responded with a link to a very good article by Robin High....

http://www.uoregon.edu/~robinh/missing_data.txt

A must-read for anyone who thinks they fully grasp the implications of missing data. Although I already understood that the best way to test for missing data is with the MISSING function (whether the variable is numeric, character, or a date), I thought I would list a few excerpts from Robin's paper. One of Robin's suggestions is to use...

IF (MISSING(var) EQ 1) or IF (MISSING(var))

Otherwise, some computations with missing data will produce inaccurate results.

DATA _null_;
  x1 = .; x2 = 3; x3 = 6;
  x_sum1 = x1 + x2 + x3;
  x_sum2 = SUM(x1, x2, x3);
  PUT x_sum1 x_sum2;
RUN;

Log:
. 9
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 363:13

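x_sum1 comes out missing because the + operator propagates the missing value in x1, whereas x_sum2 is 9 because the SUM function ignores missing arguments.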

IF and WHERE statements are also affected. When a comparison is evaluated in an IF or WHERE statement, SAS treats missing values as if they were negative numbers with extremely large magnitudes, that is, as smaller than any nonmissing number.

A missing data value in SAS is actually a special, reserved floating point number. The official 28 missing data codes are defined as:
* A period followed by an underscore: ._
* A single period: .
* A period followed by an alphabetic letter: .a .b .c ... .x .y .z

Comparing a numerical data value with an open-ended IF statement is risky, for example: IF ( x_var LT <any real number>)

This type of IF statement will be "true" whenever x_var contains a missing value. Missing value comparisons are also relevant with the greater than (GT) test, e.g., IF (y_var GT x_var) will be:
* true if y_var is not missing but x_var is missing
* false if y_var is missing and x_var is not
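
To make the open-ended comparison problem concrete, here is a minimal sketch (mine, not from Robin's paper) that contrasts an unguarded LT test with one guarded by the MISSING function. The variable name x_var and the cutoff of 0 are just illustrative assumptions.

```sas
DATA _null_;
  x_var = .;   /* hypothetical variable that happens to be missing */

  /* Unguarded: a missing value sorts below every number, so this test is true */
  IF (x_var LT 0) THEN PUT 'Unguarded: missing x_var passed the LT 0 test';

  /* Guarded: rule out missing values before comparing */
  IF (NOT MISSING(x_var)) AND (x_var LT 0) THEN PUT 'Guarded: x_var is a real negative number';
  ELSE PUT 'Guarded: x_var is missing or not negative';
RUN;
```

The same guard applies to WHERE clauses, where the unguarded form would silently keep rows with missing x_var.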

Even though they are not treated as numerical data in calculations, missing data codes behave as if they had unique, ordered numerical values. Since .z is defined to be the 'largest' missing data value, a more comprehensive IF statement that will work for all missing data values is: IF (x_var LE .z)
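
As a rough illustration of why a test against the single period is not enough, the sketch below runs three tests over a few of the missing data codes plus one real number. The names v1-v5, eq_dot, le_z, and is_miss are assumptions made up for this example.

```sas
DATA _null_;
  /* Four missing data codes and one real value (illustrative choices) */
  v1 = ._; v2 = .; v3 = .a; v4 = .z; v5 = 0;
  ARRAY v{5} v1-v5;

  DO i = 1 TO 5;
    x_var   = v{i};
    eq_dot  = (x_var EQ .);     /* matches only the ordinary missing value */
    le_z    = (x_var LE .z);    /* matches every missing data code */
    is_miss = MISSING(x_var);   /* matches every missing data code */
    PUT x_var= eq_dot= le_z= is_miss=;
  END;
RUN;
```

Here eq_dot should be 1 only for the plain period, while le_z and is_miss agree with each other on every row, which is why either of the latter two is the safer test.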

A very good paper to read with many more examples....

