```
Date:         Tue, 20 Dec 2005 11:02:27 -0500
Reply-To:     Ed Heaton
Sender:       "SAS(r) Discussion"
From:         Ed Heaton
Subject:      Re: Numeric Precision Problem
Comments: To: "Terjeson, Mark (IM&R)"
Content-Type: text/plain; charset="us-ascii"

Mark,

That guarantee of a certain number of digits of precision applies only to
integers. SAS cannot exactly store 0.1, even though it is only one digit.

Ed

Edward Heaton, SAS Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1600 Research Boulevard, RW-4541, Rockville, MD 20850-3195
Voice: (301) 610-4818    Fax: (301) 294-3879
mailto:EdHeaton@Westat.com    http://www.Westat.com

-----Original Message-----
From: Terjeson, Mark (IM&R) [mailto:Mterjeson@russell.com]
Sent: Tuesday, December 20, 2005 10:51 AM
To: Ed Heaton; SAS-L@LISTSERV.UGA.EDU
Subject: RE: Re: Numeric Precision Problem

Hi,

Edward will win his bet that this exists on all operating systems. Indeed,
all operating systems have what we call "precision": a limit on the number
of digits that can be guaranteed to be accurate, while any digits beyond
that precision are random garbage. This exists on ALL platforms: PC,
mainframes, ..., ALL. In the 1970s and '80s, precision was 11 or 13 digits.
Current-day CPUs and FPUs average 15 digits, but some still have only 11
or 13.

What does this "precision" mean for the average person? Many systems have
only 11 to 13 digits of precision; SAS has 15. So, if you ignore the
decimal point, you get only 15 valid digits. Anything after 15 digits is
either random garbage or, more usually, truncated. For STORAGE (the actual
value) of a number, remember to ignore the decimal point; this is the key.
DISPLAYING that number is a different matter of producing a string of
characters. Your typical floating-point number stores 15 good digits plus
knowledge of where the decimal point is supposed to go.
When you have just a single digit of 9 but move it 15 or more places to
the right, such as:

  data _null_;
    x = 9.0E-15;
    format x 30.20;
    put x=;
  run;

  data _null_;
    x = 9.0E-18;   * more than 15 places ;
    format x 30.20;
    put x=;
  run;

the single digit of 9 shows up fine (even more than 15 places out) because
the format routine is merely concatenating leading "0" characters and a
period into the output string. So what you see is not a numeric value but
a character string: the single digit 9 plus the zeros and period that the
routine adds. The counting of 15 digits of precision starts at the 9, so
you have room for 14 more good digits, such as:

  data _null_;
    x = 91234567890.0E-25;
    format x 32.30;
    put x=;
  run;

Once you "add" that single digit of 9.0E-18 to any integer (a digit to the
left of the decimal point), such as:

  data _null_;
    x = 9.0E-18;
    x = x + 1;
    format x 30.20;
    put x=;
  run;

...you *now* have to start counting at the integer. Counting those 15 good
digits means the "1" plus the next 14 digits of "0" (remember, ignore the
decimal point), and you find that your 9 is out in the 19th position, well
past the good digits, and is going to go poof (get truncated).

The moral of the story is that "nothing is wrong". (Optical illusion by
the format routine! Har har.)

People working on multiple platforms see this early in their experience,
since internal floating-point handling uses the operating system's
built-in logarithm tables to ease many of the math functions, especially
multiplication and division. Really big numbers become inaccurate very
quickly. For large numbers it is more accurate to convert to the log
value, perform the math on a much smaller value, and then convert back
with the anti-log. This works relatively well. On the same platform you
will see the first 15 digits remain the same, and the digits beyond 15
will mostly stay pretty close.
Those running tests beyond 15 digits, or troubleshooting precision issues
across operating systems and platforms, will notice that the same logic
and code produce slightly different digits out beyond the 15th position,
due to slight differences between the logarithm tables on the different
platforms. Bear in mind that this discussion is way out past the 15th
digit. The differences are most noticeable between Windows and AIX, and
secondarily between Unix or MVS and AIX. You can still rely on your 15
digits of precision, but the positions beyond 15 will differ.

This becomes obvious in comparisons. Floating-point values created on the
same platform will generally come out the same each time, using the same
log tables, so comparison tests match. But if you create data on one
platform and other data on another platform and then run a comparison test
(an IF statement, a WHERE clause, etc.), there will be no match, or your
PROC COMPARE will report differences, typically minuscule in magnitude.
Since the storing of numbers is effectively truncated after the 15th
digit, numbers that differ only after the 15th digit will more than likely
end up with exactly the same internal storage.

These are not bugs. This is the normal, everyday realm we have all worked
in for the past 40+ years. With formats such as 32.30, you will notice
that roughly 32 is the maximum width the language author is comfortable
letting you display, WHILE ON ONE PLATFORM, with the caveat that precision
is 15 accurate digits and the digits beyond 15 will typically just come
out similar under similar conditions. BUT, *all* programmers MUST
understand that the number of digits of precision (regardless of where the
decimal point is, and ignoring leading zeros) is a known commodity that we
have to factor into our designs.
In other words, if more than 15 digits must be accurate, then we must
specifically code routines that handle double and triple+ sets of 15
digits to obtain more. If you are on a machine with 13 or 11 digits of
precision, the same truncation and random-garbage effects simply take
place after the 13th or 11th accurate digit, respectively. It is not
uncommon for applications that work across platforms to require careful
format selection, the INT and FIX functions, and the unglamorous
conversion to string and back again to ensure multi-log-environment
accuracy. And yes, in many of these cases the term "accuracy" merely means
"alike" or "matching" results, since the results are really 11, 13, or 15
digits of accuracy followed by "consistent garbage" that just happens to
be the same each time.

Again, the moral of the story is that "nothing is wrong".

Hope this is helpful.

Mark Terjeson
Senior Programmer Analyst, IM&R
Russell Investment Group
Russell
Global Leaders in Multi-Manager Investing

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Ed Heaton
Sent: Tuesday, December 20, 2005 5:08 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Numeric Precision Problem

Arthur;

I got the same results as you on a Windows XP Pro system running SAS 9.1.3
SP3. I don't find the results surprising. When I took programming classes
(FORTRAN and Pascal) in my undergraduate studies, I was told to NEVER
compare floating-point numbers for equality. That is, in SAS terms, never
code something like

  if ( x eq 0.1 ) then ... ;

Instead, I was told to code something more like

  if ( abs( x - 0.1 ) lt 1e-8 ) then ... ;

where the comparison value is sufficiently small.

Your error comes from the calculations done in the SUMMARY procedure. When
you do calculations on fractional floating-point numbers, you compound the
error. Now, SAS often does some fuzzing, so I'm not as rigorous about that
as I used to be.
However, SAS uses the IEEE double-precision standard for floating-point
numbers, and all of the usual rounding differences apply. When I do try to
hedge my bets, I now use the ROUND() function, with satisfactory results,
rather than the ABS() comparison above; that was just to tell you how I
was taught.

Now, I don't know what your system looks like, but I am willing to bet
that this type of problem exists on all operating systems. You just can't
accurately represent many decimal numbers in binary.

Ed

Edward Heaton, SAS Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1600 Research Boulevard, RW-4541, Rockville, MD 20850-3195
Voice: (301) 610-4818    Fax: (301) 294-3879
mailto:EdHeaton@Westat.com    http://www.Westat.com

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Arthur Tabachneck
Sent: Monday, December 19, 2005 7:15 PM
To: sas-l@uga.edu
Subject: Numeric Precision Problem

Before telling you about my problem, I should mention why I am posting
this. First, since I've recently upgraded to a higher-performance system,
I ask that anyone interested attempt to replicate the problem on their own
system and write me off-line, indicating the kind of system you ran the
code on and whether you encountered the same problem. I'll summarize the
responses and let the list know the result. Second, if it turns out that
this problem is not limited to my system, I want to alert the rest of us
to control for numeric precision when we normally might not.

Numeric precision problems have appeared on the list a number of times,
but they have always seemed to occur (I thought) in rather unique
situations. Now I have encountered one in code which, I'm sure, is very
similar to code that other users run every day. The entire process,
including sample data, is shown below.
Basically, it looks at fairly simple data (which can only have values of
.8, .9, 1.0, 1.1 or 1.2), aggregates it with a PROC SUMMARY to obtain
weighted means, and then colour-codes the results into three colours (red,
yellow and green) with a rather straightforward set of if-then-else
statements. The problem is that every rategroup gets assigned a colour
value of 'yellow', indicating that none of the values were equal to or
lower than .8, or equal to or greater than 1.2. If one adds a format of
32.30 to the differential variable, you'll see that the PROC SUMMARY
produced a value of something like 1.19999999999999999... rather than 1.2
for the highest values. Worse, the lowest values will appear as
0.80000000000000...

Of course it can be fixed with rounding (and it should only take my staff,
my consultants and me around 6 months to find all of the occurrences in
our code). The question is not how to fix the problem. As Wendi Wright
pointed out earlier this year: "This is called numeric representation
error. SAS has an entire article about it:
http://support.sas.com/techsup/technote/ts230.html. The article talks all
about this error and other errors and also offers several suggestions to
'fix' it, one of which is rounding." It is, rather, to raise a caution
flag to any of us who thought we were special and didn't have to worry
about such errors.
Art

--------

data table;
  input rategroup exposures;
  if rategroup eq 1 then differential=0.8;
  else if rategroup eq 2 then differential=0.9;
  else if rategroup eq 3 then differential=1.0;
  else if rategroup eq 4 then differential=1.1;
  else if rategroup eq 5 then differential=1.2;
  cards;
1 1.625
1 0.29167
1 1.01
1 0.88
1 0.25
1 0.625
1 0.79167
2 20.1
2 40.0
3 30.0
3 45.1
4 40.1
4 50.0
5 1529.14
5 2128.35
5 627.10
5 2708.03
5 3233.06
5 836.41
5 1078.27
5 1248.07
;
run;

proc summary data=table;
  var differential;
  by rategroup;
  weight exposures;
  output out=newtable (drop=_:) mean=;
run;

data newtable;
  format colour $6.;
  set newtable;
  if differential le 0.80 then colour='green';
  else if differential ge 1.20 then colour='red';
  else colour='yellow';
run;

The resulting newtable I got:

colour  rategroup  differential
------  ---------  ------------
yellow      1           .8
yellow      2           .9
yellow      3          1.0
yellow      4          1.1
yellow      5          1.2
```
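Mark's digit-counting demonstration is not specific to SAS. Python also stores numbers as IEEE 754 doubles (roughly 15-16 significant decimal digits), so the same "goes poof" truncation can be sketched there; this is an illustrative sketch, not SAS itself, and the variable names are arbitrary:

```python
# A double carries about 15-16 significant decimal digits (53 mantissa bits).
x = 9.0e-18
print(f"{x:.20f}")       # displays fine: the formatter just pads zeros around the 9

# Add it to an integer and the significant-digit count restarts at the "1";
# the 9 now sits far past the precision limit and is rounded away entirely.
y = 1 + 9.0e-18
print(y == 1.0)          # True: the 9 went "poof"

# Starting the count at the 9 leaves room for ~15 good digits in total.
z = 91234567890.0e-25    # 11 significant digits, all preserved
print(f"{z:.30f}")
```

The same experiment in SAS is the `x = x + 1;` data step in Mark's post; only the display syntax differs.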

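Ed's advice about never testing floating-point values for equality is easy to demonstrate outside SAS as well. Below is a minimal Python sketch of the usual work-arounds (a tolerance test like his `abs(...) lt 1e-8`, rounding like his later ROUND() habit, and the standard-library helper); the tolerance 1e-8 is simply the example value from his post:

```python
import math

x = 0.1 + 0.1 + 0.1           # three inexact binary fractions accumulate error
print(x == 0.3)               # False: x is actually 0.30000000000000004

print(abs(x - 0.3) < 1e-8)    # True: tolerance comparison (Ed's ABS style)
print(round(x, 8) == 0.3)     # True: round before comparing (Ed's ROUND style)
print(math.isclose(x, 0.3))   # True: relative-tolerance comparison
```

Whichever form is used, the point is the same: the comparison must build in slack at least as large as the accumulated representation error.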
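Art's rategroup-5 symptom, a weighted mean of values that are all exactly 1.2 failing a `ge 1.2` test, can likewise be sketched in Python using his exposure weights. Whether the naive comparison happens to fail depends on the platform's rounding, which is exactly his point; the rounded comparison is reliable either way:

```python
# Art's rategroup-5 exposures from the thread; every differential is 1.2.
weights = [1529.14, 2128.35, 627.10, 2708.03, 3233.06, 836.41, 1078.27, 1248.07]
differential = 1.2

# Exposure-weighted mean, the quantity PROC SUMMARY computes.
mean = sum(w * differential for w in weights) / sum(weights)
print(repr(mean))             # may show 1.1999999999999997-style noise

print(mean >= 1.2)            # the fragile comparison from the colour-coding step
print(round(mean, 9) >= 1.2)  # True: rounding first makes the test reliable
```

Mathematically the mean is exactly 1.2; any deviation printed above is pure representation error, which is why rounding (or a tolerance) belongs in the colour-coding step.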