Date: Fri, 14 Jul 2000 16:31:05 GMT
Reply-To: Dale McLerran
Sender: "SAS(r) Discussion"
From: Dale McLerran
Organization: Fred Hutchinson Cancer Research Center
Subject: Re: Methods of Data Extraction

I believe that we can make a slight refinement to David Cassell's approach. My understanding of David Wright's need is to have a test which is true if 1000*Q is integer valued. Since integer values can be represented exactly in base 2, what we require is a function which returns an exact integer whenever 1000*Q is an integer apart from representation error. This can be accomplished with the condition

mod(fuzz(1000*Q),1)=0

What are we doing here? The multiplication by 1000 is clear. The FUZZ function returns the nearest integer if the argument 1000*Q is within 1E-12 of that integer, and returns the argument unchanged otherwise. The MOD(Z,1) function returns 0 if the argument Z is an integer.
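As a minimal sketch of how this condition could be used, the step below reads Dataset1 from the quoted message as inline data and keeps only the rows that pass the test. The dataset names ONE and THREE follow the quoted code; reading the values with DATALINES is an assumption made here so that the example is self-contained.

```sas
/* Dataset1 from the quoted message, read as inline data. */
data one;
   input R Num Q;
   datalines;
0.5 6 0.01
0.51 11 0.018333333333333333
0.52 14 0.023333333333333334
0.53 14 0.023333333333333334
0.54 14 0.023333333333333334
0.55 17 0.028333333333333332
0.56 18 0.03
0.5700000000000001 20 0.03333333333333333
0.58 20 0.03333333333333333
0.59 21 0.035
;
run;

/* Subsetting IF: keep a row only when 1000*Q is an integer
   once FUZZ has absorbed differences smaller than 1E-12. */
data three;
   set one;
   if mod(fuzz(1000*Q), 1) = 0;
run;

proc print data=three;
run;
```

With these data the rows with Q = 0.01, 0.03 and 0.035 should be the ones that survive, which matches the dataset3 described in the original request.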

Testing whether R has a value in the set (0.5, 0.56, 0.59) is equivalent to testing whether fuzz(100*R) is in the set (50, 56, 59).
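A sketch of the same idea applied to R, assuming Dataset2 from the quoted message is read as inline data into a dataset named TWO and the result is written to a dataset named FOUR (both names are chosen here for illustration):

```sas
/* Dataset2 from the quoted message, read as inline data. */
data two;
   input R X;
   datalines;
0.5 135.211890
0.56 128.650492
0.59 126.456255
0.64 125.772462
0.69 122.994524
0.73 120.428786
;
run;

/* Compare on the integer scale: FUZZ(100*R) is an exact integer
   whenever 100*R is within 1E-12 of one, so the IN test is safe. */
data four;
   set two;
   if fuzz(100*R) in (50, 56, 59);
run;

proc print data=four;
run;
```

The first three rows of Dataset2 should be the ones selected, which is the dataset4 the original poster asked for.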

By the way, while David Cassell's basic idea is correct, the implementation was incorrect. His test condition was

.0001 > (1000*Q - int(1000*Q))

My understanding of his understanding (got that?) is that he was attempting to provide a test which would be true if the quantity Q was within .0001 of a value representable with 4 decimal places in base 10. The above test has a couple of problems. First, instead of taking the difference Z=(1000*Q - int(1000*Q)), we should take ABS(Z). Second, Z should be rescaled back to the original units by dividing by 1000. Third, the inequality is easy to misread with the constant on the left: .0001 > Z is the same as Z < .0001, so the direction is already what we want, and writing the test as ABS(Z)/1000 < .0001 makes it plain that we are testing whether the absolute value of the difference is less than .0001.
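The tolerance test described above (absolute difference, rescaled to the original units, required to be less than .0001) might be written as in the following sketch. The dataset name CHECK, the variable names Z and WITHIN_TOL, and the four sample values of Q (taken from Dataset1) are chosen here for illustration only.

```sas
data check;
   input Q;
   /* Fractional part of 1000*Q. */
   Z = 1000*Q - int(1000*Q);
   /* 1 if the rescaled absolute difference is below .0001, else 0. */
   within_tol = (abs(Z) / 1000 < .0001);
   datalines;
0.01
0.018333333333333333
0.03
0.035
;
run;

proc print data=check;
run;
```

For these values WITHIN_TOL should be 1 for 0.01, 0.03 and 0.035, and 0 for 0.018333..., since the fractional part of 18.333... rescaled by 1000 is about .00033.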

Dale

--------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
Seattle, WA 98109
mailto:dmclerra@fhcrc.org
ph: (206) 667-2926
fax: (206) 667-5977
--------------------------------------

Cassell.David@EPAMAIL.EPA.GOV (David L. Cassell) wrote in <8525691C.00024177.00@EPAHUB2.RTP.EPA.GOV>:

>David, you wrote:
>> I have two datasets as followed. I want to extract the data from
>> Dataset 1 by each number's digit is less than 4 (e. g. Q =0.01 or 0.03
>> or 0.035) to form dataset3=(0.5 0.01, 0.56 0.03, 0.59 0.035). Based on
>> Dataset3, I hope to extract the data from Dataset2 according the value
>> of R=(0.5, 0.56, 0.59) to form a new dataset4=(0.5 135.211890,
>> 0.56 128.650492, 0.59 126.456255). Any suggestions are welcome.
>
>I'm sorry, but even given your example [below] I think you need to
>clarify your example a great deal. Do you mean that you only want the
>values of Q which have less than four *decimal*places* ? If so, then
>you are in a lot of trouble. Because computers store data in base 2
>instead of base 10, there is no way to make sure that you get .035 there
>instead of .0349999999999999999
>which would violate your proposal. Do you mean you only want numbers
>which are within 0.0001 of a number representable using four decimal
>places?
>
>I'm going to take a wild guess and assume that is what you want. Here's
>one way to do it.
>
>data three;
>  merge one(in=inone) two(in=intwo);
>  by r;
>  if inone and intwo and .0001 > (1000*Q - int(1000*Q));
>  run;
>
>++++++++++++++++++++++++++++++
>Dataset1
>  R                    Num   Q
>  0.5                  6     0.01
>  0.51                 11    0.018333333333333333
>  0.52                 14    0.023333333333333334
>  0.53                 14    0.023333333333333334
>  0.54                 14    0.023333333333333334
>  0.55                 17    0.028333333333333332
>  0.56                 18    0.03
>  0.5700000000000001   20    0.03333333333333333
>  0.58                 20    0.03333333333333333
>  0.59                 21    0.035
>
>Dataset2
>  R      X
>  0.5    135.211890
>  0.56   128.650492
>  0.59   126.456255
>  0.64   125.772462
>  0.69   122.994524
>  0.73   120.428786
>
>HTH,
>David
>--
>David Cassell, OAO Corp. Cassell.David@epa.gov
>Senior computing specialist
>mathematical statistician
>
