Date: Wed, 30 May 2007 16:24:09 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Rounding Issues (v.14)
In-Reply-To: <630080.62193.qm@web37005.mail.mud.yahoo.com>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 09:22 AM 5/30/2007, Matthew Reeder wrote:
>I've created a weighted composite from 11 variables (we'll call the
>composite comp1, which ranges from 1-10). I then create a binary
>variable (binvar) that assigns each case in the dataset a 0 or a 1,
>depending on whether or not the case reaches a certain minimum on
>comp1 (such as below).
>
> DO IF (comp1>=4.5) .
> COMPUTE binvar=1 .
> ELSE .
> COMPUTE binvar=0 .
> END IF .
> EXECUTE .
>
>Nothing complicated. Binvar will be my filter variable for subsequent
>analyses.
As you've found, so far, so good. By the way, a good replacement for
the above syntax is
RECODE comp1
(MISSING = SYSMIS) /* DO IF leaves binvar missing when comp1 is missing */
(4.5 THRU HI = 1)
(ELSE = 0) INTO binvar.
>Let's say that there are 5 people in the dataset with a value of 4.5
>on comp1. Binvar is being assigned a 0 for some of these people, and a
>1 for others. In other words, even though they all are equal to 4.5,
>SPSS views them differently.
You've had the answer: that the values *display as* 4.5 does not
mean they are *equal to* 4.5.
At 11:03 AM 5/30/2007, Melissa Ives wrote:
>It is likely due to the 2nd digit post-decimal.
Actually, the minimum difference cannot be guaranteed to appear in a
modest fixed number of decimal places. SPSS numbers are represented
with 53 bits of precision, which is about 16 decimal digits. But even
that doesn't characterize the representation: the representable numbers
are spaced about as closely as 16-digit decimal numbers, but they aren't
the same set of numbers. For example, 4.5 is exactly representable in
binary, but most decimal fractions, such as 0.1, are not.
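To make that concrete, here is a minimal sketch (not tested) of two
values that display identically but compare differently. The two-case
dataset and the offset of 0.000000000001 are invented purely for
illustration; they are not taken from Matthew's data.

```spss
* Two hypothetical cases: one exactly 4.5, one a hair below it.
DATA LIST FREE / id.
BEGIN DATA
1 2
END DATA.
COMPUTE comp1 = 4.5.
IF (id = 2) comp1 = 4.5 - 0.000000000001.
* A logical expression evaluates to 1 (true) or 0 (false).
COMPUTE binvar = (comp1 GE 4.5).
* With one decimal place, both cases should display as 4.5.
FORMATS id (F2.0) comp1 (F8.1) binvar (F1.0).
LIST.
* With 16 decimal places, the difference should become visible.
FORMATS comp1 (F20.16).
LIST.
```

In the first listing both cases should show 4.5, yet binvar is 1 for
case 1 and 0 for case 2. Only the display format changes between the
two listings; the stored values do not.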
>Further, when I run frequencies on comp1, 4.5 appears twice, with
>different counts next to it. Why is it doing this?
For, of course, the same reason: the *numbers* are different, though
their *display forms* are the same, in the format (F<something>.1) that
you are using.
You've had a couple of suggestions (ViAnn Beadle, Jon Peck) for
producing display forms that will show all differences. To put it a
little differently, but close to ViAnn's: if, as I assume, your
"weighted composite" is a weighted average, then guarantee the result
is integral by (a) using only integer weights, and (b) taking the *sum*
rather than *average*, using those weights.
But you probably won't like the result. Depending on your weights, you
may have to multiply them by a large number to convert them to integers
while maintaining their relative magnitudes; and while you will see the
exact values of the weighted sums, those values may be integers with a
lot of digits.
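A sketch of the integer-weight approach (not tested), under loudly
stated assumptions: the eleven weights below are invented, standing in
for original weights that were multiples of 0.05 and so become integers
when multiplied by 20; the variables Datum1 to Datum11 are assumed to
hold integer scores; and wsum is a hypothetical variable name.

```spss
* Hypothetical integer weights 3,2,2,2,2,2,2,2,1,1,1, which sum to 20.
COMPUTE wsum = 3*Datum1 + 2*Datum2 + 2*Datum3 + 2*Datum4
             + 2*Datum5 + 2*Datum6 + 2*Datum7 + 2*Datum8
             + Datum9 + Datum10 + Datum11.
* A weighted average of 4.5 corresponds to a weighted sum of 4.5 * 20 = 90.
COMPUTE binvar = (wsum GE 90).
FORMATS wsum (F8.0) binvar (F1.0).
FREQUENCIES VARIABLES = wsum.
```

With integer scores and integer weights, every product and partial sum
is an integer of modest size, which is represented exactly, so the
comparison against 90 involves no rounding.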
Normally, when you've taken a weighted average like that, it's best to
treat it as a continuous quantity, whose magnitude is important to
the appropriate precision, but whose exact values are not relevant.
It's rarely illuminating to take FREQUENCIES for such a quantity. It
can be useful to use RECODE to classify the values into ranges, and
take FREQUENCIES of the result; you'll have to decide about that.
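For instance, a sketch of such a classification (not tested). The
cutpoints 3 and 7 and the variable name compgrp are invented for
illustration; only 4.5 comes from the original question.

```spss
* RECODE applies the first specification that matches a value, so
* listing the ranges from the top down puts each boundary value
* (7, 4.5, 3) into the higher group.
RECODE comp1
  (MISSING = SYSMIS)
  (7 THRU HI = 4)
  (4.5 THRU 7 = 3)
  (3 THRU 4.5 = 2)
  (LO THRU 3 = 1)
  INTO compgrp.
VALUE LABELS compgrp
  1 'Below 3'
  2 '3 to under 4.5'
  3 '4.5 to under 7'
  4 '7 or more'.
FORMATS compgrp (F1.0).
FREQUENCIES VARIABLES = compgrp.
```

The same caution applies here as to binvar: a value that displays as
4.5 but is slightly below it will land in group 2, not group 3.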
If you're particularly interested in what's happening near the cutpoint
value of 4.5, I'd try something like this (not tested):
a.) Use the code you already have, to calculate 'comp1'.
b.) Assuming you have an ID variable called CaseNum, and your 11 data
variables are Datum1 to Datum11, inspect the data by
TEMPORARY /* If desired */.
NUMERIC Delta (E10.3).
VAR LABEL Delta 'Difference from 4.5'.
COMPUTE Delta = comp1 - 4.5.
SELECT IF ABS(Delta) LE 0.2 /* Or other threshold */.
LIST VARIABLES= CaseNum comp1 Delta Datum1 TO Datum11.