| Date: | Tue, 20 Mar 2007 13:06:11 -0500 |
| Reply-To: | "Oliver, Richard" <roliver@spss.com> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | "Oliver, Richard" <roliver@spss.com> |
| Subject: | Re: [BULK] Missing data, with DATA LIST FREE or LIST |
| In-Reply-To: | A<7.0.1.0.2.20070320130247.038f8ec8@mindspring.com> |
| Content-Type: | text/plain; charset="us-ascii" |
My apologies in advance if this is not relevant. I'm not sure how this thread started, so I don't know how the periods got into the data source in the first place. But DATA LIST is now far more flexible about reading delimited files that contain missing data than it was in older versions (prior to SPSS 10, I think), provided there is a consistent delimiter between values.
A simple example:
data list list (",") /var1 var2 var3.
begin data
,12,13
21,,23
31,32,,
,,43
end data.
In this context, spaces as delimiters can be a bit problematic since multiple spaces will be interpreted as multiple missing values.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Richard Ristow
Sent: Tuesday, March 20, 2007 12:15 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: [BULK] Missing data, with DATA LIST FREE or LIST
Importance: Low
Here's a little data-reading code, from another posting:
DATA LIST LIST
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).
BEGIN DATA
1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0
END DATA.
Notice the system-missing values for ESDiff.
They're read as desired, but by a backwards route: SPSS recognizes "."
not as a code for "missing" but as an invalid numeric field.
Since the field is invalid, SPSS sets the result to system-missing.
And there are lengthy warnings for every one, until MXWARNS is reached:
DATA LIST LIST SKIP=2
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).
BEGIN DATA
caseID date_emplymnt ES ESmean ESDiff
1220 08/17/05 1 2 .
>Warning # 1111
>A numeric field contained no digits. The result has been set to the
>system-missing value.
>Command line: 414 Current case: 1 Current splitfile group: 1
>Field contents: '.'
>Record number: 3 Starting column: 48 Record length: 48
Does anybody have advice on how to read the missing values without all
the warning messages?
One solution, often suggested, is to replace the '.' fields by '-1', or
some other value that can't occur in real data. When the data has been
read, either declare that value user-missing, or recode it to
system-missing.
I don't like that very much. It means an extra data-preparation step
preceding SPSS, to change '.' to '-1' globally.
(Or rather, ' . ' or ' .<CR>' to '-1', so you won't change legitimate
decimal points.)
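For anyone who does take that route, the preparation step might look something like the sketch below. It is in Python rather than SPSS syntax, since the step runs before SPSS sees the data. The function name `replace_missing_dots` and the `-1` placeholder are illustrative assumptions, not anything SPSS provides. It replaces a '.' only when the period is a whole whitespace-delimited field, so decimal points inside values such as 1.67 are left alone.

```python
import re

# A "." counts as missing only when it is a whole field, i.e. it has
# whitespace (or the start/end of the text) on both sides.
_LONE_DOT = re.compile(r"(?<!\S)\.(?!\S)")


def replace_missing_dots(text, placeholder="-1"):
    """Replace stand-alone '.' fields with a placeholder (hypothetical helper)."""
    return _LONE_DOT.sub(placeholder, text)


# The data lines from the example above.
sample = (
    "1220 08/17/05 1 2 .\n"
    "1220 03/09/06 3 2 2\n"
    "1390 11/09/05 1 1.67 .\n"
    "1390 02/08/06 1 1.67 0\n"
)

print(replace_missing_dots(sample), end="")
```

The placeholder still has to be declared user-missing or recoded to system-missing once the data is in SPSS, as described above.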
And I think it makes the file less readable. A '.' looks missing. A
'-1' stands out less, visually; and unless you know the project well,
it's hard to be sure that it isn't a data value.