Date: Thu, 20 Oct 2005 13:05:52 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Excel dates, again (was, re: Date question...)
In-Reply-To: <5CFEFDB5226CB54CBB4328B9563A12EE02EAB404@hqemail2.spss.com >
Content-Type: text/plain; charset="us-ascii"; format=flowed
Follow-up to posting
Date: Wed, 19 Oct 2005 04:20:37 +0200
From: Luca Meyer <lucameyer@tiscali.it>
Subject: Date question: I know this is quite simple...
Luca Meyer wrote, "I have dates as number, the difference between date
of
interested and January 1st, 1900." I posted a (correct) answer to that
inquiry. (However, Luca, check that you question is right; see below.)
I added the *obiter dictum* "You're converting Excel dates, right?
January 1st, 1900 is day 1, not day 0". That led Jon Peck of SPSS,
Inc., to write off-list with some important points about Excel dates
INCLUDING TWO "GOTCHAs" These and other points about Excel dates may be
found on http://www.cpearson.com/excel/datetime.htm, which Jon pointed
me to.
(Excel dates are a useful topic, if we -- I -- get going on that SPSS
FAQs list we were discussing in September. In fact, date-time
representations more broadly: SPSS, SAS, Excel, Access. Any others?)
At 09:05 AM 10/19/2005, Peck, Jon wrote:
>A. If you do in fact have these date values in Excel, if you apply a
>date format to the column in Excel, SPSS will automatically interpret
>these correctly when you read in the spreadsheet.
If you have an SPSS file with integer dates, you're stuck with using
SPSS conversion code. However, more commonly you're reading an Excel
spreadsheet directly to SPSS, and THIS IS THE WAY TO GO. As I'll show
below, it sidesteps both forms of the "gotcha."(1)
But if you do have integer dates from Excel,
B. FIRST "GOTCHA":
After the first two months of the representable date range, Excel dates
are represented as if January 1, 1900 is day TWO, and the correct
conversion is
COMPUTE DATE_CVT = DATE.MDY(01,01,1900)
+ TIME.DAYS(XCL_DATE)
- TIME.DAYS(2).
or
COMPUTE DATE_CVT = DATE.MDY(01,01,1900)
+ TIME.DAYS(XCL_DATE-2).
(The Excel 97 Help files, in "Tips on entering dates and times", state,
"days are numbered from the beginning of the century; the date serial
number 1 corresponds to the date January 1, 1900"; that is, they are
WRONG.)
C. SECOND "GOTCHA":
Excel can also represent dates as a day count where 01-JAN-1904 is day
0. (Menu Tools/Options, tab Calculation, section Workbook Options,
select "1904 date system".) Fortunately,
1. This is rarely used
2. SPSS still reads date-formatted columns correctly
(Two representations for the same quantity, especially two SIMILAR
representations, is almost too horrible to contemplate.)
Extended discussion, and test runs
----------------------------------
From Jon Peck and the cited Web page,
>The integer portion of the [Excel date], ddddd, represents the number
>of days since 1900-Jan-0. The number 1 represents 1900-Jan-1.
>[However, the Excel date] number is one greater than the actual number
>of days [from with 1900-Jan-1]. This is because Excel behaves as if
>the date 1900-Feb-29 existed. It did not.
>This is by design. It was truly a bug in Lotus 123. When Excel was
>introduced, 123 had nearly the entire market for spreadsheet
>software. Microsoft decided to continue Lotus' bug, in order to be
>fully compatible.
(In days gone by, system programmers would report problems with OS/360,
etc., to IBM and sometimes get the answer, "Working as designed." For
crazy ones, they coined the phrase "broken as designed." It's
applicable here.)
Test run; SPSS draft output. The first three columns are read from
Excel; the fourth is calculated. The first column is Excel
date-formatted; the second is the same value (via a formula), but
number-formatted. (The second 28-Feb-1900 is Excel's "February 29,
1900.") The conversion formula is the one I wrote for Luca Meyer;
variable "Meyer" is Luca Meyer's integer values and desired dates
(dd/mm/yyyy). Converted dates are correct through February 1900, but
one day too late thereafter.
GET TRANSLATE FILE=XCL_DATE
/TYPE=XLS /FIELDNAMES /RANGE=a11:c30.
COMPUTE DATE_CVT = DATE.MDY(01,01,1900)
+ TIME.DAYS(DATE#)
- TIME.DAYS(1).
FORMATS DATE_CVT (DATE11).
VAR WIDTH DATE_CVT (11).
LIST.
|---------------------------|-----------------------|
|Output Created |20 Oct 05 11:44:21 |
|---------------------------|-----------------------|
DATE DATE# MEYER DATE_CVT
. -1.000000 ----|---10----|---20 30-DEC-1899
. .000000 31-DEC-1899
01-JAN-1900 1.000000 01-JAN-1900
01-JAN-1900 1.000000 01-JAN-1900
28-FEB-1900 59.000000 28-FEB-1900
28-FEB-1900 60.000000 01-MAR-1900
01-MAR-1900 61.000000 02-MAR-1900
30-MAR-1900 90.000000 31-MAR-1900
01-JAN-1904 1,462.000000 02-JAN-1904
02-JAN-1904 1,463.000000 03-JAN-1904
03-JAN-1904 1,464.000000 04-JAN-1904
01-JAN-2000 36,526.000000 02-JAN-2000
11-AUG-2001 37,114.000000 12-AUG-2001
20-OCT-2001 37,184.000000 21-OCT-2001
20-OCT-2001 37,184.000000 37183 20/10/2001 21-OCT-2001
08-NOV-2001 37,203.000000 37202 08/11/2001 09-NOV-2001
16-MAY-2002 37,392.000000 17-MAY-2002
16-MAY-2002 37,392.000000 37391 16/05/2002 17-MAY-2002
19-OCT-2005 38,644.000000 20-OCT-2005
Number of cases read: 19 Number of cases listed: 19
>D. Excel actually has two different systems of date values. The
>default on Windows is as Richard described [with the correction], but
>the Macintosh default, which is also available on Windows if you
>choose it, starts in 1904, so the meaning of the day count would be
>different.
This is the same run, but with "1904 date system" in the spreadsheet.
The first, date-formatted, column is read as presented in Excel; the
converted dates are exactly four years too low (after 29 Feb 1904).
* -------------------------------------------------
* Now, run with spreadsheet set to 1904 date system
* -------------------------------------------------
GET TRANSLATE FILE=XCL_DATE
/TYPE=XLS /FIELDNAMES /RANGE=a11:c30.
COMPUTE DATE_CVT = DATE.MDY(01,01,1900)
+ TIME.DAYS(DATE#)
- TIME.DAYS(1).
FORMATS DATE_CVT (DATE11).
VAR WIDTH DATE_CVT (11).
LIST.
|---------------------------|-----------------------|
|Output Created |20 Oct 05 11:46:42 |
|---------------------------|-----------------------|
DATE DATE# MEYER DATE_CVT
. -1.000000 ----|---10----|---20 30-DEC-1899
01-JAN-1904 .000000 31-DEC-1899
02-JAN-1904 1.000000 01-JAN-1900
02-JAN-1904 1.000000 01-JAN-1900
29-FEB-1904 59.000000 28-FEB-1900
01-MAR-1904 60.000000 01-MAR-1900
02-MAR-1904 61.000000 02-MAR-1900
31-MAR-1904 90.000000 31-MAR-1900
02-JAN-1908 1,462.000000 02-JAN-1904
03-JAN-1908 1,463.000000 03-JAN-1904
04-JAN-1908 1,464.000000 04-JAN-1904
02-JAN-2004 36,526.000000 02-JAN-2000
12-AUG-2005 37,114.000000 12-AUG-2001
21-OCT-2005 37,184.000000 21-OCT-2001
21-OCT-2005 37,184.000000 37183 20/10/2001 21-OCT-2001
09-NOV-2005 37,203.000000 37202 08/11/2001 09-NOV-2001
17-MAY-2006 37,392.000000 17-MAY-2002
17-MAY-2006 37,392.000000 37391 16/05/2002 17-MAY-2002
20-OCT-2009 38,644.000000 20-OCT-2005
Number of cases read: 19 Number of cases listed: 19
......................
(1) Reference for myself: Programs and test data for this posting have
names beginning (or consisting of)
2005-10-19 Peck - Date question