LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2005)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 20 Oct 2005 13:05:52 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Excel dates, again (was, re: Date question...)
Comments: To: "Peck, Jon" <peck@spss.com>
Comments: cc: Luca Meyer <lucameyer@TISCALI.IT>,
          Patricia Huff <PatriciaHuff8@cox.net>
In-Reply-To:  <5CFEFDB5226CB54CBB4328B9563A12EE02EAB404@hqemail2.spss.com >
Content-Type: text/plain; charset="us-ascii"; format=flowed

Follow-up to posting Date: Wed, 19 Oct 2005 04:20:37 +0200 From: Luca Meyer <lucameyer@tiscali.it> Subject: Date question: I know this is quite simple...

Luca Meyer wrote, "I have dates as number, the difference between date of interested and January 1st, 1900." I posted a (correct) answer to that inquiry. (However, Luca, check that you question is right; see below.)

I added the *obiter dictum* "You're converting Excel dates, right? January 1st, 1900 is day 1, not day 0". That led Jon Peck of SPSS, Inc., to write off-list with some important points about Excel dates INCLUDING TWO "GOTCHAs" These and other points about Excel dates may be found on http://www.cpearson.com/excel/datetime.htm, which Jon pointed me to.

(Excel dates are a useful topic, if we -- I -- get going on that SPSS FAQs list we were discussing in September. In fact, date-time representations more broadly: SPSS, SAS, Excel, Access. Any others?)

At 09:05 AM 10/19/2005, Peck, Jon wrote: >A. If you do in fact have these date values in Excel, if you apply a >date format to the column in Excel, SPSS will automatically interpret >these correctly when you read in the spreadsheet.

If you have an SPSS file with integer dates, you're stuck with using SPSS conversion code. However, more commonly you're reading an Excel spreadsheet directly to SPSS, and THIS IS THE WAY TO GO. As I'll show below, it sidesteps both forms of the "gotcha."(1)

But if you do have integer dates from Excel, B. FIRST "GOTCHA":

After the first two months of the representable date range, Excel dates are represented as if January 1, 1900 is day TWO, and the correct conversion is

COMPUTE DATE_CVT = DATE.MDY(01,01,1900) + TIME.DAYS(XCL_DATE) - TIME.DAYS(2). or COMPUTE DATE_CVT = DATE.MDY(01,01,1900) + TIME.DAYS(XCL_DATE-2).

(The Excel 97 Help files, in "Tips on entering dates and times", state, "days are numbered from the beginning of the century; the date serial number 1 corresponds to the date January 1, 1900"; that is, they are WRONG.)

C. SECOND "GOTCHA":

Excel can also represent dates as a day count where 01-JAN-1904 is day 0. (Menu Tools/Options, tab Calculation, section Workbook Options, select "1904 date system".) Fortunately,

1. This is rarely used 2. SPSS still reads date-formatted columns correctly

(Two representations for the same quantity, especially two SIMILAR representations, is almost too horrible to contemplate.)

Extended discussion, and test runs ---------------------------------- From Jon Peck and the cited Web page, >The integer portion of the [Excel date], ddddd, represents the number >of days since 1900-Jan-0. The number 1 represents 1900-Jan-1. >[However, the Excel date] number is one greater than the actual number >of days [from with 1900-Jan-1]. This is because Excel behaves as if >the date 1900-Feb-29 existed. It did not.

>This is by design. It was truly a bug in Lotus 123. When Excel was >introduced, 123 had nearly the entire market for spreadsheet >software. Microsoft decided to continue Lotus' bug, in order to be >fully compatible.

(In days gone by, system programmers would report problems with OS/360, etc., to IBM and sometimes get the answer, "Working as designed." For crazy ones, they coined the phrase "broken as designed." It's applicable here.)

Test run; SPSS draft output. The first three columns are read from Excel; the fourth is calculated. The first column is Excel date-formatted; the second is the same value (via a formula), but number-formatted. (The second 28-Feb-1900 is Excel's "February 29, 1900.") The conversion formula is the one I wrote for Luca Meyer; variable "Meyer" is Luca Meyer's integer values and desired dates (dd/mm/yyyy). Converted dates are correct through February 1900, but one day too late thereafter.

GET TRANSLATE FILE=XCL_DATE /TYPE=XLS /FIELDNAMES /RANGE=a11:c30.

COMPUTE DATE_CVT = DATE.MDY(01,01,1900) + TIME.DAYS(DATE#) - TIME.DAYS(1). FORMATS DATE_CVT (DATE11). VAR WIDTH DATE_CVT (11). LIST. |---------------------------|-----------------------| |Output Created |20 Oct 05 11:44:21 | |---------------------------|-----------------------| DATE DATE# MEYER DATE_CVT

. -1.000000 ----|---10----|---20 30-DEC-1899 . .000000 31-DEC-1899 01-JAN-1900 1.000000 01-JAN-1900 01-JAN-1900 1.000000 01-JAN-1900 28-FEB-1900 59.000000 28-FEB-1900 28-FEB-1900 60.000000 01-MAR-1900 01-MAR-1900 61.000000 02-MAR-1900 30-MAR-1900 90.000000 31-MAR-1900 01-JAN-1904 1,462.000000 02-JAN-1904 02-JAN-1904 1,463.000000 03-JAN-1904 03-JAN-1904 1,464.000000 04-JAN-1904 01-JAN-2000 36,526.000000 02-JAN-2000 11-AUG-2001 37,114.000000 12-AUG-2001 20-OCT-2001 37,184.000000 21-OCT-2001 20-OCT-2001 37,184.000000 37183 20/10/2001 21-OCT-2001 08-NOV-2001 37,203.000000 37202 08/11/2001 09-NOV-2001 16-MAY-2002 37,392.000000 17-MAY-2002 16-MAY-2002 37,392.000000 37391 16/05/2002 17-MAY-2002 19-OCT-2005 38,644.000000 20-OCT-2005

Number of cases read: 19 Number of cases listed: 19

>D. Excel actually has two different systems of date values. The >default on Windows is as Richard described [with the correction], but >the Macintosh default, which is also available on Windows if you >choose it, starts in 1904, so the meaning of the day count would be >different.

This is the same run, but with "1904 date system" in the spreadsheet. The first, date-formatted, column is read as presented in Excel; the converted dates are exactly four years too low (after 29 Feb 1904).

* ------------------------------------------------- * Now, run with spreadsheet set to 1904 date system * -------------------------------------------------

GET TRANSLATE FILE=XCL_DATE /TYPE=XLS /FIELDNAMES /RANGE=a11:c30.

COMPUTE DATE_CVT = DATE.MDY(01,01,1900) + TIME.DAYS(DATE#) - TIME.DAYS(1). FORMATS DATE_CVT (DATE11). VAR WIDTH DATE_CVT (11). LIST. |---------------------------|-----------------------| |Output Created |20 Oct 05 11:46:42 | |---------------------------|-----------------------| DATE DATE# MEYER DATE_CVT

. -1.000000 ----|---10----|---20 30-DEC-1899 01-JAN-1904 .000000 31-DEC-1899 02-JAN-1904 1.000000 01-JAN-1900 02-JAN-1904 1.000000 01-JAN-1900 29-FEB-1904 59.000000 28-FEB-1900 01-MAR-1904 60.000000 01-MAR-1900 02-MAR-1904 61.000000 02-MAR-1900 31-MAR-1904 90.000000 31-MAR-1900 02-JAN-1908 1,462.000000 02-JAN-1904 03-JAN-1908 1,463.000000 03-JAN-1904 04-JAN-1908 1,464.000000 04-JAN-1904 02-JAN-2004 36,526.000000 02-JAN-2000 12-AUG-2005 37,114.000000 12-AUG-2001 21-OCT-2005 37,184.000000 21-OCT-2001 21-OCT-2005 37,184.000000 37183 20/10/2001 21-OCT-2001 09-NOV-2005 37,203.000000 37202 08/11/2001 09-NOV-2001 17-MAY-2006 37,392.000000 17-MAY-2002 17-MAY-2006 37,392.000000 37391 16/05/2002 17-MAY-2002 20-OCT-2009 38,644.000000 20-OCT-2005

Number of cases read: 19 Number of cases listed: 19

...................... (1) Reference for myself: Programs and test data for this posting have names beginning (or consisting of) 2005-10-19 Peck - Date question


Back to: Top of message | Previous page | Main SPSSX-L page