Date: Fri, 7 Nov 2008 10:24:27 -0500
Reply-To: Gerhard Hellriegel <gerhard.hellriegel@T-ONLINE.DE>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Gerhard Hellriegel <gerhard.hellriegel@T-ONLINE.DE>
Subject: Re: Scary date9. informat and how to judge if a string is a date?
I don't know the exact rules, so I cannot say whether a given string
will be read as a date or not. I can only try to interpret the results I
see:
12 2 3 oct08 02OCT2008
It seems that SAS takes the first number it can identify as such, that
is 2, and ignores the 3. It sounds like this: "Search for the first
numeric part. If found, search for the first character and pick out 3
characters, starting with that one, as the month. Then look for the next
number and test whether it is in the range 0 to 99 (two places) or above
some lower limit (I am not really sure of the value, maybe 1533, around
the time the Gregorian calendar starts; below that the date is not valid.
So there is no date 01jan188 and no date 01jan1299 in SAS, but there is a
date 01jan1888). If it has two places, test whether it is below the last
two digits of the yearcutoff (20 for yearcutoff=1920). If yes, add 2000;
if no, add 1900."
16 0 3oct08 .
The rule "take the first number and then go to the first char" leads to
0oct08, which is not a valid date (there is no day 0).
>I was kind of surprised to see 3oct8 is actually considered legitimate
>date, also '3oct18 8'.
3oct8: take the first number = 3, interpret the next 3 chars as month =
oct and the next number = 8 as year. The year is below 20 (the last two
digits of yearcutoff 1920), so add 2000 ==>> 03oct2008
3oct18 8: first number = 3, month (next three chars) = oct, next number =
18. The date is complete; the rest is ignored. 18 is below 20 (the last
two digits of yearcutoff 1920), so add 2000 ==>> 03oct2018 (right? I did
not look at the result while writing that, I promise!)
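
The yearcutoff step can be checked in isolation. This is a minimal sketch,
not code from the original posts; it assumes yearcutoff=1920 as in the
quoted output, and the variable names d18 and d28 are made up for the
illustration. Note that the OPTIONS statement stays in effect for the rest
of the session.

```sas
options yearcutoff=1920;   /* two-digit years map into 1920-2019 */
data _null_;
  d18 = input('3oct18', date9.);  /* 18 < 20, so the year becomes 2018 */
  d28 = input('3oct28', date9.);  /* 28 >= 20, so the year becomes 1928 */
  put d18= date9. d28= date9.;
run;
```

With a different yearcutoff the same two strings can land in other
centuries, so the "add 2000 / add 1900" wording only holds for this
setting.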
Also the last two:
22 3oct1XX8 03OCT2001
23 3oct1XX89 03OCT2001
first num=3, month=oct, next num=1, rest is to be ignored.
Does that make sense?
So it seems the rule is not as simple as "ignore leading and trailing
blanks". Other things in between (blanks, numbers where chars are
needed) are ignored too, and so is anything that trails the final number
(which is always a run of digits without any delimiters in between, so
188 is 188 and not 18!).
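
Since the rule is hard to pin down, one practical way to judge a string
might be to let the informat decide and then test the result. The
following is only a sketch under my own assumptions, not something from
the original posts: the ?? modifier suppresses the log error for invalid
data, and the "strict" flag is a hypothetical definition of a clean
ddmmmyyyy string (the value formats back to exactly the same text). The
variable names is_date and strict are invented for the example.

```sas
data _null_;
  length dt $9;
  /* a full ddmmmyyyy value plus sample strings from the test */
  do dt = '23oct2008', '3oct8', '0 3oct08', '3oct 188';
    /* ?? keeps invalid data from writing an error to the log */
    sasdt = input(dt, ?? date9.);
    /* 1 if the informat produced a date, 0 if it returned missing */
    is_date = not missing(sasdt);
    /* assumed strict test: the value formats back to the same text */
    strict = (put(sasdt, date9.) = upcase(strip(dt)));
    put dt= $char9. sasdt= date9. is_date= strict=;
  end;
run;
```

The round-trip comparison is deliberately harsher than the informat: a
string such as 3oct8 is accepted by date9. according to the quoted output,
but it does not format back to itself, so it would not count as strict.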
Now there remains the matter of the Gregorian calendar:
data _null_;
do i=1520 to 1600; date=mdy(1,1,i); /* assumed: 1 January of year i */
put i= date date9.;
end; run;
shows you that valid dates start around 1582. So 188 is not a valid year.
After finding that out, you can say which dates are legal and which are
not:
date="31dec1581"d; put date date9.;
date="01jan1582"d; put date date9.;
date="31dec99"d; put date date9.;
date=date+1; put date date9.;
date="01jan100"d; put date date9.;
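
As far as I know, an invalid date literal is rejected when the step is
compiled, so the boundary may be easier to probe at run time with the
INPUT function. This is a hedged sketch rather than the original test; the
variable names txt and d are hypothetical, and ?? is used only to keep the
log quiet.

```sas
data _null_;
  length txt $9;
  /* last day before and first day of the year found above */
  do txt = '31dec1581', '01jan1582';
    d = input(txt, ?? date9.);   /* missing if the informat rejects it */
    put txt= d= date9.;
  end;
run;
```

A missing value for d would mark the string as outside the range of dates
that SAS accepts.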
On Wed, 5 Nov 2008 17:06:25 -0500, Ya Huang <ya.huang@AMYLIN.COM> wrote:
>This one has been discussed before (sort of), but I was never really
>satisfied by the explanation. So I'd like to bring this up again, hoping
>to get a better understanding of the situation.
>I was under the impression that when using the date9. informat to
>convert a string to a SAS date, SAS compresses out the spaces first,
>then follows certain rules to determine whether the remaining part is
>a date. These "certain rules" are what puzzle me. From the following
>code and its result, I couldn't get a clear picture of the rules.
>data test;
>length dt $9;
>dt=' 23oct08'; output;
>dt=' 23oct08 '; output;
>dt='23oct08 '; output;
>dt='23 oct08 '; output;
>dt='23 oct08'; output;
>dt='23oct 08'; output;
>dt='23oct 08 '; output;
>dt='3oct 08 '; output;
>dt='3 oct 8'; output;
>dt=' 3 oct 8 '; output;
>dt=' 3oct8 '; output;
>dt='2 3 oct08'; output;
>dt='2 3oct08'; output;
>dt='3 0oct08'; output;
>dt='3 0 oct08'; output;
>dt='0 3oct08'; output;
>dt=' 3oct 2 8'; output;
>dt='3 oct 2 8'; output;
>dt='3 oct 28 '; output;
>dt='3oct18 8 '; output;
>dt='3oct 188 '; output;
>dt='3oct1XX8 '; output;
>data test2;
> set test;
> sasdt=input(dt,date9.);
> format dt $char9. sasdt date9.;
>title "Yearcutoff = %sysfunc(getoption(yearcutoff))";
>proc print; run;
>Yearcutoff = 1920
>Obs  dt         sasdt
>  1  23oct08    23OCT2008
>  2  23oct08    23OCT2008
>  3  23oct08    23OCT2008
>  4  23 oct08   23OCT2008
>  5  23 oct08   23OCT2008
>  6  23oct 08   23OCT2008
>  7  23oct 08   23OCT2008
>  8  3oct 08    03OCT2008
>  9  3 oct 8    03OCT2008
> 10  3 oct 8    03OCT2008
> 11  3oct8      03OCT2008
> 12  2 3 oct08  02OCT2008
> 13  2 3oct08   02OCT2008
> 14  3 0oct08   03OCT2008
> 15  3 0 oct08  03OCT2008
> 16  0 3oct08   .
> 17  3oct 2 8   03OCT2002
> 18  3 oct 2 8  03OCT2002
> 19  3 oct 28   03OCT1928
> 20  3oct18 8   03OCT2018
> 21  3oct 188   .
> 22  3oct1XX8   03OCT2001
> 23  3oct1XX89  03OCT2001
>Lines 1-11 seem to show that SAS 'compresses out' the spaces first,
>whether they are leading, trailing, or even in between.
>But lines 12-21 show that this is not always true. For example, why line
>is not 23OCT2008? Why is line 16 not 03OCT2008 (this one actually
>gives an error message in the log)?
>I was kind of surprised to see that 3oct8 is actually considered a
>legitimate date, also '3oct18 8'.
>All this makes my goal a bit hard to reach, i.e., what string is
>considered a date (ddmmmyyy type)?
>Thanks for the comments.