Date: Fri, 7 Nov 2008 10:24:27 -0500
Reply-To: Gerhard Hellriegel
Sender: "SAS(r) Discussion"
From: Gerhard Hellriegel
Subject: Re: Scary date9. informat and how to judge if a string is a date?

I don't know those "certain rules" either, so I can't say in advance whether a given string will be read as a date or not. I can only try to interpret the results:

> 12 2 3 oct08 02OCT2008

It seems SAS takes the first number it can identify as such, which is 2, and ignores the 3. The rule sounds like this: search for the first numeric part; that is the day. If found, search for the first letter and take the three characters starting there as the month. Then look for the next number and check whether it has one or two digits (0 to 99) or is a full year somewhere in the 1500s or later (I'm not really sure where the limit is; around there the Gregorian calendar starts, and below that the date is not valid. So there is no date 01jan188 and no date 01jan1299 in SAS, but there is a date 01jan1888). If it has two digits, apply the YEARCUTOFF= window: with YEARCUTOFF=1920, a two-digit year below 20 gets 2000 added, otherwise 1900 is added.
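The two-digit-year step at the end of that rule can be sketched in Python. This is a minimal sketch, assuming the documented YEARCUTOFF= behavior (a one- or two-digit year is placed in the 100-year window that starts at the YEARCUTOFF= year); `expand_two_digit_year` is a hypothetical name for illustration, not a SAS function:

```python
def expand_two_digit_year(yy: int, yearcutoff: int = 1920) -> int:
    """Place a one- or two-digit year in the window
    [yearcutoff, yearcutoff + 99], as YEARCUTOFF= is assumed to do."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a one- or two-digit year")
    century = yearcutoff - yearcutoff % 100  # 1900 for YEARCUTOFF=1920
    year = century + yy
    if year < yearcutoff:                    # e.g. 1908 < 1920, so use 2008
        year += 100
    return year


for yy in (8, 18, 28):
    print(yy, "->", expand_two_digit_year(yy))  # 8 -> 2008, 18 -> 2018, 28 -> 1928
```

With the default 1920 this reproduces rows 11, 20 and 19 of Ya's output (2008, 2018, 1928); passing a different `yearcutoff` shows how the same string would shift century under another setting.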

> 16 0 3oct08 .

The rule "take the first number, then go to the first letter" gives 0oct08, which is not a valid date (there is no day 0).

> I was kind of surprised to see 3oct8 is actually considered a legitimate
> date, also '3oct18 8'.

Let's see:

3oct8: first number = 3 (day), the next three letters = oct (month), the next number = 8 (year). 8 is below 20, the last two digits of YEARCUTOFF=1920, so 2000 is added: 03OCT2008.

3oct18 8: first number = 3, month = oct, next number = 18. The date is complete and the rest is ignored. 18 is below 20, so 2000 is added: 03OCT2018 (right? I did not look at the result while writing that, I promise!)

Also the last two:

> 22 3oct1XX8 03OCT2001
> 23 3oct1XX89 03OCT2001

First number = 3, month = oct, next number = 1 (so 2001); the rest is ignored.

Does that make sense? So it seems the rule is not as simple as "ignore leading and trailing blanks". Things in between are ignored as well (blanks, and numbers where letters are expected), and so is everything after the year number. The year number itself is always a run of digits without any delimiter in between, so 188 is 188 and not 18!
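To check whether the rule inferred above really explains Ya's table, it can be written down as a small Python model. This is only a sketch of the guessed behavior, not SAS's actual DATE9. informat code: `guess_date9` and `MONTHS` are hypothetical names, the 1582 lower limit is taken from the Gregorian-calendar test further down, and YEARCUTOFF=1920 is assumed as in Ya's run.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def guess_date9(s, yearcutoff=1920):
    """Hypothetical model of how DATE9. appears to read s (inferred, not SAS code).
    Returns a datetime.date, or None where SAS gave a missing value."""
    s = s[:9]                                   # width 9: only 9 columns are read
    day_m = re.search(r"\d+", s)                # 1) first run of digits = day
    if not day_m:
        return None
    rest = s[day_m.end():]
    mon_m = re.search(r"[A-Za-z]", rest)        # 2) first letter starts the month;
    if not mon_m:                               #    digits/blanks before it are skipped
        return None
    start = mon_m.start()
    month = MONTHS.get(rest[start:start + 3].upper())
    if month is None:
        return None
    year_m = re.search(r"\d+", rest[start + 3:])  # 3) next digit run = year
    if not year_m:
        return None
    digits = year_m.group()                     # "188" stays 188, never 18
    year = int(digits)
    if len(digits) <= 2:                        # 4) YEARCUTOFF= window
        year += yearcutoff - yearcutoff % 100
        if year < yearcutoff:
            year += 100
    elif year < 1582:                           # 5) no dates before 1582
        return None
    try:
        return date(year, month, int(day_m.group()))
    except ValueError:                          # e.g. day 0 or 31FEB
        return None


print(guess_date9("2 3 oct08"))   # 2008-10-02
print(guess_date9("0 3oct08"))    # None
```

Run against all 23 strings from Ya's post, this model gives the same dates and the same two missing values (rows 16 and 21). That agreement supports the guessed rule; it does not prove this is how SAS implements it.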

Now there remains the thing with the Gregorian calendar:

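data test;
   do i=1520 to 1600;
      date=mdy(1,1,i);      /* missing when the year cannot be represented */
      put i= date date9.;   /* write the year and the resulting date to the log */
   end;
run;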

shows that valid dates start at 1582; for earlier years MDY returns a missing value. So 188 is not a valid year.

Knowing that, you can tell which dates are legal and which are not:

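data test;
   date="31dec1581"d; put date date9.;   /* year before 1582 */
   date="01jan1582"d; put date date9.;   /* year 1582 */
   date="31dec99"d;   put date date9.;   /* two-digit year: 1999 with YEARCUTOFF=1920 */
   date=date+1;       put date date9.;   /* one day later: 01JAN2000 */
   date="01jan100"d;  put date date9.;   /* three-digit year */
run;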

Gerhard

On Wed, 5 Nov 2008 17:06:25 -0500, Ya Huang <ya.huang@AMYLIN.COM> wrote:

>Hi there,
>
>This one has been discussed before (sort of), but I was never really
>satisfied by the explanation. So I'd like to bring this up again, hoping
>to get a better understanding of the situation.
>
>I was under the impression that when using the date9. informat to
>convert a string to a SAS date, SAS compresses out the spaces first,
>then follows a certain rule to determine whether the rest is
>a date. This "certain rule" is what puzzles me. From the following
>code and the result, I couldn't get a clear picture of the rules.
>
>data test;
>length dt $9;
>dt=' 23oct08'; output;
>dt=' 23oct08 '; output;
>dt='23oct08 '; output;
>dt='23 oct08 '; output;
>dt='23 oct08'; output;
>dt='23oct 08'; output;
>dt='23oct 08 '; output;
>dt='3oct 08 '; output;
>dt='3 oct 8'; output;
>dt=' 3 oct 8 '; output;
>dt=' 3oct8 '; output;
>dt='2 3 oct08'; output;
>dt='2 3oct08'; output;
>dt='3 0oct08'; output;
>dt='3 0 oct08'; output;
>dt='0 3oct08'; output;
>dt=' 3oct 2 8'; output;
>dt='3 oct 2 8'; output;
>dt='3 oct 28 '; output;
>dt='3oct18 8 '; output;
>dt='3oct 188 '; output;
>dt='3oct1XX8 '; output;
>dt='3oct1XX89'; output;
>run;
>
>data test;
> set test;
>sasdt=input(dt,date9.);
>format dt $char9. sasdt date9.;
>run;
>
>proc print;
>title "Yearcutoff = %sysfunc(getoption(yearcutoff))";
>run;
>
>Yearcutoff = 1920
>
>Obs  dt         sasdt
>
>  1  23oct08    23OCT2008
>  2  23oct08    23OCT2008
>  3  23oct08    23OCT2008
>  4  23 oct08   23OCT2008
>  5  23 oct08   23OCT2008
>  6  23oct 08   23OCT2008
>  7  23oct 08   23OCT2008
>  8  3oct 08    03OCT2008
>  9  3 oct 8    03OCT2008
> 10  3 oct 8    03OCT2008
> 11  3oct8      03OCT2008
> 12  2 3 oct08  02OCT2008
> 13  2 3oct08   02OCT2008
> 14  3 0oct08   03OCT2008
> 15  3 0 oct08  03OCT2008
> 16  0 3oct08   .
> 17  3oct 2 8   03OCT2002
> 18  3 oct 2 8  03OCT2002
> 19  3 oct 28   03OCT1928
> 20  3oct18 8   03OCT2018
> 21  3oct 188   .
> 22  3oct1XX8   03OCT2001
> 23  3oct1XX89  03OCT2001
>
>
>Lines 1-11 seem to show that SAS compresses out the spaces first, no matter
>whether they are leading, trailing, or even in between.
>But lines 12-21 show that this is not always true. For example, why is line 12
>not 23OCT2008? Why is line 16 not 03OCT2008 (this one actually
>gives an error message in the log)?
>
>I was kind of surprised to see 3oct8 is actually considered a legitimate
>date, also '3oct18 8'.
>
>All this makes my goal a bit hard to reach, i.e., what string is considered
>a date (ddmmmyyy type)?
>
>Thanks for the comments.
>
>Ya
