LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2000, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 8 Dec 2000 20:52:53 -0000
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: Afternoon Performance
Comments: To: Howard_Schreier@ITA.DOC.GOV
Content-Type: text/plain; format=flowed

Howard,

Interestingly, after the format has been compiled, the data step will run of course faster (4.98 sec if run against my test data I have posted earlier today) than the entire Sig's SQL solution (11 sec). However, it takes 4.31 sec to create the format control file and 2.01 seconds to compile the format, which if combined makes the gains seem less impressive. I agree of course that the format method avoids the key range limitation, but instead it introduces one of its own - the cardinality of the key - which, in a vicious loop, is tied back to the nature of SAS date as a limited range integer. If the key were of some other type, say representing telephone numbers, the lookup table fed into the format could be so large that the format would either fail to compile having run out of memory or be compiling forever.

Kind regards, ===================== Paul M. Dorfman Jacksonville, Fl =====================

>From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV> >Reply-To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV> >To: SAS-L@LISTSERV.UGA.EDU >Subject: Re: Afternoon Performance >Date: Fri, 8 Dec 2000 15:15:25 -0500 > >Here's another variation, one which relies somewhat less on the SQL >optimizer and which makes no assumptions about the upper and lower bounds >for DATE. > > proc sql; > create table forfmt as > select distinct 'is_am' as fmtname, date as start, 'Yes' as label > from [whatever] (where = (time<43200)) > quit; > > proc format cntlin=forfmt; run; > > data subset; > set [whatever]; > if put(date,is_am.) ne 'Yes'; > run; > >Requiring that all of a day's events fall after noon is the same as >requiring that none fall before noon. It's easier to work with the latter >because encountering the first exception (morning timestamp) decides the >issue for that date. > >It's not clear whether the data set has 200+ M observations or 200+ M >bytes, >but in either case the data reduction caused by the DISTINCT keyword should >be great. After all, there are only 36+ K days in a century. > >43,200 is the number of seconds in 12 hours. > >On Fri, 8 Dec 2000 09:11:03 -0800, ludwig boltzmann <lboltzmann@MAIL.NU> >wrote: > > >Hello Sas-l: > > > >I'm in need of an expert 'performance' advice. I have a _huge_ (of the >order of 200-300m) SAS data set comprising certain 'events'. The events are >defined by DATE and TIME, both in the forms of SAS date and time. For a >sample, it might look like this: > > > > DATE TIME > > -3652 32809 > > 12998 32795 > > 10223 39481 > > -3652 52551 > > -877 46762 > > 10223 46764 > > 1898 68817 > > 12998 58839 > > 4673 46200 > > 7448 71080 > > -3652 60625 > > 12998 80787 > > 4673 63297 > > 4673 77658 > > 10223 56646 > > -877 59547 > > 7448 82521 > > 12998 46852 > > -877 33790 > > -3652 81979 > > -877 68913 > > > >Apparently there are more than 1 event per day. I need to select only >those >days when _all_ events have occurred after noon, i.e. 12:00:00. In other >words, if the initial data set was sorted by DATE TIME, formatted, and >printed it would look like > > > > DATE TIME > >1950-01-01 9:06:49 > > 14:35:51 > > 16:50:25 > > 22:46:19 > >1957-08-07 9:23:10 > > 12:59:22 > > 16:32:27 > > 19:08:33 > >1965-03-13 19:06:57 > >1972-10-17 12:50:00 > > 17:34:57 > > 21:34:18 > >1980-05-23 19:44:40 > > 22:55:21 > >1987-12-28 10:58:01 > > 12:59:24 > > 15:44:06 > >1995-08-03 9:06:35 > > 13:00:52 > > 16:20:39 > > 22:26:27 > > > >Out of 7 days above only the following groups would qualify: > > > > DATE TIME > >1980-05-23 19:44:40 > > 22:55:21 > >1972-10-17 12:50:00 > > 17:34:57 > > 21:34:18 > >1965-03-13 19:06:57 > > > >If the input was sorted like that a solution would be more or less >obvious >(and I would not be asking an advice). The problem is, it is _not_ sorted >by >DATE TIME (instead, it is ordered by some alternate key not related to this >problem), and because of the size of the data I want to avoid sorting >altogether. Moreover, the resuling subset needs to remain in the original >relative order. So with the sample like up above the actual output I need >would be: > > > >DATE TIME > >1898 68817 > >4673 46200 > >7448 71080 > >4673 63297 > >4673 77658 > >7448 82521 > > > >It doesn't really matter if it is SQL or anything else - the only goal >here >is high performance. Thanks in advance for any ideas. > >- Ludwig > > > > > > > >_____________________________________________________________ > >Get Your Own .NU Web Address Now! ---> http://WWW.NUNAMES.NU

_____________________________________________________________________________________ Get more from the Web. FREE MSN Explorer download : http://explorer.msn.com


Back to: Top of message | Previous page | Main SAS-L page