LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2007, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sun, 7 Jan 2007 23:17:13 -0500
Reply-To:   "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject:   Re: Locating the Highest/lowest Value for an Observation

On Sat, 6 Jan 2007 16:04:44 -0500, Paul Dorfman <sashole@BELLSOUTH.NET> wrote:

>Howard, > >Since you suggested it and urged me on offline, here is one way: > >data try ; >input >ohecode: $8. time: time8. ohebs:$1. lo_price B1pr S1pr ; >cards ; >DGAE709 08:07:48 B 122.50 122.50 123.00 >TANA05T 08:07:50 B 121.50 122.50 123.00 >SGTIA05T 08:07:52 S 122.75 123.50 123.75 >GJCXEUT 08:07:55 B 122.00 121.00 121.50 >DGAE709 08:08:00 B 122.50 122.75 123.00 >SGTIA05T 08:08:10 S 122.75 121.75 122.00 >GUOLS5T 08:08:15 S 123.00 122.75 123.00 >TANA05T 08:09:50 B 121.50 122.75 123.00 >run ; > >data out (drop = _: b1pr s1pr) ; > array tbs [0:1, 0:86400] _temporary_ ; > > dcl hash ohecd_tm (hashexp: 16) ; > ohecd_tm.definekey ('ohecode') ; > ohecd_tm.definedata ('time_out') ; > ohecd_tm.definedone () ; > > do until (z1) ; > set try (rename = (time = time_out)) end = z1 ; > tbs [0, time_out] = b1pr ; > tbs [1, time_out] = s1pr ; > ohecd_tm.replace() ; > end ; > > do until (z2) ; > set try (rename = (time = time_in)) end = z2 ; > if ohecd_tm.find() ne 0 then continue ; > _x = indexc ('S', ohebs) ; > _f = _x * 2 - 1 ; > best_price = lo_price ; > do _t = time_in to time_out ; > _bs = tbs [_x, _t] ; > if _bs = . or _f * _bs => _f * best_price then continue ; > time_event = _t ; > best_price = _bs ; > end ; > output ; > ohecd_tm.remove() ; > end ; > format time: time8. ; > stop ; >run ; > >Many variations on this theme are possible. The above implies that the file >is sorted by time, but provisions can be easily made in the code to render >it unnecessary. The temp array is superquick to search since it is a >key-indexed table, but at the expense of wasting some time jumping over >empty buckets. It should not matter much here, for the events are quite >close. Instead, the array can squeezed into another hash keyed by time, but >then instead of iterating through the array looking for an extremum , we >would have an extra overhead of iterating through the hash. Which is >faster, should be tested, which I would gladly do, but unfortunately, I am >running against a pretty tight zeitnot. Flip-flopping the inequality could >be done less job-secure and elegantly than via the sign-factor _F, and so on. > >Kind regards >------------- >Paul Dorfman >Jax, FL >-------------

I'm afraid that only after looking at Paul's code did I recognize an issue which should have been raised earlier. The example from Dorian suggests that each observation has a distinct TIME value. However, there are only 86,400 seconds in a day, and Dorian says he has 400K observations. So it seems that either there are multiple observations with identical timestamps, or the series is not contained within a single day.

Here is a somewhat abridged version of the back-thread.

Original post:

On Thu, 28 Dec 2006 02:50:54 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:

>Good morning, > > >I'm in need of some urgent advice on a SAS coding problem. I am trying >to locate the highest/lowest price (and associated event time) that >existed during the lifespan of an order. This is determined by >comparing the price of the order to those that occurred during the life >of the order and is done separately for two types of orders. > >Consider the following event dataset: > >Ohecode Time ohebs lo_price B1pr S1pr > >DGAE709 08:07:48 B 122.50 122.50 123.00 >TANA05T 08:07:50 B 121.50 122.50 123.00 >SGTIA05T 08:07:52 S 122.75 123.50 123.75 >GJCXEUT 08:07:55 B 122.00 121.00 121.50 >DGAE709 08:08:00 B 122.50 122.75 123.00 >SGTIA05T 08:08:10 S 122.75 121.75 122.00 >GUOLS5T 08:08:15 S 123.00 122.75 123.00 >TANA05T 08:09:50 B 121.50 122.75 123.00 > >I seek the following solution: > >Ohecode Time Best_price Event_time > >GJCXEUT 08:07:55 122.00 08:07:55 >DGAE709 08:08:00 123.50 08:07:52 >SGTIA05T 08:08:10 121.50 08:07:55 >GUOLS5T 08:08:15 123.00 08:08:15 >TANA05T 08:09:50 123.50 08:07:52 > >Please note that the "best_price" for each order is derived as >follows: > >If ohebs = 'B' then best_price (and associated event_time) for the >particular order is: > > First.time LE max(lo_price, B1pr) LE last.time. > >Else if ohebs = 'S' then best_price (and associated event_time) for >the particular order is: > > First.time LE min(lo_price, S1pr) LE last.time. > >Thanks for the assistance. > >Take care. > >Dorian

Later Dorian clarified, and among other things indicated that he did not really mean "first.time" and "last.time".

On Thu, 28 Dec 2006 05:03:35 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:

>Thanks for your interest and your comment is valid. > >I thought the logical expression will be suffice to explain the final >result I seek but it is confusing and need further clarification. > >Each order in the dataset has an entry and exit time from the sample >and thus, the the used of "first.time" and "last.time" in the logical >expression. What is required is a comparison of its price to the best >available prices that existed during this interval (betwen the entry >and exit times) and then record the highest price value and the >associated time of the event. > >For instance, order DGAE709 is of type 'B' and has an entry and exit >time of 08:07:48 and 08:08:00, respectively. We need to compare its >price ("lo_price") 122.50 with prices "B1pr" during its time spent in >the sample that is, between 08:07:48 and 08:08:00 and choose the >highest price value from this comparison and record the associated >event time ("event_time") the highest price value occurred. In this >case, this occurred at 08:07:52 when b1pr was 123.50. > >For order of teh type 'S', we choose the lowest price value by >comparing the price ("lo_price") of the order to "S1pr". > >I hope this clear up things a bit. > >Once again thanks for your interest shown. > >Take care. > >Dorian

Another clarification from the original poster:

On Thu, 28 Dec 2006 10:07:51 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:

>Thanks for the assistance but I don't think your code will generate the >desired results. > >The solution requires that the price of an order is to be compared with >all prices from its time of entry to the time of exit from the sample. > >Let's take for example; DGAE709. The order entry is 08:07:48 and exit >on 08:08:00. Its price is 122.50 and its type is 'B'. > >What is required then is that its price 122.50 is to be compared with >prices in the column B1pr between 08:07:48 and 08:08:00, returning the >highest price value. > >For completeness: > > 08:07:48 Max( 122.50, 122.50) DGAE709's entry time > 08:07:50 Max( 122.50, 122.50) > 08:07:52 Max( 122.50, 123.50) > 08:07:55 Max( 123.50, 121.50) > 08:08:00 Max( 123.50, 122.75) DGAE709's exit time > >output dataset: > >Ohecode Time Best_price Event_time > >DGAE709 08:08:00 123.50 08:07:52 > > >For order of the type 'S', the solution, using an example, is as >follows: > >Comparing the order's price with prices in the column S1pr: > >08:07:52 Min(122.75, 123.75) SGTIA05T's time of entry >08:07:55 Min(122.75, 121.50) >08:08:00 Min(121.50, 123.00) >08:08:10 Min(121.50, 122.00) SGTIA05T's time of exit > >output > >Ohecode Time Best_price Event_time > >DGAE709 08:08:00 123.50 08:07:52 >SGTIA05T 08:08:10 121.50 08:07:55 > >Thanks once again for your assistance. > >Dorian

My suggestion:

On Thu, 28 Dec 2006 22:24:38 -0500, Howard Schreier <hs AT dc-sug DOT org> <nospam@HOWLES.COM>

wrote:

>Maybe a reflexive join ... > >data try; >input >Ohecode : $10. Time : time8. ohebs:$1. lo_price:8. B1pr:8. S1pr:8.; >format time time8.; >cards; >DGAE709 08:07:48 B 122.50 122.50 123.00 >TANA05T 08:07:50 B 121.50 122.50 123.00 >SGTIA05T 08:07:52 S 122.75 123.50 123.75 >GJCXEUT 08:07:55 B 122.00 121.00 121.50 >DGAE709 08:08:00 B 122.50 122.75 123.00 >SGTIA05T 08:08:10 S 122.75 121.75 122.00 >GUOLS5T 08:08:15 S 123.00 122.75 123.00 >TANA05T 08:09:50 B 121.50 122.75 123.00 >; > > proc sql; > select Ohecode > , ohebs > , lo_price > , entry > , exit > , Event_time > , case when ohebs='B' then b1pr > when ohebs='S' then s1pr > else . > end as Best_price > from > (select time as Event_time, b1pr, s1pr from try) > natural join > (select ohecode > , ohebs > , lo_price > , min(time) as entry format=time8. > , max(time) as exit format=time8. > from try > group by ohecode, ohebs, lo_price > ) > where entry le Event_time le exit > group by ohecode, ohebs, lo_price > having case when ohebs='B' then b1pr=max(b1pr) > when ohebs='S' then s1pr=min(s1pr) > else . > end > ; > >Result is > > Ohecode ohebs lo_price entry exit Event_time Best_price > ----------------------------------------------------------------------- > DGAE709 B 122.5 8:07:48 8:08:00 8:07:52 123.5 > GJCXEUT B 122 8:07:55 8:07:55 8:07:55 121 > GUOLS5T S 123 8:08:15 8:08:15 8:08:15 123 > SGTIA05T S 122.75 8:07:52 8:08:10 8:07:55 121.5 > TANA05T B 121.5 8:07:50 8:09:50 8:07:52 123.5

Other people posted suggestions (SQL-based and otherwise). However, I think only my code was confirmed by Dorian as producing the expected results.

On Fri, 29 Dec 2006 08:43:59 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:

>Howard, > >Your code gives the correct solution to my problem and thus, you have a >fair understanding of the problem at hand. > >However, your solution may takes quite some time to run for a large >sample. I'm looking for a reasonably fast code beacuse of the size of >the dataset I'm working with (400K observations). > >Is there a faster way to getting the output? > >Thanks for your assistance. > >Dorian


Back to: Top of message | Previous page | Main SAS-L page