|
On Sat, 6 Jan 2007 16:04:44 -0500, Paul Dorfman <sashole@BELLSOUTH.NET> wrote:
>Howard,
>
>Since you suggested it and urged me on offline, here is one way:
>
>data try ;
>input
>ohecode: $8. time: time8. ohebs:$1. lo_price B1pr S1pr ;
>cards ;
>DGAE709 08:07:48 B 122.50 122.50 123.00
>TANA05T 08:07:50 B 121.50 122.50 123.00
>SGTIA05T 08:07:52 S 122.75 123.50 123.75
>GJCXEUT 08:07:55 B 122.00 121.00 121.50
>DGAE709 08:08:00 B 122.50 122.75 123.00
>SGTIA05T 08:08:10 S 122.75 121.75 122.00
>GUOLS5T 08:08:15 S 123.00 122.75 123.00
>TANA05T 08:09:50 B 121.50 122.75 123.00
>run ;
>
>data out (drop = _: b1pr s1pr) ;
> array tbs [0:1, 0:86400] _temporary_ ;
>
> dcl hash ohecd_tm (hashexp: 16) ;
> ohecd_tm.definekey ('ohecode') ;
> ohecd_tm.definedata ('time_out') ;
> ohecd_tm.definedone () ;
>
> do until (z1) ;
> set try (rename = (time = time_out)) end = z1 ;
> tbs [0, time_out] = b1pr ;
> tbs [1, time_out] = s1pr ;
> ohecd_tm.replace() ;
> end ;
>
> do until (z2) ;
> set try (rename = (time = time_in)) end = z2 ;
> if ohecd_tm.find() ne 0 then continue ;
> _x = indexc ('S', ohebs) ;
> _f = _x * 2 - 1 ;
> best_price = lo_price ;
> do _t = time_in to time_out ;
> _bs = tbs [_x, _t] ;
> if _bs = . or _f * _bs => _f * best_price then continue ;
> time_event = _t ;
> best_price = _bs ;
> end ;
> output ;
> ohecd_tm.remove() ;
> end ;
> format time: time8. ;
> stop ;
>run ;
>
>Many variations on this theme are possible. The above implies that the file
>is sorted by time, but provisions can be easily made in the code to render
>it unnecessary. The temp array is superquick to search since it is a
>key-indexed table, but at the expense of wasting some time jumping over
>empty buckets. It should not matter much here, for the events are quite
>close. Instead, the array can squeezed into another hash keyed by time, but
>then instead of iterating through the array looking for an extremum , we
>would have an extra overhead of iterating through the hash. Which is
>faster, should be tested, which I would gladly do, but unfortunately, I am
>running against a pretty tight zeitnot. Flip-flopping the inequality could
>be done less job-secure and elegantly than via the sign-factor _F, and so on.
>
>Kind regards
>-------------
>Paul Dorfman
>Jax, FL
>-------------
I'm afraid that only after looking at Paul's code did I recognize an issue
which should have been raised earlier. The example from Dorian suggests that
each observation has a distinct TIME value. However, there are only 86,400
seconds in a day, and Dorian says he has 400K observations. So it seems that
either there are multiple observations with identical timestamps, or the
series is not contained within a single day.
Here is a somewhat abridged version of the back-thread.
Original post:
On Thu, 28 Dec 2006 02:50:54 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:
>Good morning,
>
>
>I'm in need of some urgent advice on a SAS coding problem. I am trying
>to locate the highest/lowest price (and associated event time) that
>existed during the lifespan of an order. This is determined by
>comparing the price of the order to those that occurred during the life
>of the order and is done separately for two types of orders.
>
>Consider the following event dataset:
>
>Ohecode Time ohebs lo_price B1pr S1pr
>
>DGAE709 08:07:48 B 122.50 122.50 123.00
>TANA05T 08:07:50 B 121.50 122.50 123.00
>SGTIA05T 08:07:52 S 122.75 123.50 123.75
>GJCXEUT 08:07:55 B 122.00 121.00 121.50
>DGAE709 08:08:00 B 122.50 122.75 123.00
>SGTIA05T 08:08:10 S 122.75 121.75 122.00
>GUOLS5T 08:08:15 S 123.00 122.75 123.00
>TANA05T 08:09:50 B 121.50 122.75 123.00
>
>I seek the following solution:
>
>Ohecode Time Best_price Event_time
>
>GJCXEUT 08:07:55 122.00 08:07:55
>DGAE709 08:08:00 123.50 08:07:52
>SGTIA05T 08:08:10 121.50 08:07:55
>GUOLS5T 08:08:15 123.00 08:08:15
>TANA05T 08:09:50 123.50 08:07:52
>
>Please note that the "best_price" for each order is derived as
>follows:
>
>If ohebs = 'B' then best_price (and associated event_time) for the
>particular order is:
>
> First.time LE max(lo_price, B1pr) LE last.time.
>
>Else if ohebs = 'S' then best_price (and associated event_time) for
>the particular order is:
>
> First.time LE min(lo_price, S1pr) LE last.time.
>
>Thanks for the assistance.
>
>Take care.
>
>Dorian
Later Dorian clarified, and among other things indicated that he did not
really mean "first.time" and "last.time".
On Thu, 28 Dec 2006 05:03:35 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:
>Thanks for your interest and your comment is valid.
>
>I thought the logical expression will be suffice to explain the final
>result I seek but it is confusing and need further clarification.
>
>Each order in the dataset has an entry and exit time from the sample
>and thus, the the used of "first.time" and "last.time" in the logical
>expression. What is required is a comparison of its price to the best
>available prices that existed during this interval (betwen the entry
>and exit times) and then record the highest price value and the
>associated time of the event.
>
>For instance, order DGAE709 is of type 'B' and has an entry and exit
>time of 08:07:48 and 08:08:00, respectively. We need to compare its
>price ("lo_price") 122.50 with prices "B1pr" during its time spent in
>the sample that is, between 08:07:48 and 08:08:00 and choose the
>highest price value from this comparison and record the associated
>event time ("event_time") the highest price value occurred. In this
>case, this occurred at 08:07:52 when b1pr was 123.50.
>
>For order of teh type 'S', we choose the lowest price value by
>comparing the price ("lo_price") of the order to "S1pr".
>
>I hope this clear up things a bit.
>
>Once again thanks for your interest shown.
>
>Take care.
>
>Dorian
Another clarification from the original poster:
On Thu, 28 Dec 2006 10:07:51 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:
>Thanks for the assistance but I don't think your code will generate the
>desired results.
>
>The solution requires that the price of an order is to be compared with
>all prices from its time of entry to the time of exit from the sample.
>
>Let's take for example; DGAE709. The order entry is 08:07:48 and exit
>on 08:08:00. Its price is 122.50 and its type is 'B'.
>
>What is required then is that its price 122.50 is to be compared with
>prices in the column B1pr between 08:07:48 and 08:08:00, returning the
>highest price value.
>
>For completeness:
>
> 08:07:48 Max( 122.50, 122.50) DGAE709's entry time
> 08:07:50 Max( 122.50, 122.50)
> 08:07:52 Max( 122.50, 123.50)
> 08:07:55 Max( 123.50, 121.50)
> 08:08:00 Max( 123.50, 122.75) DGAE709's exit time
>
>output dataset:
>
>Ohecode Time Best_price Event_time
>
>DGAE709 08:08:00 123.50 08:07:52
>
>
>For order of the type 'S', the solution, using an example, is as
>follows:
>
>Comparing the order's price with prices in the column S1pr:
>
>08:07:52 Min(122.75, 123.75) SGTIA05T's time of entry
>08:07:55 Min(122.75, 121.50)
>08:08:00 Min(121.50, 123.00)
>08:08:10 Min(121.50, 122.00) SGTIA05T's time of exit
>
>output
>
>Ohecode Time Best_price Event_time
>
>DGAE709 08:08:00 123.50 08:07:52
>SGTIA05T 08:08:10 121.50 08:07:55
>
>Thanks once again for your assistance.
>
>Dorian
My suggestion:
On Thu, 28 Dec 2006 22:24:38 -0500, Howard Schreier <hs AT dc-sug DOT org>
<nospam@HOWLES.COM>
wrote:
>Maybe a reflexive join ...
>
>data try;
>input
>Ohecode : $10. Time : time8. ohebs:$1. lo_price:8. B1pr:8. S1pr:8.;
>format time time8.;
>cards;
>DGAE709 08:07:48 B 122.50 122.50 123.00
>TANA05T 08:07:50 B 121.50 122.50 123.00
>SGTIA05T 08:07:52 S 122.75 123.50 123.75
>GJCXEUT 08:07:55 B 122.00 121.00 121.50
>DGAE709 08:08:00 B 122.50 122.75 123.00
>SGTIA05T 08:08:10 S 122.75 121.75 122.00
>GUOLS5T 08:08:15 S 123.00 122.75 123.00
>TANA05T 08:09:50 B 121.50 122.75 123.00
>;
>
> proc sql;
> select Ohecode
> , ohebs
> , lo_price
> , entry
> , exit
> , Event_time
> , case when ohebs='B' then b1pr
> when ohebs='S' then s1pr
> else .
> end as Best_price
> from
> (select time as Event_time, b1pr, s1pr from try)
> natural join
> (select ohecode
> , ohebs
> , lo_price
> , min(time) as entry format=time8.
> , max(time) as exit format=time8.
> from try
> group by ohecode, ohebs, lo_price
> )
> where entry le Event_time le exit
> group by ohecode, ohebs, lo_price
> having case when ohebs='B' then b1pr=max(b1pr)
> when ohebs='S' then s1pr=min(s1pr)
> else .
> end
> ;
>
>Result is
>
> Ohecode ohebs lo_price entry exit Event_time Best_price
> -----------------------------------------------------------------------
> DGAE709 B 122.5 8:07:48 8:08:00 8:07:52 123.5
> GJCXEUT B 122 8:07:55 8:07:55 8:07:55 121
> GUOLS5T S 123 8:08:15 8:08:15 8:08:15 123
> SGTIA05T S 122.75 8:07:52 8:08:10 8:07:55 121.5
> TANA05T B 121.5 8:07:50 8:09:50 8:07:52 123.5
Other people posted suggestions (SQL-based and otherwise). However, I think
only my code was confirmed by Dorian as producing the expected results.
On Fri, 29 Dec 2006 08:43:59 -0800, Dorian <d.noel@ICMACENTRE.RDG.AC.UK> wrote:
>Howard,
>
>Your code gives the correct solution to my problem and thus, you have a
>fair understanding of the problem at hand.
>
>However, your solution may takes quite some time to run for a large
>sample. I'm looking for a reasonably fast code beacuse of the size of
>the dataset I'm working with (400K observations).
>
>Is there a faster way to getting the output?
>
>Thanks for your assistance.
>
>Dorian
|