Date: Mon, 18 Sep 2006 10:52:03 -0400
Reply-To: wing wah <wing.tham03@PHD.WBS.AC.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: wing wah <wing.tham03@PHD.WBS.AC.UK>
Subject: hash
Dear folks,
I am trying to reconstruct the real-time order book in a financial market.
If the quotes below are bid prices posted by traders, I would like to rank
the posted quotes as bestbid, secondbid, thirdbid, fourthbid and so on. The
quantity at each rank is the accumulated quantity of that rank plus all
ranks before it, i.e. accumulated quantity of bestbid = quantity of
bestbid, accumulated quantity of secondbid = quantity of secondbid +
quantity of bestbid, and so on. I hope the output of the code will give a
clearer explanation of the logic behind what I am trying to construct.
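The accumulation rule can be shown on a small static book. This is only a
minimal sketch: the three prices and quantities are made up for
illustration, and the dataset names book and ranked are hypothetical.

```sas
/* Hypothetical three-level book: one row per bid price. */
data book;
   input price qty;
   cards;
0.6118 2
0.6120 1
0.6115 1
;
run;

/* Highest price first, so that rank 1 is the bestbid. */
proc sort data=book;
   by descending price;
run;

/* Sum statements keep a running rank and a running quantity. */
data ranked;
   set book;
   rank + 1;        /* 1 = bestbid, 2 = secondbid, ...          */
   cumqty + qty;    /* quantity at this rank plus all ranks above */
run;

proc print data=ranked noobs;
run;
```

Here the bestbid 0.6120 gets an accumulated quantity of 1, the secondbid
0.6118 gets 3, and the thirdbid 0.6115 gets 4.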
Since I am trying to create a demand curve, any price lower than the
bestbid price will carry an accumulated quantity (think of a demand and
supply curve). I therefore have to think of a way to extract the prices and
arrange them as bestbid, secondbid, and so on. In the output of the simple
code below, the bestbid is the first price whose value is not '.', reading
from right to left. The secondbid is the next price at which the
accumulated quantity changes after the bestbid, again reading from right to
left. At least by doing this, I do not have to worry about how far the
queue will stretch.
The data size is about 10 GB, with about 20 million observations.
Firstly, I am wondering whether this can be done more efficiently using
'hash'. Secondly, if the code below is sensible, how can I extract the
bestbid, secondbid, ... and their quantities from its output? I am using
PROC TRANSPOSE for testing, but I am not sure it will cope when the data
size increases. All suggestions and criticism are welcome.
Thank you in advance!
Wing
data given;
input quote quantity ;
cards;
0.6126 3
0.6126 -3
0.611 1
0.611 -1
0.6115 1
0.6115 -1
0.6103 1
0.6103 -1
0.6075 5
0.6075 -5
0.6061 19
0.609 2
0.6118 2
0.6075 5
0.6061 -19
0.6084 19
0.6118 -2
0.6115 1
0.6121 2
0.6121 -2
0.612 1
0.6118 2
0.6121 1
0.6121 -1
0.612 -1
0.612 5
0.6054 2
0.612 1
0.612 -1
0.61205 1
0.6116 5
0.61105 1
0.61105 -1
0.61205 -1
0.6123 1
0.6123 -1
0.612 2
0.612 2
0.612 -2
0.6116 -5
0.6115 -1
0.612 1
;
/* Find the lowest and highest quote, convert them to integer ticks   */
/* (price*100000), pad by one tick each side, store as macro variables. */
DATA maxmin;
   SET given END=lastobs;
   RETAIN start finish;
   IF _N_ = 1 THEN DO;
      start  = quote;
      finish = quote;
   END;
   start  = MIN(start, quote);
   finish = MAX(finish, quote);
   IF lastobs THEN DO;
      start  = start*100000 - 1;
      finish = finish*100000 + 1;
      diffmaxminprice = finish - start + 1;   /* number of price points */
      CALL SYMPUT('lo', LEFT(PUT(start, 8.)));
      CALL SYMPUT('hi', LEFT(PUT(finish, 8.)));
      CALL SYMPUT('diffmaxminprice', LEFT(PUT(diffmaxminprice, 8.)));
   END;
RUN;
%put lo &lo.;
%put hi &hi;
%put diffmaxminprice &diffmaxminprice;
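/* One retained variable per price tick holds the accumulated quantity. */
data roll3(drop = i newpoint);
   set given (obs=1000);
   array v(&LO:&HI) v&LO-v&HI;
   retain v&LO-v&HI id;
   if _n_=1 then id=0;
   id=id+1;
   /* price as an integer tick; ROUND guards against floating-point error */
   bid=round(quote*100000);
   newpoint = missing(v(bid));   /* is this price not yet in the book? */
   /* add the quantity at this price and at every populated lower price */
   do i = bid to &lo+1 by -1;
      if i=bid or not missing(v(i)) then v(i) = sum(v(i),quantity);
      if v(i)<=0 then v(i) = .;
   end;
   /* find the nearest populated price above bid (i ends at &hi if none) */
   do i = bid+1 to &Hi-1 until(v(i) or i=&hi); end;
   /* nothing is left at this price if its total equals the level above */
   if v(bid)=v(i) then v(bid)=.;
   /* a new price also takes on the accumulated quantity of the level above */
   if newpoint then do;
      do i = bid+1 to &Hi-1 until(v(i) or i=&hi); end;
      v(bid) = sum(v(bid),v(i));
   end;
run;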
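/* Turn the wide v: columns into one row per id and price point. */
proc transpose data=roll3 out=dvector;
   by id;
   var v:;
run;
/* Keep only the price points that carry a quantity. */
data dvector;
   set dvector;
   if col1=. or col1=0 then delete;
run;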
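
The right-to-left scan described earlier could also be written directly
against roll3, without PROC TRANSPOSE. The following is an untested sketch,
not a definitive solution: it assumes roll3 and the macro variables &lo and
&hi exist exactly as created above, and the names levels, rank, price,
cumqty and prev are hypothetical.

```sas
/* Sketch only: assumes roll3, &lo and &hi exist as created above. */
data levels(keep = id rank price cumqty);
   set roll3;
   array v(&lo:&hi) v&lo-v&hi;
   rank = 0;
   prev = .;
   /* scan from the highest price tick down to the lowest */
   do i = &hi to &lo by -1;
      /* a populated tick whose accumulated quantity differs from the last one kept */
      if v(i) > 0 and v(i) ne prev then do;
         rank   = rank + 1;      /* 1 = bestbid, 2 = secondbid, ... */
         price  = i / 100000;    /* back from integer tick to price */
         cumqty = v(i);
         prev   = v(i);
         output;
      end;
   end;
run;
```

This writes one row per populated price level for each id, so the empty
price points never reach the output dataset. Whether that is fast enough on
20 million observations has not been measured here.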