|
I think it can be simpler.
The need for the KEEP= can be eliminated by specifying only the wanted
columns in the SELECT list.
Then, since DATE is the grouping level and CROSS is a summary stat, SQL
will automatically produce just one row per group. So the HAVING is not
needed.
I assume just one row per stock per day. In that case there is no need for
the intervening DATA step which generates sequence numbers.
That leaves
create table cross_product as
select a.date, sum(a.return*b.return) as cross
from xx a, xx b
where a.date=b.date and a.stock < b.stock
group by a.date
;
As to performance, Toby has a good point. I generated a 1Kx1K test table:
data xx;
do stock = 1 to 1e3;
do date = 1 to 1e3;
return = ranuni(1);
output;
end;
end;
run;
The SQL solution took 11+ minutes on my system.
As an alternative, I sorted by DATE (that took 3 seconds), then ran this
DATA step:
data cross;
array a_return(9999) _temporary_;
cross = 0;
do stocknum = 1 by 1 until (last.date);
set xx;
by date;
a_return(stocknum) = return;
if stocknum>1 then do j = 1 to (stocknum-1);
cross + return * a_return(j);
end;
end;
keep date cross;
run;
It took a bit over 1 minute and produced the same numbers as SQL. So the
reduction was about 90%.
On Tue, 21 Jun 2005 17:39:12 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:
>Ya,
>
>Three things,
>
>1.) The cartesian product created by SQL is going to be a bear with a lot
of
>data. But it was the same Idea I had, don't know if there is a less memory
>intensive way to do this.
>
>2.) Need to only keep Date and Cross vars.
>
>3.) Lastly the solution only needs on obs per Date.
>
>proc sql;
>create table cross_product (keep = Date Cross) as
> select a.*,b.return, sum(a.return*b.return) as cross
> from xx a, xx b
> where a.date=b.date and a.n < b.n
> group a.date
> having monotonic() = min(monotonic())
> order by a.date,a.n
>;
>quit ;
>
>
>Should do it.
>
>
>
>
>Toby Dunn
>
>
>
>
>From: Ya Huang <ya.huang@AMYLIN.COM>
>Reply-To: Ya Huang <ya.huang@AMYLIN.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Hwo to calculate the sum of cross-product of all observations
> in each group?
>Date: Tue, 21 Jun 2005 13:31:51 -0400
>Received: from malibu.cc.uga.edu ([128.192.1.103]) by mc5-f31.hotmail.com
>with Microsoft SMTPSVC(6.0.3790.211); Tue, 21 Jun 2005 10:31:59 -0700
>Received: from listserv.cc.uga.edu (128.192.1.75) by malibu.cc.uga.edu
>(LSMTP for Windows NT v1.1b) with SMTP id <0.00FD289E@malibu.cc.uga.edu>;
>Tue, 21 Jun 2005 13:31:54 -0400
>Received: from LISTSERV.UGA.EDU by LISTSERV.UGA.EDU (LISTSERV-TCP/IP
release
> 1.8d) with spool id 3725305 for SAS-L@LISTSERV.UGA.EDU; Tue, 21
>Jun 2005 13:31:54 -0400
>Received: from malibu.cc.uga.edu (malibu.cc.uga.edu [128.192.1.103]) by
> listserv.cc.uga.edu (8.12.11/8.12.11) with ESMTP id j5LHVp1l020471
> for <SAS-L@LISTSERV.UGA.EDU>; Tue, 21 Jun 2005 13:31:53 -0400
>Received: from listserv.cc.uga.edu (128.192.1.75) by malibu.cc.uga.edu
>(LSMTP for Windows NT v1.1b) with SMTP id
><0.00FD2895@malibu.cc.uga.edu>; Tue, 21 Jun 2005 13:31:50 -0400
>X-Message-Info: g+YnDTwJTP7NTx9/VqYPG3Y+uq+rEUZyREbMII8peYY=
>Comments: To: shenyiyu@GMAIL.COM
>Return-Path: owner-sas-l@LISTSERV.UGA.EDU
>X-OriginalArrivalTime: 21 Jun 2005 17:31:59.0717 (UTC)
>FILETIME=[210E3950:01C57687]
>
>This one seems what you want:
>
>data xx;
>input Date Stock Return;
>cards;
>199001 10001 1.5
>199001 10002 2.4
>199001 10008 0.8
>199001 10006 1.4
>199001 10005 0.1
>199002 10005 1.5
>199002 10001 0.6
>199002 10014 1.1
>199002 10006 1.2
>199002 10034 3.4
>199002 10004 1.9
>;
>
>data xx;
> set xx;
>by date;
>if first.date then n=0;
>n+1;
>run;
>
>options ps=999;
>proc sql;
>select a.*,b.return, sum(a.return*b.return) as cross
>from xx a, xx b
>where a.date=b.date and a.n < b.n
>group a.date
>order by a.date,a.n
>;
>
> Date Stock Return n Return cross
>----------------------------------------------------------
> 199001 10001 1.5 1 0.8 13.91
> 199001 10001 1.5 1 2.4 13.91
> 199001 10001 1.5 1 1.4 13.91
> 199001 10001 1.5 1 0.1 13.91
> 199001 10002 2.4 2 0.8 13.91
> 199001 10002 2.4 2 0.1 13.91
> 199001 10002 2.4 2 1.4 13.91
> 199001 10008 0.8 3 0.1 13.91
> 199001 10008 0.8 3 1.4 13.91
> 199001 10006 1.4 4 0.1 13.91
> 199002 10005 1.5 1 1.1 36.83
> 199002 10005 1.5 1 1.2 36.83
> 199002 10005 1.5 1 1.9 36.83
> 199002 10005 1.5 1 3.4 36.83
> 199002 10005 1.5 1 0.6 36.83
> 199002 10001 0.6 2 1.1 36.83
> 199002 10001 0.6 2 1.2 36.83
> 199002 10001 0.6 2 1.9 36.83
> 199002 10001 0.6 2 3.4 36.83
> 199002 10014 1.1 3 1.2 36.83
> 199002 10014 1.1 3 3.4 36.83
> 199002 10014 1.1 3 1.9 36.83
> 199002 10006 1.2 4 3.4 36.83
> 199002 10006 1.2 4 1.9 36.83
> 199002 10034 3.4 5 1.9 36.83
>
>Kind regards,
>
>Ya Huang
>
>On Tue, 21 Jun 2005 10:02:24 -0700, Yiyu <shenyiyu@GMAIL.COM> wrote:
>
> >The data is as following: for each date there are some stocks, and
> >their returns, My questions is : for each date, I cross-product all
> >stock returns with each other and sum them together. How to do this
> >without resorting to macros?
> >
> >For example
> >
> >Date Stock Return
> >199001 10001 1.5
> >199001 10002 2.4
> >199001 10008 0.8
> >199001 10006 1.4
> >199001 10005 0.1
> >
> >
> >199002 10005 1.5
> >199002 10001 0.6
> >199002 10014 1.1
> >199002 10006 1.2
> >199002 10034 3.4
> >199002 10004 1.9
> >
> >199003 ...... ....
> >
> >
> >and I want to get something like:
> >
> >Date Sum_Cross_Prod
> >199001 Sum1
> >199002 Sum2
> >
> >where Sum1 =
>
>1.5*2.4+1.5*2.4+1.5*0.8+1.5*1.4+1.5*0.1+2.4*0.8+2.4*1.4+2.4*0.1+0.8*1.4+0.8
>*0.1+1.4*0.1
> >
> >so on.
> >
> >Thank you very much.
> >
> >Yiyu
|