Date: Fri, 2 May 2008 19:49:45 -0700
Reply-To: MBWD <xmbwdx@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: MBWD <xmbwdx@GMAIL.COM>
Organization: http://groups.google.com
Subject: Re: Deduplicate tricky "Cancel" data
Content-Type: text/plain; charset=ISO-8859-1
Thanks so much for your help. Unfortunately, the data example I gave
does not fully illustrate the problem. Your solution works when
there is a complete match between the date of two transactions: the
cancel and its counterpart. But after using your solution it turns
out that there are sometimes MANY iterations of pairs -- not just
two. Moreover, sometimes the data wash out after the cancels and
counterpart are paired, and sometimes not. As an example, I modified
the data below to give the idea. Finally, the order of the cancel and
its counterpart are often off (e.g., IBM):
ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00) X
ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00)
ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00
ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00)
ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00
ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00)
ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00
ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00) X
ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00 X
ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00
ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00
As you can see, there are sometimes 2 complete pairs of cancels/
counterparts -- e.g., 1.5.08 XOM trades.
There are also times where a pair washes out (cancel/counterpart), and
leaves one sale -- e.g., 1.8.08 XOM trades.
I put an X next to the transactions in the new data set that should
remain after the wash. But I have no idea how to get this result.
Thank you again for all help. - M.
On May 1, 6:18 pm, ya.hu...@AMYLIN.COM (Ya Huang) wrote:
> Assumption is that canceled and the counterpart are on the same
> date for same account, name and stock:
>
> data have;
> length net $15;
> input Account $ Name $ B $ Cancel $19 Quantity Stock $ Price $
> eDate :mmddyy10. Net $;
> cards;
> ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00)
> ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00)
> ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00
> ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00)
> ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00
> ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00
> ;
>
> proc sql;
> create table need as
> select *,case when count(edate)=2 and
> max(quantity)=-min(quantity)
> then 1 else 0 end as del
> from have
> group by account,name,stock,edate
> ;
>
> proc print;
> format edate date.;
> run;
>
> Account Cancel Quantity Stock eDate net del
>
> ABC1234 105 ATT 01DEC07 $2,625.00 0
> ABC1234 -175 XOM 01DEC07 ($15,575.00) 0
> ABC1234 X 175 XOM 05JAN08 $15,575.00 1
> ABC1234 -175 XOM 05JAN08 ($15,575.00) 1
> ABC1234 X 175 XOM 08JAN08 $15,575.00 1
> ABC1234 -175 XOM 08JAN08 ($15,575.00) 1
>
>
>
> On Thu, 1 May 2008 15:09:45 -0700, MBWD <xmb...@GMAIL.COM> wrote:
> >I can't figure out how to get the below data to dedupe (sort of) by
> >deleting the two canceled observations AND their two counterparts (or
> >alternatively, just creating a binary variably 1,0 for those
> >observations that match the cancel row).
>
> >This would delete all but rows 1 and 6. Rows 2 and 3 are counterparts
> >in a cancel transaction. So are rows 3 and 4.
>
> >Account Name B Cancel Quantity Stock Price Entry Date Net
> >ABC1234 Fred S -175 XOM $89.00 12/1/2007
> ($15,575.00)
> >ABC1234 Fred S -175 XOM $89.00 1/5/2008
> ($15,575.00)
> >ABC1234 Fred S X 175 XOM $89.00 1/5/2008
> $15,575.00
> >ABC1234 Fred S -175 XOM $89.00 1/8/2008
> ($15,575.00)
> >ABC1234 Fred S X 175 XOM $89.00 1/8/2008
> $15,575.00
> >ABC1234 Fred B 105 ATT $25.00 12/1/2007
> $2,625.00
>
> >In addition, it would be very helpful if someone know how to get the
> >PROC SORT DUPKEY method to NOT delete any observations, but rather
> >just mark as "1" and "0" those observations that it would have
> >deleted.
>
> >Thanks for any and all help!!
On May 1, 6:18 pm, ya.hu...@AMYLIN.COM (Ya Huang) wrote:
> Assumption is that canceled and the counterpart are on the same
> date for same account, name and stock:
>
> data have;
> length net $15;
> input Account $ Name $ B $ Cancel $19 Quantity Stock $ Price $
> eDate :mmddyy10. Net $;
> cards;
> ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00)
> ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00)
> ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00
> ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00)
> ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00
> ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00
> ;
>
> proc sql;
> create table need as
> select *,case when count(edate)=2 and
> max(quantity)=-min(quantity)
> then 1 else 0 end as del
> from have
> group by account,name,stock,edate
> ;
>
> proc print;
> format edate date.;
> run;
>
> Account Cancel Quantity Stock eDate net del
>
> ABC1234 105 ATT 01DEC07 $2,625.00 0
> ABC1234 -175 XOM 01DEC07 ($15,575.00) 0
> ABC1234 X 175 XOM 05JAN08 $15,575.00 1
> ABC1234 -175 XOM 05JAN08 ($15,575.00) 1
> ABC1234 X 175 XOM 08JAN08 $15,575.00 1
> ABC1234 -175 XOM 08JAN08 ($15,575.00) 1
>
>
>
> On Thu, 1 May 2008 15:09:45 -0700, MBWD <xmb...@GMAIL.COM> wrote:
> >I can't figure out how to get the below data to dedupe (sort of) by
> >deleting the two canceled observations AND their two counterparts (or
> >alternatively, just creating a binary variably 1,0 for those
> >observations that match the cancel row).
>
> >This would delete all but rows 1 and 6. Rows 2 and 3 are counterparts
> >in a cancel transaction. So are rows 3 and 4.
>
> >Account Name B Cancel Quantity Stock Price Entry Date Net
> >ABC1234 Fred S -175 XOM $89.00 12/1/2007
> ($15,575.00)
> >ABC1234 Fred S -175 XOM $89.00 1/5/2008
> ($15,575.00)
> >ABC1234 Fred S X 175 XOM $89.00 1/5/2008
> $15,575.00
> >ABC1234 Fred S -175 XOM $89.00 1/8/2008
> ($15,575.00)
> >ABC1234 Fred S X 175 XOM $89.00 1/8/2008
> $15,575.00
> >ABC1234 Fred B 105 ATT $25.00 12/1/2007
> $2,625.00
>
> >In addition, it would be very helpful if someone know how to get the
> >PROC SORT DUPKEY method to NOT delete any observations, but rather
> >just mark as "1" and "0" those observations that it would have
> >deleted.
>
> >Thanks for any and all help!!
|