Date: Thu, 1 May 2008 21:18:08 -0400 Ya Huang "SAS(r) Discussion" Ya Huang Re: Deduplicate tricky "Cancel" data To: xmbwdx@GMAIL.COM

Assumption is that canceled and the counterpart are on the same date for same account, name and stock:

data have; length net \$15; input Account \$ Name \$ B \$ Cancel \$19 Quantity Stock \$ Price \$ eDate :mmddyy10. Net \$; cards; ABC1234 Fred S -175 XOM \$89.00 12/1/2007 (\$15,575.00) ABC1234 Fred S -175 XOM \$89.00 1/5/2008 (\$15,575.00) ABC1234 Fred S X 175 XOM \$89.00 1/5/2008 \$15,575.00 ABC1234 Fred S -175 XOM \$89.00 1/8/2008 (\$15,575.00) ABC1234 Fred S X 175 XOM \$89.00 1/8/2008 \$15,575.00 ABC1234 Fred B 105 ATT \$25.00 12/1/2007 \$2,625.00 ;

proc sql; create table need as select *,case when count(edate)=2 and max(quantity)=-min(quantity) then 1 else 0 end as del from have group by account,name,stock,edate ;

proc print; format edate date.; run;

Account Cancel Quantity Stock eDate net del

ABC1234 105 ATT 01DEC07 \$2,625.00 0 ABC1234 -175 XOM 01DEC07 (\$15,575.00) 0 ABC1234 X 175 XOM 05JAN08 \$15,575.00 1 ABC1234 -175 XOM 05JAN08 (\$15,575.00) 1 ABC1234 X 175 XOM 08JAN08 \$15,575.00 1 ABC1234 -175 XOM 08JAN08 (\$15,575.00) 1

On Thu, 1 May 2008 15:09:45 -0700, MBWD <xmbwdx@GMAIL.COM> wrote:

>I can't figure out how to get the below data to dedupe (sort of) by >deleting the two canceled observations AND their two counterparts (or >alternatively, just creating a binary variably 1,0 for those >observations that match the cancel row). > >This would delete all but rows 1 and 6. Rows 2 and 3 are counterparts >in a cancel transaction. So are rows 3 and 4. > >Account Name B Cancel Quantity Stock Price Entry Date Net >ABC1234 Fred S -175 XOM \$89.00 12/1/2007 (\$15,575.00) >ABC1234 Fred S -175 XOM \$89.00 1/5/2008 (\$15,575.00) >ABC1234 Fred S X 175 XOM \$89.00 1/5/2008 \$15,575.00 >ABC1234 Fred S -175 XOM \$89.00 1/8/2008 (\$15,575.00) >ABC1234 Fred S X 175 XOM \$89.00 1/8/2008 \$15,575.00 >ABC1234 Fred B 105 ATT \$25.00 12/1/2007 \$2,625.00 > > >In addition, it would be very helpful if someone know how to get the >PROC SORT DUPKEY method to NOT delete any observations, but rather >just mark as "1" and "0" those observations that it would have >deleted. > >Thanks for any and all help!!

Back to: Top of message | Previous page | Main SAS-L page