LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (May 2008, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 5 May 2008 09:04:54 -0700
Reply-To:     MBWD <xmbwdx@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         MBWD <xmbwdx@GMAIL.COM>
Organization: http://groups.google.com
Subject:      Re: Deduplicate tricky "Cancel" data
Comments: To: sas-l@uga.edu
Content-Type: text/plain; charset=ISO-8859-1

Hi Mark. You are correct -- lines 10-13 were entered incorrectly and should have been ($21,250.00). Sorry about that. I just got back, and I am trying your solution. Thanks. - M.

On May 3, 6:39 pm, mkei...@WHARTON.UPENN.EDU ("Keintz, H. Mark") wrote: > MBWD, I think you didn't type in all the data values as you intend. > Shouldn't lines 10 through 13 end up with ($21,250.00), instead of the > $21,250.00 you have? > > If so this program, using a hash table, won't be effected by the order > of cancels and their corresponding regular transactions, and it will > take only one pass through the data. > > It works by using the hash table to track each transaction profile > (using all vars except CANCEL), adds 1 to the net count ( _NTRNS) for > regular trans and subtracts 1 for cancels. > > At the end of reading in one person's transaction, the net count > controls which profiles, and how many, are output. > > The only requirement is that all the records for a given person be > together in your original data set. (Or you could do first.account and > last.account instead, in which case records for each account should be > together). > > Regards, > Mark > > data want (drop=_:); > set have; > by acct name ; > > if _n_=1 then do; > declare hash trns; > declare hiter hit; > end; > > ** For each person, set up new instance of a hash table. ** > ** Note it has _NTRNS, but not CANCEL **; > if first.name then do; > trns = _new_ hash(hashexp:6,ordered:'a'); > trns.definekey ('acct','name','edate','stock','b' > ,'qty','prc','trans'); > trns.definedata('acct','name','b','qty','stock' > ,'prc','edate','trans','_ntrns'); > trns.definedone(); > hit= _new_ hiter('trns'); > end; > > ** For CANCEL='X', make QTY & TRANS for matching profile **; > select (cancel); > when (' ') _plus_minus=+1; > when ('X') do; > _plus_minus=-1; > qty=-1*qty; * Match regular transaction profile; > trans=-1*trans; * Match regular transaction profile; > end; > end; > > _rc=trns.check(); ** See if profile already exists ... **; > if _rc=0 then do; ** If yes ... **; > _rc=trns.find(); > _ntrns=_ntrns + _plus_minus; > end; > else _ntrns=_plus_minus; > > _rc=trns.replace(); ** Back into hash, + updated _NTRNS **; > > ** At the end of the person, write each profile _NTRNS times **; > if last.name then do; > _rc=hit.first(); > do while (_rc=0); > if _ntrns ^=0 then do _i=1 to _ntrns; output want; end; > _rc=hit.next(); > end; > _rc=trns.delete(); > _rc=hit.delete(); > end; > run; > > > > -----Original Message----- > From: SAS(r) Discussion [mailto:SA...@LISTSERV.UGA.EDU] On Behalf Of > > MBWD > Sent: Friday, May 02, 2008 10:50 PM > To: SA...@LISTSERV.UGA.EDU > Subject: Re: Deduplicate tricky "Cancel" data > > Thanks so much for your help. Unfortunately, the data example I gave > does not fully illustrate the problem. Your solution works when > there is a complete match between the date of two transactions: the > cancel and its counterpart. But after using your solution it turns > out that there are sometimes MANY iterations of pairs -- not just > two. Moreover, sometimes the data wash out after the cancels and > counterpart are paired, and sometimes not. As an example, I modified > the data below to give the idea. Finally, the order of the cancel and > its counterpart are often off (e.g., IBM): > > ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00) X > ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00) > ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00 > ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00) > ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00 > ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00) > ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00 > ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00) X > ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00 X > ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B X -850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00 > ABC1234 Fred B 850 IBM $25.00 3/1/2007 $21,250.00 > > As you can see, there are sometimes 2 complete pairs of cancels/ > counterparts -- e.g., 1.5.08 XOM trades. > > There are also times where a pair washes out (cancel/counterpart), and > leaves one sale -- e.g., 1.8.08 XOM trades. > > I put an X next to the transactions in the new data set that should > remain after the wash. But I have no idea how to get this result. > > Thank you again for all help. - M. > > On May 1, 6:18 pm, ya.hu...@AMYLIN.COM (Ya Huang) wrote: > > Assumption is that canceled and the counterpart are on the same > > date for same account, name and stock: > > > data have; > > length net $15; > > input Account $ Name $ B $ Cancel $19 Quantity Stock $ Price $ > > eDate :mmddyy10. Net $; > > cards; > > ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00) > > ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00) > > ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00 > > ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00) > > ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00 > > ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00 > > ; > > > proc sql; > > create table need as > > select *,case when count(edate)=2 and > > max(quantity)=-min(quantity) > > then 1 else 0 end as del > > from have > > group by account,name,stock,edate > > ; > > > proc print; > > format edate date.; > > run; > > > Account Cancel Quantity Stock eDate net > del > > > ABC1234 105 ATT 01DEC07 $2,625.00 > 0 > > ABC1234 -175 XOM 01DEC07 ($15,575.00) > 0 > > ABC1234 X 175 XOM 05JAN08 $15,575.00 > 1 > > ABC1234 -175 XOM 05JAN08 ($15,575.00) > 1 > > ABC1234 X 175 XOM 08JAN08 $15,575.00 > 1 > > ABC1234 -175 XOM 08JAN08 ($15,575.00) > 1 > > > On Thu, 1 May 2008 15:09:45 -0700, MBWD <xmb...@GMAIL.COM> wrote: > > >I can't figure out how to get the below data to dedupe (sort of) by > > >deleting the two canceled observations AND their two counterparts (or > > >alternatively, just creating a binary variably 1,0 for those > > >observations that match the cancel row). > > > >This would delete all but rows 1 and 6. Rows 2 and 3 are > counterparts > > >in a cancel transaction. So are rows 3 and 4. > > > >Account Name B Cancel Quantity Stock Price Entry Date Net > > >ABC1234 Fred S -175 XOM $89.00 12/1/2007 > > ($15,575.00) > > >ABC1234 Fred S -175 XOM $89.00 1/5/2008 > > ($15,575.00) > > >ABC1234 Fred S X 175 XOM $89.00 1/5/2008 > > $15,575.00 > > >ABC1234 Fred S -175 XOM $89.00 1/8/2008 > > ($15,575.00) > > >ABC1234 Fred S X 175 XOM $89.00 1/8/2008 > > $15,575.00 > > >ABC1234 Fred B 105 ATT $25.00 12/1/2007 > > $2,625.00 > > > >In addition, it would be very helpful if someone know how to get the > > >PROC SORT DUPKEY method to NOT delete any observations, but rather > > >just mark as "1" and "0" those observations that it would have > > >deleted. > > > >Thanks for any and all help!! > > On May 1, 6:18 pm, ya.hu...@AMYLIN.COM (Ya Huang) wrote: > > Assumption is that canceled and the counterpart are on the same > > date for same account, name and stock: > > > data have; > > length net $15; > > input Account $ Name $ B $ Cancel $19 Quantity Stock $ Price $ > > eDate :mmddyy10. Net $; > > cards; > > ABC1234 Fred S -175 XOM $89.00 12/1/2007 ($15,575.00) > > ABC1234 Fred S -175 XOM $89.00 1/5/2008 ($15,575.00) > > ABC1234 Fred S X 175 XOM $89.00 1/5/2008 $15,575.00 > > ABC1234 Fred S -175 XOM $89.00 1/8/2008 ($15,575.00) > > ABC1234 Fred S X 175 XOM $89.00 1/8/2008 $15,575.00 > > ABC1234 Fred B 105 ATT $25.00 12/1/2007 $2,625.00 > > ; > > > proc sql; > > create table need as > > select *,case when count(edate)=2 and > > max(quantity)=-min(quantity) > > then 1 else 0 end as del > > from have > > group by account,name,stock,edate > > ; > > > proc print; > > format edate date.; > > run; > > > Account Cancel Quantity Stock eDate net > del > > > ABC1234 105 ATT 01DEC07 $2,625.00 > 0 > > ABC1234 -175 XOM 01DEC07 ($15,575.00) > 0 > > ABC1234 X 175 XOM 05JAN08 $15,575.00 > 1 > > ABC1234 -175 XOM 05JAN08 ($15,575.00) > 1 > > ABC1234 X 175 XOM 08JAN08 $15,575.00 > 1 > > ABC1234 -175 XOM 08JAN08 ($15,575.00) > 1 > > > On Thu, 1 May 2008 15:09:45 -0700, MBWD <xmb...@GMAIL.COM> wrote: > > >I can't figure out how to get the below data to dedupe (sort of) by > > >deleting the two canceled observations AND their two counterparts (or > > >alternatively, just creating a binary variably 1,0 for those > > >observations that match the cancel row). > > > >This would delete all but rows 1 and 6. Rows 2 and 3 are > counterparts > > >in a cancel transaction. So are rows 3 and 4. > > > >Account Name B Cancel Quantity Stock Price Entry Date Net > > >ABC1234 Fred S -175 XOM $89.00 12/1/2007 > > ($15,575.00) > > >ABC1234 Fred S -175 XOM $89.00 1/5/2008 > > ($15,575.00) > > >ABC1234 Fred S X 175 XOM $89.00 1/5/2008 > > $15,575.00 > > >ABC1234 Fred S -175 XOM $89.00 1/8/2008 > > ($15,575.00) > > >ABC1234 Fred S X 175 XOM $89.00 1/8/2008 > > $15,575.00 > > >ABC1234 Fred B 105 ATT $25.00 12/1/2007 > > $2,625.00 > > > >In addition, it would be very helpful if someone know how to get the > > >PROC SORT DUPKEY method to NOT delete any observations, but rather > > >just mark as "1" and "0" those observations that it would have > > >deleted. > > > >Thanks for any and all help!!- Hide quoted text - > > - Show quoted text -


Back to: Top of message | Previous page | Main SAS-L page