```Date: Tue, 10 Aug 2004 13:41:57 -0400 Reply-To: Howard Schreier Sender: "SAS(r) Discussion" From: Howard Schreier Subject: Re: Comparison across rows Here's how I would do it. Start with expanded test data: data sample; input id name \$ price purch_dt; format purch_dt mmddyy10.; datalines; 1 Cup 10 16000 1 Cup 10 15980 1 Cup 10 16000 1 Pencil 2 16002 2 Pencil 2 16002 2 Pencil 2 15990 2 Pencil 2 15000 2 Pencil 3 16000 2 Pencil 1 16005 3 Book 6 15999 3 Book 4 16000 3 Book 5 16000 3 Book 4 15900 4 Widget 99 17000 4 Widget 99 17020 4 Widget 99 17040 4 Widget 99 17101 5 Widget 99 17000 5 Widget 99 17040 5 Widget 99 17100 ; I added records to test the boundary between 60 and 61 days, and also to check results for sequences of three dates when the first and last are more/less than 60 days apart. Here's an incidental step just to display the intervals: data _null_; set sample; by id name price notsorted; interval = purch_dt - lag(purch_dt); if first.price then interval = .; put (_all_)(=); run; The log shows id=1 name=Cup price=10 purch_dt=10/22/2003 interval=. id=1 name=Cup price=10 purch_dt=10/02/2003 interval=-20 id=1 name=Cup price=10 purch_dt=10/22/2003 interval=20 id=1 name=Pencil price=2 purch_dt=10/24/2003 interval=. id=2 name=Pencil price=2 purch_dt=10/24/2003 interval=. id=2 name=Pencil price=2 purch_dt=10/12/2003 interval=-12 id=2 name=Pencil price=2 purch_dt=01/25/2001 interval=-990 id=2 name=Pencil price=3 purch_dt=10/22/2003 interval=. id=2 name=Pencil price=1 purch_dt=10/27/2003 interval=. id=3 name=Book price=6 purch_dt=10/21/2003 interval=. id=3 name=Book price=4 purch_dt=10/22/2003 interval=. id=3 name=Book price=5 purch_dt=10/22/2003 interval=. id=3 name=Book price=4 purch_dt=07/14/2003 interval=. id=4 name=Widget price=99 purch_dt=07/18/2006 interval=. id=4 name=Widget price=99 purch_dt=08/07/2006 interval=20 id=4 name=Widget price=99 purch_dt=08/27/2006 interval=20 id=4 name=Widget price=99 purch_dt=10/27/2006 interval=61 id=5 name=Widget price=99 purch_dt=07/18/2006 interval=. id=5 name=Widget price=99 purch_dt=08/27/2006 interval=40 id=5 name=Widget price=99 purch_dt=10/26/2006 interval=60 The solution is in steps. Some could be rolled together, but I think it's easier to follow this way. proc sql; First, collapse duplicate records and retain the information in a count column: create view one as select sample.*, count(*) as many from sample group by id, name, price, purch_dt; Next, do a reflexive join to find all row pairs with matching keys and no more than 60 days of separation: create view two as select ref.purch_dt as refdate, one.* from one as ref inner join one on ref.id = one.id and ref.name = one.name and ref.price = one.price where one.purch_dt>=ref.purch_dt and one.purch_dt<=ref.purch_dt+60; Then subset and filter out the rows which do not meet the criteria of being part of a multiple: create view three as select * from two group by id, name, price, refdate having sum(many)>1; Finally, get rid of duplicates: create table outcome as select distinct id, name, price, purch_dt, many from three order by id, purch_dt; quit; Results: Obs id name price purch_dt many 1 1 Cup 10 10/02/2003 1 2 1 Cup 10 10/22/2003 2 3 2 Pencil 2 10/12/2003 1 4 2 Pencil 2 10/24/2003 1 5 4 Widget 99 07/18/2006 1 6 4 Widget 99 08/07/2006 1 7 4 Widget 99 08/27/2006 1 8 5 Widget 99 07/18/2006 1 9 5 Widget 99 08/27/2006 1 10 5 Widget 99 10/26/2006 1 On Fri, 6 Aug 2004 18:04:38 -0400, LL Cool Mars wrote: >Hi SAS maniacs, > >Here is the light problem: > >data sample; >input id name \$ price purch_dt; >format purch_dt mmddyy10.; >datalines; >1 Cup 10 16000 >1 Cup 10 15980 >1 Cup 10 16000 >1 Pencil 2 16002 >2 Pencil 2 16002 >2 Pencil 2 15990 >2 Pencil 2 15000 >2 Pencil 3 16000 >2 Pencil 1 16005 >3 Book 6 15999 >3 Book 4 16000 >3 Book 5 16000 >3 Book 4 15900 >; >run; > >proc sort data = sample; >by id name price purch_dt; >quit; > >would like to get list of items where id, name, and price are equal AND >pruch_dt within 60 days. I am looking for the solution in one step with >outcome looking like this: >1 Cup 10 16000 >1 Cup 10 15980 >1 Cup 10 16000 >2 Pencil 2 16002 >2 Pencil 2 15990 > > >if anybody is interested to take a look at what I came up with here it is, >but I believe this is clumsy: > >data outcome; >set sample; > if id = LAG(id) and > name = LAG(name) and > price = LAG(price) and > ABS(INTCK('day',purch_dt,LAG(purch_dt)))<60 then output; >run; > >proc sort data = outcome; >by id name price purch_dt; >quit; > > >data merged; >merge outcome (in=in1) sample(in=in2 RENAME =(purch_dt=orig_purch_dt)); >by id name price; >if in1 and in2 and ABS(INTCK('day',purch_dt, orig_purch_dt))<60 then output >merged; >run; > >Thanks much, >LL Cool Mars ```

Back to: Top of message | Previous page | Main SAS-L page