LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2004, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 10 Aug 2004 13:41:57 -0400
Reply-To:     Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject:      Re: Comparison across rows

Here's how I would do it. Start with expanded test data:

data sample; input id name $ price purch_dt; format purch_dt mmddyy10.; datalines; 1 Cup 10 16000 1 Cup 10 15980 1 Cup 10 16000 1 Pencil 2 16002 2 Pencil 2 16002 2 Pencil 2 15990 2 Pencil 2 15000 2 Pencil 3 16000 2 Pencil 1 16005 3 Book 6 15999 3 Book 4 16000 3 Book 5 16000 3 Book 4 15900 4 Widget 99 17000 4 Widget 99 17020 4 Widget 99 17040 4 Widget 99 17101 5 Widget 99 17000 5 Widget 99 17040 5 Widget 99 17100 ;

I added records to test the boundary between 60 and 61 days, and also to check results for sequences of three dates when the first and last are more/less than 60 days apart.

Here's an incidental step just to display the intervals:

data _null_; set sample; by id name price notsorted; interval = purch_dt - lag(purch_dt); if first.price then interval = .; put (_all_)(=); run;

The log shows

id=1 name=Cup price=10 purch_dt=10/22/2003 interval=. id=1 name=Cup price=10 purch_dt=10/02/2003 interval=-20 id=1 name=Cup price=10 purch_dt=10/22/2003 interval=20 id=1 name=Pencil price=2 purch_dt=10/24/2003 interval=. id=2 name=Pencil price=2 purch_dt=10/24/2003 interval=. id=2 name=Pencil price=2 purch_dt=10/12/2003 interval=-12 id=2 name=Pencil price=2 purch_dt=01/25/2001 interval=-990 id=2 name=Pencil price=3 purch_dt=10/22/2003 interval=. id=2 name=Pencil price=1 purch_dt=10/27/2003 interval=. id=3 name=Book price=6 purch_dt=10/21/2003 interval=. id=3 name=Book price=4 purch_dt=10/22/2003 interval=. id=3 name=Book price=5 purch_dt=10/22/2003 interval=. id=3 name=Book price=4 purch_dt=07/14/2003 interval=. id=4 name=Widget price=99 purch_dt=07/18/2006 interval=. id=4 name=Widget price=99 purch_dt=08/07/2006 interval=20 id=4 name=Widget price=99 purch_dt=08/27/2006 interval=20 id=4 name=Widget price=99 purch_dt=10/27/2006 interval=61 id=5 name=Widget price=99 purch_dt=07/18/2006 interval=. id=5 name=Widget price=99 purch_dt=08/27/2006 interval=40 id=5 name=Widget price=99 purch_dt=10/26/2006 interval=60

The solution is in steps. Some could be rolled together, but I think it's easier to follow this way.

proc sql;

First, collapse duplicate records and retain the information in a count column:

create view one as select sample.*, count(*) as many from sample group by id, name, price, purch_dt;

Next, do a reflexive join to find all row pairs with matching keys and no more than 60 days of separation:

create view two as select ref.purch_dt as refdate, one.* from one as ref inner join one on ref.id = one.id and ref.name = one.name and ref.price = one.price where one.purch_dt>=ref.purch_dt and one.purch_dt<=ref.purch_dt+60;

Then subset and filter out the rows which do not meet the criteria of being part of a multiple:

create view three as select * from two group by id, name, price, refdate having sum(many)>1;

Finally, get rid of duplicates:

create table outcome as select distinct id, name, price, purch_dt, many from three order by id, purch_dt;

quit;

Results:

Obs id name price purch_dt many

1 1 Cup 10 10/02/2003 1 2 1 Cup 10 10/22/2003 2 3 2 Pencil 2 10/12/2003 1 4 2 Pencil 2 10/24/2003 1 5 4 Widget 99 07/18/2006 1 6 4 Widget 99 08/07/2006 1 7 4 Widget 99 08/27/2006 1 8 5 Widget 99 07/18/2006 1 9 5 Widget 99 08/27/2006 1 10 5 Widget 99 10/26/2006 1

On Fri, 6 Aug 2004 18:04:38 -0400, LL Cool Mars <klasikjis@YAHOO.COM> wrote:

>Hi SAS maniacs, > >Here is the light problem: > >data sample; >input id name $ price purch_dt; >format purch_dt mmddyy10.; >datalines; >1 Cup 10 16000 >1 Cup 10 15980 >1 Cup 10 16000 >1 Pencil 2 16002 >2 Pencil 2 16002 >2 Pencil 2 15990 >2 Pencil 2 15000 >2 Pencil 3 16000 >2 Pencil 1 16005 >3 Book 6 15999 >3 Book 4 16000 >3 Book 5 16000 >3 Book 4 15900 >; >run; > >proc sort data = sample; >by id name price purch_dt; >quit; > >would like to get list of items where id, name, and price are equal AND >pruch_dt within 60 days. I am looking for the solution in one step with >outcome looking like this: >1 Cup 10 16000 >1 Cup 10 15980 >1 Cup 10 16000 >2 Pencil 2 16002 >2 Pencil 2 15990 > > >if anybody is interested to take a look at what I came up with here it is, >but I believe this is clumsy: > >data outcome; >set sample; > if id = LAG(id) and > name = LAG(name) and > price = LAG(price) and > ABS(INTCK('day',purch_dt,LAG(purch_dt)))<60 then output; >run; > >proc sort data = outcome; >by id name price purch_dt; >quit; > > >data merged; >merge outcome (in=in1) sample(in=in2 RENAME =(purch_dt=orig_purch_dt)); >by id name price; >if in1 and in2 and ABS(INTCK('day',purch_dt, orig_purch_dt))<60 then output >merged; >run; > >Thanks much, >LL Cool Mars


Back to: Top of message | Previous page | Main SAS-L page