| Date: | Tue, 4 Sep 2007 15:49:41 -0400 |
| Reply-To: | Richard Reeves <reeves@STUDENTCLEARINGHOUSE.ORG> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Richard Reeves <reeves@STUDENTCLEARINGHOUSE.ORG> |
| Subject: | Re: Many to many merge with dates |
| In-Reply-To: | A<200709041930.l84J3Mi1005826@mailgw.cc.uga.edu> |
| Content-Type: | text/plain; charset="us-ascii" |
Depending on the size of your data you may want to consider a variation
to the paper:
http://www8.sas.com/scholars/Proceedings/2006/ETL/ET07_06.PDF
It discusses the SyncJoin Algorithm for many to many merges. If you do
use a proc SQL explore the optimizer features and the buffersize _method
and _tree options.
Rich
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@listserv.vt.edu] On Behalf Of
Arthur Tabachneck
Sent: Tuesday, September 04, 2007 3:30 PM
To: SAS-L@LISTSERV.VT.EDU
Subject: Re: Many to many merge with dates
Since the data step can't handle many-to-many merges, I'd suggest a SQL
approach. Is something like the following what you are looking for?
data First_Dataset;
input ID @9 Firstdate date9.;
format Firstdate date9.;
cards;
1 20Jan2002
1 18Aug2002
1 30May2002
2 15Mar2002
2 20Apr2002
2 18May2002
;
data Second_Dataset;
input ID @9 Seconddate date9. type;
format Seconddate date9.;
cards;
1 16Jan2002 2
1 18Jan2002 3
1 24May2002 2
1 1Jun2002 2
1 1Aug2002 2
1 16Aug2002 2
1 30Aug2002 2
2 10Mar2002 3
2 12Mar2002 4
2 20Mar2002 4
2 16Apr2002 3
2 18Apr2002 3
2 28Apr2002 3
2 14May2002 3
2 16May2002 3
2 26May2002 3
;
proc sql;
create table want as
select * from
(select f.ID, Firstdate, Seconddate, type
from First_dataset as f
join Second_dataset as s
on f.id eq s.id
having Firstdate gt Seconddate and
Firstdate-20 LE Seconddate)
group by id,FirstDate
having Firstdate-Seconddate eq min(Firstdate-Seconddate);
quit;
Art
--------
On Tue, 4 Sep 2007 08:38:50 -0700, phrising@YAHOO.DK wrote:
>Dear fellow SAS-users,
>I am seeking a solution to the following problem. I have a dataset
>with multiple occurrences of the same ID and a firstdate
>
>ID Firstdate
>1 20Jan2002
>1 18Aug2002
>1 30May2002
>2 15Mar2002
>2 20Apr2002
>2 18May2002
>
>I have another dataset with an ID variable, a date and a type:
>
>ID Seconddate Type
>1 16Jan2002 2
>1 18Jan2002 3
>1 24May2002 2
>1 1Jun2002 2
>1 1Aug2002 2
>1 16Aug2002 2
>1 30Aug2002 2
>2 10Mar2002 3
>2 12Mar2002 4
>2 20Mar2002 4
>2 16Apr2002 3
>2 18Apr2002 3
>2 28Apr2002 3
>2 14May2002 3
>2 16May2002 3
>2 26May2002 3
>
>I want to end up with a dataset with the closest date (seconddate)
>that match the firstdate and the date has to come before the first
>date. The final dataset will look like this:
>
>ID Firstdate Type
>1 20Jan2002 3
>1 18Aug2002 2
>1 30May2002 3
>2 15Mar2002 4
>2 20Apr2002 3
>2 18May2002 3
>
>The problem is that ID occurs multiple times in both datasets. I
>probably also have to set up some kind of condition as to how far back
>I want to go to find the closest date:
>Firstdate-20 <= Seconddate <= Firstdate
>
>Any hints/tips as to how to do this appreciated.
|