Date: Fri, 31 May 2002 16:06:33 -0400
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: HELP - Merging by time ranges
Content-Type: text/plain; charset="iso-8859-1"
Chris,
SQL is a great help in doing this sort of thing. Here is code to produce
all
possible matches based on matching SER_NUM and REF_DES.
data w1 ;
input ser_num ref_des datetime1 :datetime. ;
cards ;
12345 10 16APR02:14:06:15
12345 10 16APR02:14:07:15
12345 56 17APR02:16:00:00
21213 25 15APR02:12:11:10
21213 30 15APR02:13:11:10
11111 11 15APR02:13:11:10
;
data w2 ;
input ser_num ref_des datetime2 :datetime. ;
cards ;
12345 10 16APR02:14:16:15
12345 10 16APR02:14:17:15
12345 56 17APR02:23:10:00
21213 25 15APR02:12:21:10
21213 30 15APR02:13:21:10
22222 22 15APR02:13:11:10
;
proc sql ;
create table q ( label = "possible matches" ) as
select coalesce ( w1.ser_num , w2.ser_num ) as ser_num
, coalesce ( w1.ref_des , w2.ref_des ) as ref_des
, datetime1 format = datetime.
, datetime2 format = datetime.
, datetime2 - datetime1 as diff format time.
from w1 full join w2
on w1.ser_num = w2.ser_num
and w1.ref_des = w2.ref_des
;
quit ;
It should be a simple task to identify the non-matches. When a possible
match isn't realized because of the time restriction it should be easy to
split into two records. What will not be easy is to capture why you chose
one pair of matches over all the other possible matches for SER_NUM = 12345
and REF_DES = 10. Once you can state algorithmically the reason extra
requirement, somone should be able to implement it.
Whether this is a practical approach or not really will depend on the amount
of data and how many irrelevant matches are generated.
IanWhitlock@westat.com
-----Original Message-----
From: Chris Biggs [mailto:chris.biggs@ALCATEL.COM]
Sent: Friday, May 31, 2002 3:13 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: HELP - Merging by time ranges
--------------1BD613B6619D5565F21123B8
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Additional details:
dset1...:
ser_num ref_des datetime1
======= ====== ==============
12345 10 16APR02:14:06:15
12345 10 16APR02:14:07:15
12345 56 17APR02:16:00:00
21213 25 15APR02:12:11:10
21213 30 15APR02:13:11:10
dset2...:
ser_num ref_des datetime2
======= ====== ==============
12345 10 16APR02:14:16:15
12345 10 16APR02:14:17:15
12345 56 17APR02:23:10:00
21213 25 15APR02:12:21:10
21213 30 15APR02:13:21:10
desired merged dataset...:
ser_num ref_des datetime1
datetime2
======= ====== =============== ==============
12345 10 16APR02:14:06:15
16APR02:14:16:15
12345 10 16APR02:14:07:15
16APR02:14:17:15
12345 56 17APR02:16:00:00
no match
12345 56 no
match 17APR02:23:10:00
21213 25 15APR02:12:11:10
15APR02:12:21:10
21213 30 15APR02:13:11:10
15APR02:13:21:10
Chris Biggs wrote:
> Help!
>
> I've been banging my head against my desk for a day now trying to make a
> merge work. I have two datasets that I need merged together by two key
> variables (ser_num and ref_des) in a many to many merge -- keeping the
> non-merged records also leaving the additional variables NULL. The
> catch is that it isn't a clean merge... the match between the two key
> variables can only happen within a certain datetime range -- in this
> case, 2 hours apart -- but the datetime variables between the two
> datasets are named differently and must remain that way.
>
> I've tried "by" statements using the two key variables and comparing the
> datetime range via a "where" clause but no luck based on the different
> datatime variable names.
>
> Can anyone help with this logic and what statements to use?
>
> Thx!
> Chris
--------------1BD613B6619D5565F21123B8
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<b>Additional details:</b>
<p>dset1...:
<p>ser_num
ref_des &nb
sp;
datetime1
<br>=======  
;
====== &nbs
p;
==============
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:06:15
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:07:15
<br>12345 &
nbsp;
56 &n
bsp;
17APR02:16:00:00
<br>21213 &
nbsp;
25 &n
bsp;
15APR02:12:11:10
<br>21213 &
nbsp;
30 &n
bsp;
15APR02:13:11:10
<p>dset2...:
<p>ser_num
ref_des &nb
sp;
datetime2
<br>=======  
;
====== &nbs
p;
==============
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:16:15
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:17:15
<br>12345 &
nbsp;
56 &n
bsp;
17APR02:23:10:00
<br>21213 &
nbsp;
25 &n
bsp;
15APR02:12:21:10
<br>21213 &
nbsp;
30 &n
bsp;
15APR02:13:21:10
<br>
<p>desired merged dataset...:
<p>ser_num
ref_des
datetime1 &
nbsp;
datetime2
<br>=======  
;
======
===============
==============
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:06:15 16APR02:14:16:15
<br>12345 &
nbsp;
10 &n
bsp;
16APR02:14:07:15 16APR02:14:17:15
<br>12345 &
nbsp;
56 &n
bsp;
17APR02:16:00:00
no match
<br>12345 &
nbsp;
56 &n
bsp;
no
match  
;
17APR02:23:10:00
<br>21213 &
nbsp;
25 &n
bsp;
15APR02:12:11:10 15APR02:12:21:10
<br>21213 &
nbsp;
30 &n
bsp;
15APR02:13:11:10 15APR02:13:21:10
<br>
<br>
<p>Chris Biggs wrote:
<blockquote TYPE=CITE>Help!
<p>I've been banging my head against my desk for a day now trying to make
a
<br>merge work. I have two datasets that I need merged together by
two key
<br>variables (ser_num and ref_des) in a many to many merge -- keeping
the
<br>non-merged records also leaving the additional variables NULL.
The
<br>catch is that it isn't a clean merge... the match between the two key
<br>variables can only happen within a certain datetime range -- in this
<br>case, 2 hours apart -- but the datetime variables between the two
<br>datasets are named differently and must remain that way.
<p>I've tried "by" statements using the two key variables and comparing
the
<br>datetime range via a "where" clause but no luck based on the different
<br>datatime variable names.
<p>Can anyone help with this logic and what statements to use?
<p>Thx!
<br>Chris</blockquote>
</html>
--------------1BD613B6619D5565F21123B8--