| Date: | Fri, 9 Jul 2010 17:35:40 -0400 |
| Reply-To: | msz03@albany.edu |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Mike Zdeb <msz03@ALBANY.EDU> |
| Subject: | Re: Merging |
| Content-Type: | text/plain;charset=iso-8859-1 |
hi ... just expanding on this posting (though I may be
wandering out of my expertise realm with SQL !!!)
1/ you can try BETWEEN with either ON or WHERE (useful for dates within intervals)
2/ as Sterling asks, do you want a SUM of doses when there are multiple matches,
do you want to see all of them, etc.
3/ full join (below) gives you only matches
will there ever be a date in data set B that's not in a data set A interval
if so, a right join (with data set B listed last) will give you
matched from B plus unmatched from B
* your data;
data a;
input sid (start end) (: mmddyy.) (start_ampm end_ampm) (: $2.) dose;
format start end mmddyy10.;
datalines;
1050 03/17/06 03/23/06 am pm 5
1050 03/24/06 03/30/06 am pm 10
1050 03/31/06 04/07/06 am pm 10
1050 04/08/06 04/28/06 am pm 10
1050 04/08/06 04/28/06 am pm 10
1050 04/08/06 04/28/06 am pm 10
1050 04/29/06 . am . 10
1050 04/29/06 . am . 10
1050 04/29/06 . am . 10
;
run;
data b;
input sid sample_date : mmddyy. flag : $1.;
format sample_date mmddyy10.;
datalines;
1050 04/07/06 Y
1050 04/27/06 .
;
run;
* all doses;
proc sql;
create table match1 as
select *
from a (where=(end is not missing)), b
where a.sid eq b.sid and sample_date between start and end;
quit;
* sum of doses;
proc sql;
create table match2 (drop=dose) as
select distinct *, sum(dose) as total_dose
from a (where=(end is not missing)), b
where a.sid eq b.sid and sample_date between start and end
group a.sid, start, end;
quit;
proc print data=match1;
run;
start_ sample_
Obs sid start end ampm end_ampm dose date flag
1 1050 03/31/2006 04/07/2006 am pm 10 04/07/2006 Y
2 1050 04/08/2006 04/28/2006 am pm 10 04/27/2006
3 1050 04/08/2006 04/28/2006 am pm 10 04/27/2006
4 1050 04/08/2006 04/28/2006 am pm 10 04/27/2006
proc print data=match2;
run;
start_ sample_ total_
Obs sid start end ampm end_ampm date flag dose
1 1050 03/31/2006 04/07/2006 am pm 04/07/2006 Y 10
2 1050 04/08/2006 04/28/2006 am pm 04/27/2006 30
--
Mike Zdeb
U@Albany School of Public Health
One University Place (Room 119)
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475
> These types of merges can be really tricky with datasteps, but are a breeze
> with SQL.
>
> Try:
>
> proc sql;
> create table combined as select b.sid, b.sample_date, a.dose from
> A
> left join
> B
> on
> (a.sid = b.sid and a.sample_date <= b.end and a.sample_date >= b.start);
> quit;
>
> You'll have to decide what you want to do with multiple doses on the same
> day (do they count once, or do they need to be summed?) and what to do with
> open end dates.
>
> -Sterling
>
> On Fri, Jul 9, 2010 at 12:36 PM, saigovind chenna <
> saigovind.chenna@gmail.com> wrote:
>
>> Hello All
>>
>> I have two datasets one is A and another is B
>>
>> *Dataset A*
>>
>> sid start end start(ampm) end(ampm)
>> dose
>>
>> 1050 03/17/06 03/23/06 am pm
>> 5
>> 1050 03/24/06 03/30/06 am pm
>> 10
>> 1050 03/31/06 04/07/06 am pm
>> 10
>> 1050 04/08/06 04/28/06 am pm
>> 10
>> 1050 04/08/06 04/28/06 am pm
>> 10
>> 1050 04/08/06 04/28/06 am pm
>> 10
>> 1050 04/29/06 am
>> 10
>> 1050 04/29/06 am
>> 10
>> 1050 04/29/06 am
>> 10
>>
>> *Dataset B*
>>
>> sid sample date flag
>>
>> 1050 04/07/06 Y
>> 1050 04/27/06
>> 1050
>> 1050
>>
>> I want to get the correct dose information (from dataset A) to the dataset
>> B
>> according to corresponding date.
>> For example
>> in the above case sample date in dataset B is 04/07/06...this falls in
>> between the dates (03/31/06-04/07/06) so should have a dose of
>> 10 corresponding to that.
>>
>> Can anyone help me here with sas code?
>>
>> Thanks
>> Sai
>>
>
|