Date: Sat, 12 Feb 2011 21:05:51 -0500
Reply-To: Tom Abernathy <tom.abernathy@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Tom Abernathy <tom.abernathy@GMAIL.COM>
Subject: Re: Merging Datasets
John -
I really looks like you want to use a SET statement in a data step or a
UNION in PROC SQL. Think of it like adding more rows to a table. If you
want to make new columns in the resulting table then you will need to rename
some of the columns (variables) before you match-merge them.
Try
data both;
set yr1.ds1 yr1.ds2;
by patientid;
run;
You should end up with 21089 + 57955 = 79044 observations.
- Tom
On Sat, 12 Feb 2011 15:07:49 -1000, John Parker <John.V.Parker@KP.ORG>
wrote:
>Here is the rsult for the first one....
>9917 PROC SQL;
>9918 CREATE TABLE yr1.Both AS
>9919 SELECT a.*, b.*
>9920 FROM yr1.DS1 AS a
>9921 FULL JOIN
>9922 yr1.DS2 AS b
>9923 ON a.patientid=b.patientid;
>
>WARNING: Variable Abn_Ind already exists on file YR1.BOTH.
>WARNING: Variable INTERVAL already exists on file YR1.BOTH.
>WARNING: Variable LOCAL_CD already exists on file YR1.BOTH.
>WARNING: Variable RESULT_UNIT already exists on file YR1.BOTH.
>WARNING: Variable SITEID already exists on file YR1.BOTH.
>WARNING: Variable TEST_TYPE already exists on file YR1.BOTH.
>WARNING: Variable TEST_RSLT_TEXT already exists on file YR1.BOTH.
>WARNING: Variable PATIENTID already exists on file YR1.BOTH.
>WARNING: Variable LAB_DT already exists on file YR1.BOTH.
>WARNING: Variable LAB_TM already exists on file YR1.BOTH.
>WARNING: Variable TEST_RSLT_NUM already exists on file YR1.BOTH.
>WARNING: Variable normal_high already exists on file YR1.BOTH.
>WARNING: Variable normal_low already exists on file YR1.BOTH.
>NOTE: Compressing data set YR1.BOTH decreased size by 57.76 percent.
> Compressed is 6504 pages; un-compressed would require 15399 pages.
>NOTE: Table YR1.BOTH created, with 739109 rows and 13 columns.
>
>9924 QUIT;
>
>None of the lab data from DS2 was added to the final data set
>
>
>
>"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> wrote on 02/12/2011 01:41:55
>PM:
>
>> John,
>>
>> No! If you are trying to merge according to by variables, you have to
>use a
>> by statement. However, a datastep merge is not particularly well suited
>for
>> a many-to-many merge like you describe you have.
>>
>> A full join in proc sql would be a lot better.
>>
>> Take a look at: http://www2.sas.com/proceedings/forum2007/071-2007.pdf
>>
>> HTH,
>> Art
>> -------
>> On Sat, 12 Feb 2011 13:33:31 -1000, John Parker <John.V.Parker@KP.ORG>
>> wrote:
>>
>> >Hi,
>> >I should be able to figure this out but I'm not having any luck.
>> >I have two data sets that I'm trying to merge.
>> >I want all of the data from both data sets to be in the new data set.
>> >Both data sets contain multiple instances of the same patientid.
>> >I have tried using a by statement but it gives me an error letting me
>know
>> >that
>> >
>> >MERGE statement has more than one data set with repeats of BY values.
>> >from what I understand about the merge statement, if you want the data
>> >from both data sets you don't use a by statement.
>> >The variables in both data sets are the same name, length and type.
>> >This is what I have:
>> >
>> >data yr1.new_data;
>> >merge yr1.ds1 yr1.ds2;
>> >run;
>> >Basically the two data sets are not merging.
>> >This is the note I get in the log
>> >
>> >NOTE: There were 21089 observations read from the data set YR1.DS1.
>> >NOTE: There were 57955 observations read from the data set YR1.DS2.
>> >NOTE: The data set YR1.new_data has 57955 observations and 13
>variables.
>> >
>> >It contains all of the data from DS2 not DS1.
>> >
>> >So, what am I doing wrong?
>> >
>> >Thanks,
>> >
>> >John
|