Date: Thu, 3 Nov 2005 11:25:13 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: Compare two tables - Modified
In-Reply-To: <200511031744.jA3HVQgY008665@malibu.cc.uga.edu>
Content-Type: text/plain; charset=iso-8859-1
--- SUBSCRIBE SAS-L Chandra Gadde <ddraj2015@GMAIL.COM> wrote:
> Hi All,
>
> Here I am simplyfing my problem. I have two tables t1 and t2; Here is
> the
> SAS code.
>
> data t1;
> input x y;
> cards;
> 11 111
> 22 222
> 33 333
> 44 444
> 55 555
> 66 666
> ;
> run;
>
> data t2;
> input x y;
> cards;
> 22 222
> 44 444
> 66 666
> 11 110
> 33 330
> 54 553
> ;
> run;
>
> proc sort data = t1;
> by x y;
> run;
> proc sort data = t2;
> by x y;
> run;
>
> proc compare base = t1 compare = t2 brief outnoequal;
> run;
>
> The output for this is ..........
>
>
>
> The COMPARE Procedure
> Comparison of WORK.T1 with WORK.T2
> (Method=EXACT)
>
> NOTE: Values of the following 2 variables compare unequal: x y
>
>
> Value Comparison Results for Variables
>
>
> __________________________________________________________
> || Base Compare
> Obs || x x Diff. %
> Diff
> ________ || _________ _________ _________
> _________
> ||
> 5 || 55.0000 54.0000 -1.0000
> -1.8182
>
> __________________________________________________________
>
>
>
> The COMPARE Procedure
> Comparison of WORK.T1 with WORK.T2
> (Method=EXACT)
>
> Value Comparison Results for Variables
>
>
> __________________________________________________________
> || Base Compare
> Obs || y y Diff. %
> Diff
> ________ || _________ _________ _________
> _________
> ||
> 1 || 111.0000 110.0000 -1.0000
> -0.9009
> 3 || 333.0000 330.0000 -3.0000
> -0.9009
> 5 || 555.0000 553.0000 -2.0000
> -0.3604
>
> __________________________________________________________
>
>
> Here is my question.
>
> I just need to create this output into another dataset with 5 columns
> as
> shown here. Any help would be greatly appreciated. I have been trying
> to
> achieve this but couldn't succed. Please help me out.
>
> Row t1_Colname t2_Colname t1_value t2_value
> 5 x x 55 54
> 1 y y 111 110
> 3 y y 333 330
> 5 y y 555 553
>
> Thanks,
>
Chandra,
I thought to use an ODS OUTPUT statement with PROC COMPARE. But
if you do that, you do not get an output data set which has the
fields you would like. Instead, you get an output data set which
has all of the above information (and more) in a single text
variable. That text variable would have to be parsed. It could
be done, but the following approach is simpler for me to code
and generates what you display above.
Since you stated previously that you have multiple columns with the
same names in your two data sets plus some additional columns on
one or the other set which differ, I will write general purpose code
that assumes no knowledge of the names of the variables in the
data sets to be compared. So, we first use proc contents to
find the names of the variables in each data set. We subsequently
merge the contents to construct a data set which identifies
variables common to both data sets.
We then use the list of common variables to write code which
1) merges the two data sets
2) renames the common variables in each data set so that
the names are unique and one set does not overwrite the
other
3) loops over the renamed variables comparing their values
for the common variables
4) writes out the required information when the values of
any two variables which should be the same differ between
the two sets
/* Determine names of variables in each data set */
proc contents data=t1 out=t1_Colnames noprint;
proc contents data=t2 out=t2_Colnames noprint;
run;
/* Determine variables which are common to both data sets */
data Vboth;
merge t1_Colnames(in=a)
t2_Colnames(in=b);
by name;
if a & b;
run;
/* Now write code which */
/* 1) merges the two data sets, */
/* 2) renames the variables during the merge */
/* 3) loops over the renamed variables comparing their values */
/* 4) write out the required information for values which differ */
data _null_;
set Vboth end=lastrec;
array vars {100000} $ 32 _temporary_;
vars{_n_} = name;
if lastrec then do;
/* Start writing code. */
call execute("data diffs;");
call execute(" length Row 3");
call execute(" t1_Colname t2_Colname $ 32");
call execute(" t1_value t2_value 8;");
/* Merge data sets and perform variable renames */
call execute(" merge t1(in=a rename=(");
do i=1 to _n_;
/* variable renames for first data set*/
call execute(vars{i} || compress("=t1_"||vars{i}));
end;
call execute(" ))");
call execute(" t2(in=a rename=(");
do i=1 to _n_;
/* variable renames for second data set*/
call execute(vars{i} || compress("=t2_"||vars{i}));
end;
call execute(" ));");
/* Construct arrays of common variables from each data set */
call execute(" array t1_vals {*} "); /* Data set 1 list */
do i=1 to _n_;
call execute(compress("t1_"||vars{i}));
end;
call execute(" ;");
call execute(" array t2_vals {*} "); /* Data set 2 list */
do i=1 to _n_;
call execute(compress("t2_"||vars{i}));
end;
call execute(" ;");
call execute(" Row=_n_;");
/* Loop over common variables and check for equality */
call execute(" do i=1 to dim(t1_vals);");
call execute(" if t1_vals{i}^=t2_vals{i} then do;");
/* Inequality found for i-th variable in list */
/* Return name of i-th variable */
call execute(" t1_Colname=substr(vname(t1_vals{i}),4);");
call execute(" t2_Colname=t1_Colname");
/* Return values of i-th variable from each data set
*/
call execute(" t1_value=t1_vals{i};");
call execute(" t2_value=t2_vals{i};");
/* Output record of difference */
call execute(" output;");
call execute(" end;");
call execute(" end;");
call execute(" keep Row t1_Colname t2_Colname t1_value
t2_value;");
call execute("run;");
/* Sort the data by variable and row in which difference is found
*/
call execute("proc sort data=diffs;");
call execute(" by t1_Colname Row;");
call execute("run;");
end;
run;
/* Now print our list of differences */
proc print data=diffs;
run;
If you submit this code and then look at the log file where all
of the call execute statements have been resolved, I think you
will better understand just what this code does.
HTH,
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
__________________________________
Yahoo! FareChase: Search multiple travel sites in one click.
http://farechase.yahoo.com