Date: Wed, 12 Mar 2008 15:21:18 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>"
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>"
Subject: Re: Two data sets with same variables, how to compare?
On Wed, 12 Mar 2008 14:04:26 -0400, Mike Zdeb <msz03@ALBANY.EDU> wrote:
>hi ... if both data sets have variables with the same names, the merge will
replace values in the
>first named data set with those in the second ... so, you are faced with
renaming all the
>variables in one of the data sets if you use merge
Notice the variable DATASET in the BY statement. It presumably has values
like 1 & 2 or A & B. These will never match, so the MERGE is in effect doing
an interleave. I find SET more straightforward for that, but either way can
After such an interleave, everything can be transposed in one step, with
DATASET as the ID variable.
Then Peter can easily (no arrays needed) derive the differences, and extract
flags or classifiers as needed from the _NAME_ values. The resulting data
set should support a lot of investigation. Somebody mentioned graphics. I
also think PROC TABULATE will give good mileage in terms of churning out a
variety of summaries.
Peter may not have heard of PROC TABULATE until now, but he has heard about
the advantages of normalization.
>if the data sets really do have all the same variables and if newname can
be used to merge the two
>... here's a similar situation and I think that starting with the first
PROC TRANSPOSE, it should
>work no matter how many variables or observations if in fact the two data
sets are identical
>except for having different values of all the variables ... remember,
TRANSPOSE with a BY variable
>expects sorted data ... no arrays, no renames ...
>input newname : $3. x y z;
>abc 10 20 30
>bcd 1 2 3
>input newname : $3. x y z;
>abc 11 21 31
>bcd 10 20 30
>proc transpose data=one out=tone (rename=(col1=time1));
>proc transpose data=two out=ttwo (rename=(col1=time2));
>merge tone ttwo;
>by newname _name_;
>diff = time2 - time1;
>proc transpose data=both out=tboth (drop=_name_) pfrefix=diff_;
>proc print data=tboth;
>U@Albany School of Public Health
>One University Place
>Rensselaer, New York 12144-3456
>> I already called SAS Tech support on this one, and they are working on
something.... but if anyone
>> has a quick solution, that would be great.
>> I have two datasets. Each dataset has the same people (with a variable
called NEWNAME) and the
>> same variables (about 100 variables). All these variables are numeric,
and they have the same
>> variable names in the two datasets. They are measurements of the same
things on two devices.
>> There are only 6 people
>> We want to compare the scores on the different variables from the two
>> I can easily print out the scores on each measure for each person:
>> data both;
>> merge nineteensix2 comb18six;
>> by newname dataset;
>> and then
>> proc print data = both;
>> var newname dataset MFF1A MRF1A MAF1B MFF1B MRF1B MAF1D MRF1D MIF1F2A
>> CoF1F2B CoF1F2D MIF1F2S CoF1F2S CoF1F2T MAF1S MFF1S
>> MAF1T MFF1T MRF1T MAF2A MFF2A MRF2A MAF2B MFF2B MRF2B MAF2D MFF2D
>> MFF2S MAF2T MFF2T MRF2T MAF7A MFF7A MRF7A MAF7B MFF7B MRF7B
>> MAF7D MRF7D MIF7F8A CoF7F8A MIF7F8B CoF7F8B MIF7F8D MIF7F8S
>> CoF7F8S MIF7F8T CoF7F8T MAF7S MFF7S MAF7T MFF7T MRF7T MAF8A MFF8A
>> MFF8B MRF8B MAF8D MFF8D MRF8D MAF8S MFF8S MAF8T MFF8T MRF8T MRFZB MAFZD
>> MFFZD MRFZD MAFZS MFFZS MAFZT MFFZT MRFZT;
>> but my boss would now like to see the differences, and also see the
differences sorted in
>> different ways (e.g. for variables that start with MFF vs. MAF and so on).
>> Any ideas?
>> The SAS tech person said she had soemthing combining SQL and arrays and
some way to dynamically
>> label things.... but she didn't say when she would send it. So, don't go
crazy trying to solve
>> it, but if there's some neat way...that would be great
>> Statistical Consultant
>> www DOT peterflom DOT com