LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2002, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 2 Oct 2002 19:29:06 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: PROC SQL, union, PROC SORT, nodup
Comments: To: Talbot Katz <TopKatz@MSN.COM>
Content-Type: text/plain; charset="iso-8859-1"

The embedded UNION query eliminates duplicates in (ds_a OR ds_b) before the implicit inner join with ds_k. SET ds_a ds_b does not. Had you replaced UNION CORR with OUTER UNION CORR, the UNION and SET would produce the same results. The JOIN will as well if ds_k contains no duplicates of duplicated keys in the OUTER UNION.

The SORT NODUPLICATES after the MERGE operates on a dataset that may combine variables and values from two datasets, whereas SAS SQL does not save 2nd or later instances of the same column variable name. The difficulty of explaining what actually happens in a MERGE of two datasets that have duplicates of BY variables and duplicates of variable labels on both sides has led some SAS data step experts to say 'NEVER do that'.

Sig

-----Original Message----- From: Talbot Katz [mailto:TopKatz@MSN.COM] Sent: Wednesday, October 02, 2002 5:31 PM To: SAS-L@LISTSERV.UGA.EDU Subject: PROC SQL, union, PROC SORT, nodup

Hi, Gang!

Yet another shaggy dog story from y.t. I have three data sets, call them ds_a, ds_b, and ds_k; ds_a and ds_b have the identical layout, and ds_k shares a common key with them. I want to create a data set which chooses some records from ds_a, some from ds_b, and matches them up against ds_k. Suppose they are already all sorted by the common key; then I could do this with two data steps :

data ds_c; set ds_a ds_b; by common_key; run; data ds_m; merge ds_c (in = c) ds_k (in = k); by common_key; if c and k; run;

When I started this task, the data sets weren't pre-sorted, so I took the lazy route, and tried to do the task with PROC SQL, as follows:

proc sql ; create table ds_m as select * from (select * from ds_a union corr select * from ds_b) as do, ds_k as dc where do.common_key = dc.common_key ;

quit ;

Lo and behold, the second method produced fewer output records than the first method did. I tried PROC SORT with NODUPLICATES on both output data sets, but no records were eliminated from either one. After fooling around for most of the day with little success, I finally found a group of four records in ds_b which looked identical to my naked eye (although PROC SORT with NODUPLICATES wouldn't collapse them at all), but showed up in differing amounts in the two output data sets. They satisfied a condition, which we'll call condition4; none of the records in ds_a satisfy condition4. I put these four records in a dataset called ds_4 and I got the following results :

proc sql ; create table ds_m4 as select * from (select * from ds_4) as do, ds_k as dc where do.common_key = dc.common_key and condition4 ;

quit ;

This resulted in a four record output data set, ds_m4. Then I tried :

proc sql ; create table ds_m4 as select * from (select * from ds_a union corr select * from ds_4) as do, ds_k as dc where do.common_key = dc.common_key and condition4 ;

quit ;

This resulted in a ONE record output data set, ds_m4. Somehow the "UNION CORR" condition caused duplicate elimination to take place in PROC SQL, even though PROC SORT could not expose the records as duplicates.

Can anyone tell me what's going on? Is the PROC SORT NODUPLICATES option defective? Is "UNION CORR" supposed to eliminate duplicates? Is there a way of doing this with PROC SQL that wouldn't eliminate duplicates?

Thanks!

-- TMK --


Back to: Top of message | Previous page | Main SAS-L page