LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (November 2005, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Mon, 14 Nov 2005 19:42:50 +0000
Reply-To:   iw1junk@COMCAST.NET
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ian Whitlock <iw1junk@COMCAST.NET>
Subject:   Re: How to make repeats in merge-by-group to work?
Comments:   cc: "Michael S. Zdeb" <msz03@HEALTH.STATE.NY.US>, Anonymous user <Number_42@CARAMAIL.COM>

Anonymous user <Number_42@CARAMAIL.COM> asked how to do Cartesian by processing without SQL. This is a valid question and perhaps even important if one is trying to learn various techniques, so examples of how easy it is in SQL just doesn't cut it.

In fact, there was an article published in the elite but now defunct "Observations" showing that the POINT= strategy was far more efficient than using SAS SQL as presented in some early version of version 6. I think I posted a suggestion to SAS-L or possibly wrote a coders corner paper showing how the authors failed to get the most efficient advantage out of their technique.

Michael posted the solution in the message below. I changed ID=b to ID=d in ONE and got

339 * merge the POINTER data sets; 340 * use the point information to cycle through observations with matching 341 IDs; 342 343 data one_two (keep=id fruit color); 344 merge one_point two_point; 345 by id; 346 do i = one_start to one_end; 347 set one point=i; 348 do j = two_start to two_end; 349 set two point=j; 350 output; 351 end; 352 end; 353 run; ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid. one_start=. id=b one_end=. two_start=2 two_end=3 FIRST.id=1 LAST.id=1 i=. fruit=apple j=2 color=amber _ERROR_=1 _N_=2 NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 2 observations read from the data set WORK.ONE_POINT. NOTE: There were 3 observations read from the data set WORK.TWO_POINT. WARNING: The data set WORK.ONE_TWO may be incomplete. When this step was stopped there were 2 observations and 3 variables. WARNING: Data set WORK.ONE_TWO was not replaced because this step was stopped.

A corrected version of the last step is

data one_two (keep=id fruit color); length fruit color $ 8 ; merge one_point two_point; by id; do i = max(one_start,0) to max(one_end,0); if i = 0 then fruit = " " ; else set one point=i; do j = max(two_start,0) to max(two_end,0); if j = 0 then color = " " ; else set two point=j; output; end; end; run;

In a more complex setting one could create BLANK_ONE and BLANK_TWO with one record each of the relevan subsidary variables. Then one could use retain pt 1 ; and if i = 0 then set blank_one point = pt ; to replace if i = 0 then fruit = " " ;

The blanking is important becasue SAS does not automatically set variables from a SAS dataset to missing when either the SET statement fails or is not executed.

Ian Whitlock ================= Date: Tue, 8 Nov 2005 10:14:26 -0500 Reply-To: "Michael S. Zdeb" <msz03@HEALTH.STATE.NY.US> Sender: "SAS(r) Discussion" From: "Michael S. Zdeb" <msz03@HEALTH.STATE.NY.US> Subject: Re: How to make repeats in merge-by-group to work? Comments: To: Number_42@CARAMAIL.COM In-Reply-To: <1131450900.655635.242990@g43g2000cwa.googlegroups.com> Content-type: text/plain; charset=US-ASCII Hi...I agree with Toby Dunn...use PROC SQL. But, for those of us who used SAS prior to V6, there was no PROC SQL and many-to-many merges still had to be done with a data step. For those of us who remember NESUG '88 (or for that matter, 1988), the proceedings contain a paper by Robina Thornton and Merry Rabb on "Advanced SET and MERGE Processing" that shows how to do many-to-many merging in a data step merge. The method does what PROC SQL does in the background, i.e. it ensures that all observations in data sets one and two with the same ID "see" each other at some point in the data step processing. So, here it is (from NESUG '88) with the posted data and it does give the desired result...

data one; input id $ fruit $; datalines; a apple a apple b banana c coconut c coconut c coconut ; proc sort data=one; by id; run; data two; input id $ color $; datalines; a amber b brown b black c cocoa c cream ; proc sort data=two; by id; run; * create POINTERS for both data sets - reduces the data step merge to one-to-one merging; data one_point (keep=id one_start one_end); retain one_start; set one; by id; if first.id then one_start = _n_; if last.id then do; one_end = _n_; output; end; run; data two_point (keep=id two_start two_end); retain two_start; set two; by id; if first.id then two_start = _n_; if last.id then do; two_end = _n_; output; end; run; * merge the POINTER data sets; * use the point information to cycle through observations with matching IDs; data one_two (keep=id fruit color); merge one_point two_point; by id; do i = one_start to one_end; set one point=i; do j = two_start to two_end; set two point=j; output; end; end; run; proc print data=one_two; run; Gives... Obs id fruit color 1 a apple amber 2 a apple amber 3 b banana brown 4 b banana black 5 c coconut cocoa 6 c coconut cream 7 c coconut cocoa 8 c coconut cream 9 c coconut cocoa 10 c coconut cream

PS Another posted solution... DATA test(drop=id2 ); set one; do i=1 to nn; set two(rename=(id=id2)) nobs=nn point=i; if id=id2 then output; end; RUN; also works but requires a complete pass through data set two for each observation in data set one rather than only looking within observations with the same ID. Mike Zdeb U@Albany School of Public Health 1 University Drive Rensselaer, NY 12144-3456 (P)518-402-6479 (F)630-604-1475 <snip>


Back to: Top of message | Previous page | Main SAS-L page