Date: Sat, 26 Apr 2003 08:40:36 -0400 Reply-To: "Richard A. DeVenezia" Sender: "SAS(r) Discussion" From: "Richard A. DeVenezia" Subject: Re: Confusing results from a simple merge Curt: The confusing results are due to the repeated by groups and 'special logic' coincendence. There are 100 by-groups (id1=1..20 id2=1..5) In table A, each by-group has 7 observations (id3=1..7) In table B, each by-gruop has 4 observations (id4=1..4) When you merge dy id1 id2, the first 4 observations of each tables by group are lined up. During the first 4 observations of each by group id3 ^= 5 is met, and thus b2 is set to missing. After the 4th observation from table A all values from table B are effectively 'locked' until explicitly assigned or the next by group starts. During the 5th observation from table A's by group the id3^=5 is false and the if falls through and b2 remains missing (since it is 'locked'). During the 6th and 7th observations from table A's by group id3^=5 is again true and b2 is explicitly set to missing. Moral: BEWARE REPEATED OBSERVATIONS IN BY GROUPS. Here is a data step to show more explicitly what happened data ab; * Merge A and B here; data foo; merge a b ; by id1 id2; if id3 ^= 5 then do; b2was=b2; b2wassetmissing=1; b2=.; if id3=5 then put 'ERROR!'; end; run; -- Richard A. DeVenezia, http://www.devenezia.com "Curt Seeliger" wrote in message news:200304251649.10310.seeliger.curt@epa.gov... > Folks, > > It's late friday afternoon and most of you have returned home safely. > > I have two datasets A and B with by-variables id1 and id2 in common, and > I'd like to copy the variables b1-b3 from B to A, and have them be > non-missing only at specific records (where id3=5, in this example). > Code which *should* do this is listed below. I've been able to get it to > run by splitting the merging data step -- those statements are commented > out, along with some other operations I'd be doing with the original > 'real' data. > > The problem is that b2 is set to missing regardless of the value of id3. > Separating the datastep by uncommenting out the line of statements causes > the code to run as intended. The second odd thing about this is that the > ERROR message is never printed, indicating that the IF statement is > working correctly. It's as if b2 is being set to missing outside the IF > statement. > > Maybe the biggest problem is that I'm missing something really obvious. > Thanks for any clues, > > data a; * Two fake data sets; > do id1=1 to 20; > do id2=1 to 5; > do id3=1 to 7; > a1=id1; a2=id1*id2; > output; > end; > end; > end; > run; > > data b; > do id1=1 to 20; > do id2=1 to 5; > do id4=1 to 4; > b1=id1*id2; b2=id1*id2+id4; b3=id4; > output; > end; > end; > end; > run; > > data ab; * Merge A and B here; > merge a/*(in=in_a)/**/ > b/* (where=(id4=1)) /**/ > ; > by id1 id2; > *if in_a; > /*run; data ab; set ab; /* Split the data step, or don't */ > *drop id4; > if id3^=5 then do; > b2=.; if id3=5 then put 'ERROR!'; > end; > run; > proc print data=ab uniform; * wonder why? I do. ; > run; > > The only explaination I'm coming up with is that the DSV isn't been loaded > with the data from B for records with id3>4 (which is the number of id4 > values in B) until the dataset AB is created. (You might need to run the > code to see what I mean, checking for id3^=4 instead). This makes the > problem somewhat simpler: Why isn't the DSV completed until after the > output dataset is written? > > cur > > -- > Curt Seeliger, Data Ranger > CSC, EPA/WED contractor > 541/754-4638 > seeliger.curt@epa.gov

Back to: Top of message | Previous page | Main SAS-L page