```Date: Fri, 5 Dec 2008 03:10:04 -0600 Reply-To: torche@i-minds.be Sender: "SAS(r) Discussion" From: Francois Torche Subject: Re: Efficiency Question Content-Type: text/plain;charset=utf-8 The problem when processing very large datasets is the I/O. The more you can keep in memory, the faster it will be. If you have enough RAM, you might try to load one (or both?) table(s) in a hash table... François ----- Original Message ----- From: "Rob" Sent: Thu, December 4, 2008 19:15 Subject:Efficiency Question I have several large datasets (multi-million records) that I need to find matching records across. The following code works, but I am hoping that someone might be able to help with speed and efficiency. Thanks in advance... ****************** CODE ****************** data a; input var1 var2; cards; 1 1 2 2 3 3 4 4 5 5 6 6 7 7 ; run; data b; input var1 var2 var3; cards; 7 0 0 6 7 3 1 2 1 2 2 8 ; run; proc sort data=a; by var1 var2; run; proc sort data=b; by var1 var2 var3; run; /* The goal is to compare datasets A and B and identify where EITHER case1 OR case2 are met: case1 - If var1 in dataset A = var1 in dataset B AND var2 in dataset A = var2 in dataset B OR case2 - If var1 in dataset A = var1 in dataset B AND var2 in dataset A = var3 in dataset B THEN - take action - */ data c; merge a (in=in_a) b (in=in_b rename=(var2=b_var2)); by var1; format match_var 1.; match_var = 0; if in_a & in_b then do; if var2 = var3 then match_var = 3; *if the variables have different names, then it works; else if var2 = b_var2 then match_var = 2; *if the variables have the same name, then one must be renamed; end; if in_a then output; run; ----- End of original message ----- ```

Back to: Top of message | Previous page | Main SAS-L page