LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2008, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 5 Dec 2008 03:10:04 -0600
Reply-To:     torche@i-minds.be
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Francois Torche <torche@I-MINDS.BE>
Subject:      Re: Efficiency Question
Content-Type: text/plain;charset=utf-8

The problem when processing very large datasets is the I/O. The more you can keep in memory, the faster it will be. If you have enough RAM, you might try to load one (or both?) table(s) in a hash table...

François

----- Original Message ----- From: "Rob" <data_sleuth_2006@yahoo.com> Sent: Thu, December 4, 2008 19:15 Subject:Efficiency Question

I have several large datasets (multi-million records) that I need to find matching records across. The following code works, but I am hoping that someone might be able to help with speed and efficiency.

Thanks in advance...

****************** CODE ****************** data a; input var1 var2; cards; 1 1 2 2 3 3 4 4 5 5 6 6 7 7 ; run;

data b; input var1 var2 var3; cards; 7 0 0 6 7 3 1 2 1 2 2 8 ; run;

proc sort data=a; by var1 var2; run; proc sort data=b; by var1 var2 var3; run;

/* The goal is to compare datasets A and B and identify where EITHER case1 OR case2 are met: case1 - If var1 in dataset A = var1 in dataset B AND var2 in dataset A = var2 in dataset B OR case2 - If var1 in dataset A = var1 in dataset B AND var2 in dataset A = var3 in dataset B THEN - take action - */ data c; merge a (in=in_a) b (in=in_b rename=(var2=b_var2));

by var1; format match_var 1.; match_var = 0;

if in_a &amp; in_b then do; if var2 = var3 then match_var = 3; *if the variables have different names, then it works; else if var2 = b_var2 then match_var = 2; *if the variables have the same name, then one must be renamed; end;

if in_a then output; run;

----- End of original message -----


Back to: Top of message | Previous page | Main SAS-L page