|Date: ||Tue, 16 Sep 2008 06:38:17 -0700|
|Reply-To: ||Akshaya <akshaya.nathilvar@GMAIL.COM>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||Akshaya <akshaya.nathilvar@GMAIL.COM>|
|Subject: ||Which is more efficient: Inner join or a non-correlated subquery
or Hash Object or Data step Merge?|
I have a SAS dataset (Master) with billions of records and a cohort
dataset (Subset) with thousands of records. I need to select the
records from the Master dataset whose IDs appear in the Subset dataset.
The Master dataset has multiple columns; the Subset dataset has only one
column (ID). I’m wondering what would be the most “efficient” way
(probably meaning the least CPU time) to do this. Both datasets are
sorted by ID.
Any help is much appreciated.