|Date: ||Mon, 8 Dec 2008 11:49:34 -0800|
|Reply-To: ||Steve James <spj1@CDC.GOV>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||Steve James <spj1@CDC.GOV>|
|Subject: ||Re: error:out of mempry|
|Content-Type: ||text/plain; charset=ISO-8859-1|
In addition to the Hash object, you might try creating a format of all
the valid record identifiers and using that to identify the records
you want. I came across the following code last week and it was fresh
in my mind.
create table eventfmt as
put(event_id,6.) as start format=$6.
,"event" as fmtname
,"yes" as label format=$6.
insert into eventfmt set start='other' ,
fmtname="event" ,label='not' ;
proc format cntlin=eventfmt;
/* grab all records for event_id's */
data merged ;
set duplicates(where=(put(event_id,event.) = 'yes'));
On Dec 8, 3:21 am, rams <rameshg...@gmail.com> wrote:
> Hi all,
> I have a dataset comprising of around 200 Million records. These
> records contain duplicates. I have another dataset with around 500K
> unique records. Now, I have to create a new dataset by merging these
> two datasets and get the match of these 500K unique records from the
> 200Million records in other dataset.
> Solutions I tried my self are :
> 1. Proc Sql + Left join - failed after running for around 4 hours with
> Out Of Memory issue.
> PROC SQL; CREATE TABLE MERGED AS
> SELECT *
> FROM firstdataset AS A LEFT JOIN VISITS AS B
> ON A.HRN=B.CHART
> 2. Used proc sort and sas data step merge statement. Again failed
> aftert 4 hours apporx with OOM error.
> Ramesh G