LISTSERV at the University of Georgia
Date:         Tue, 10 Oct 2006 13:48:55 -0700
Reply-To:     "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject:      Re: SAS with data sets of a billion+ records
Comments: To: Richard Reeves <reeves@STUDENTCLEARINGHOUSE.ORG>
In-Reply-To:  <0F491855115A4B49935172B099586D4B05140E9C@clifford.nslc.org>
Content-Type: text/plain; charset="us-ascii"

Richard -

Refreshing to see someone planning ahead! Usually it's something like "I have this job that takes 38 hours that I've been running for the past 10 years...."

There have been more than a few SAS-L posts on this very topic.

Take a look at "Table Look-ups" and "the Hash iterator" - papers and posts by Paul Dorfman in particular. If it applies, it's probably the fastest way to join large data - usually by orders of magnitude.
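
A minimal sketch of the hash look-up idea, for illustration only. The data set names WORK.BIG and WORK.KEEP_IDS and the key variable ID are hypothetical, and the sketch assumes the smaller table fits in memory (a real constraint on a 32-bit machine) and that SAS 9.1 or later is available for the hash object:

```sas
/* Hypothetical names: WORK.BIG is the large table, WORK.KEEP_IDS is a   */
/* smaller table of ids to keep; both carry a variable named ID.         */
data work.matched;
    /* Load the small table into an in-memory hash once, on the first pass. */
    if _n_ = 1 then do;
        declare hash h(dataset: "work.keep_ids");
        h.defineKey("id");
        h.defineDone();
    end;

    /* Read the large table sequentially; no sort or index is required. */
    set work.big;

    /* check() returns 0 when the current ID is found in the hash. */
    if h.check() = 0;
run;
```

The large table is read once in its existing order, so the sort or index step before each join goes away, provided the look-up side is small enough to hold in memory.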

Art Carpenter's Macro Guide and the SAS Macro Facility Tips book are great starting points for macro reading.

Michael Raithel has a newish book on SAS Indexes that is worth reading.

In my very humble opinion, splitting the files where sensible and making sure all queries are optimizable are good places to start.

Good luck & hth

Paul Choate
DDS Data Extraction
(916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Richard Reeves
Sent: Tuesday, October 10, 2006 1:08 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SAS with data sets of a billion+ records

Hi, I am working with some tables that have between 600 million and 1.2 billion records. In the end I will only need about 400 million of these records, but to identify them I will have to join several tables together (sorting or indexing each time) to get to this point. The files don't have a lot of fields, as they come from a very normalized transactional system, so even when everything is joined there will be <50 fields.

Before I get too far down this road I would like to know if anyone recommends any particular SAS reading (other than the SAS manual and SAS help files) that speaks to working with files of this size and length. Also, any suggestions for a good macro book? I suspect I will be writing some macros to split these files by id numbers to handle them in smaller chunks and then reconstruct them.

Thanks, rich

I am on a Windows 32 bit machine with two dual core processors that only runs SAS. It is tied into an IBM 14 disk drawer that gives me a TByte of space to work with and lots of I/O.

