Date: Tue, 10 Oct 2006 13:48:55 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: SAS with data sets of a billion+ records
Content-Type: text/plain; charset="us-ascii"
Refreshing to see someone planning ahead! Usually it's something like -
I have this job that takes 38 hours I've been running for the past 10
There have been more than a few SAS-L posts on this very topic.
Take a look at "Table Look-ups" and "the Hash iterator" - papers and
posts by Paul Dorfman in particular. Where it applies, a hash look-up
is probably the fastest way to join large data - usually by orders of
magnitude.
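
For what it's worth, here is a rough sketch of the hash look-up idea,
assuming SAS 9 or later and a list of wanted IDs small enough to fit in
memory (not a given on a 32-bit box). The names WORK.KEEP_IDS,
BIG.TRANS and ID are made up for illustration, and ID must have the
same type and length in both tables.

```sas
/* Hypothetical names: WORK.KEEP_IDS holds one row per wanted ID,  */
/* BIG.TRANS is the large transaction table, ID is the join key.   */
data work.wanted;
  if _n_ = 1 then do;
    /* Load the small table into an in-memory hash, once */
    declare hash h(dataset: 'work.keep_ids');
    h.defineKey('id');
    h.defineDone();
  end;
  set big.trans;        /* one sequential pass, no sort or index */
  if h.check() = 0;     /* keep the row only if its ID is in the hash */
run;
```

The large table is read once in its existing order, so neither table
has to be sorted or indexed first.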
Art Carpenter's Macro Guide and the SAS Macro Facility Tips book are
great starting points for macro reading.
Michael Raithel has a newish book on SAS Indexes that is worth reading.
In my very humble opinion, splitting the files where sensible and making
sure all queries are optimizable are good places to start.
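
As a sketch only, splitting by ID range with an index behind the WHERE
clause might look something like this. The library BIG, table TRANS,
variable ID, the macro name SPLIT and the ID ranges are all assumptions
for illustration. MSGLEVEL=I makes SAS write a note to the log when it
uses an index, so you can check whether a given WHERE was optimized;
SAS may still choose a full read if it judges the subset too large.

```sas
options msglevel=i;   /* log an INFO note when an index is used */

/* Hypothetical names: library BIG, table TRANS, numeric key ID */
proc datasets library=big nolist;
  modify trans;
  index create id;    /* simple index on the split key */
quit;

/* Copy one ID range into its own data set */
%macro split(lo=, hi=, out=);
  data &out;
    set big.trans;
    where id between &lo and &hi;
  run;
%mend split;

%split(lo=1, hi=1000000, out=work.chunk1)
%split(lo=1000001, hi=2000000, out=work.chunk2)
```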
Good luck & hth
DDS Data Extraction
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Sent: Tuesday, October 10, 2006 1:08 PM
Subject: SAS with data sets of a billion+ records
I am working with some tables that have between 600 million and 1.2
billion records. In the end I will only need about 400 million of these
records, but to identify them I will have to join several tables together
(sorting or indexing each time) to get to this point. The files don't
have a lot of fields as they come from a very normalized transactional
system, so even when everything is joined there will be <50 fields.
Before I get too far down this road I would like to know if anyone
recommends any particular SAS reading (other than the SAS manual and SAS
help files) that speaks to working with files of this size and length?
Also, any suggestions for a good macro book? I suspect I will be
writing some macros to split these files by ID number to handle them in
smaller chunks and then reconstruct them.
I am on a 32-bit Windows machine with two dual-core processors that only
runs SAS. It is tied into an IBM 14-disk drawer that gives me a terabyte
of space to work with and lots of I/O.