Date: Mon, 16 Feb 2009 09:51:21 -0800
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Brian Newquist <brian_newquist@YAHOO.COM>
Subject: Question Concerning Handling Large Many-to-Many Join Output Table
Content-Type: text/plain; charset=iso-8859-1
I am trying to join two datasets, and my main issue concerns the size of the output table produced by a many-to-many join. I am using PROC SQL to inner-join a dataset of 7 columns and 1,528,062 records with a second dataset that also has 7 columns and contains 101 records.
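For reference, the join looks roughly like the following sketch. The dataset names, library, and key variable are placeholders, since the actual names aren't shown in my code here:

```sas
/* Hypothetical sketch of the join described above.           */
/* WORK.BIG has 1,528,062 rows and WORK.SMALL has 101 rows;   */
/* if the key is non-unique on both sides, the inner join     */
/* emits one row per matching pair (many-to-many).            */
proc sql;
   create table work.joined as
   select a.*, b.*
   from work.big   as a
   inner join work.small as b
      on a.key = b.key;
quit;
```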
Knowing that the join would create almost 155 million records, I tried running it anyway but received a message saying that SAS had to stop processing due to insufficient resources. I had only about 200 MB of free disk space at the time, but I wonder whether the message was caused by something other than disk space, since I didn't use an OUT= option or anything similar in the PROC SQL statement. Do you have any suggestions as to what may have stopped the processing? My current plan is to increase the available disk space to 3 GB, through resources my graduate school department may be able to provide on my desktop, and try the run again.
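A rough back-of-the-envelope estimate suggests the output table would dwarf both 200 MB and 3 GB. Assuming, purely for illustration, that about 13 columns survive the join and each is an 8-byte numeric (the actual variable widths aren't stated), something like the following gives the scale:

```sas
/* Rough size estimate for the join output:                   */
/* ~154.3 million rows times an assumed 13 variables          */
/* at 8 bytes each (placeholder widths).                      */
data _null_;
   rows  = 1528062 * 101;     /* many-to-many worst-case row count */
   bytes = rows * 13 * 8;     /* assumed record width              */
   gb    = bytes / 1024**3;
   put rows= comma15. gb= 8.1 'GB (approx.)';
run;
```

Under those assumptions the table comes to roughly 15 GB before any compression, which by itself would explain the out-of-resources condition.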
Even if I can create this dataset, I am worried about my ability to complete my plans for it. In particular, I eventually intend to add 2,048 columns to it for calculations needed for my thesis, which worries me since I've heard that columns tend to take up more space than rows. I may be able to justify enough constraints in my analysis to add only about 35-40 columns instead. Does anyone have advice on how the large number of columns (or rows) might affect processing time, storage space, RAM, etc.? Creating the additional 35-40 columns (or the 2,048 called for in my original plan) will involve a large number of calculations, including some sorting of this giant dataset. I will also have to merge the giant dataset with another dataset containing about 3 variables, but that last merge will simply produce a final dataset with the same number of cases as the giant one I am now trying to create.
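To get a sense of how the extra columns scale, here is the same kind of rough estimate for both column counts, again assuming 8-byte numeric columns on the ~154.3 million rows (actual widths will vary):

```sas
/* Incremental storage for the planned new columns, assuming  */
/* 8-byte numerics on ~154.3 million rows (illustrative only).*/
data _null_;
   rows = 1528062 * 101;
   do ncols = 40, 2048;
      gb = rows * ncols * 8 / 1024**3;
      put ncols= gb= comma10.1 'GB (approx.)';
   end;
run;
```

Under those assumptions, 40 added columns come to roughly 46 GB, while 2,048 columns come to well over 2 TB, which suggests the constrained 35-40 column design is the only one likely to be feasible on a desktop.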
I realize that this is a long posting that touches on a variety of issues. Any insight, advice, or suggestions regarding these issues will be greatly appreciated.