LISTSERV at the University of Georgia
Date:         Wed, 2 Feb 2005 09:25:23 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: SAS Merge
Comments: To: Amrita Singh <AmritaS2@AOL.COM>
Content-Type: text/plain; charset="iso-8859-1"

Amrita: On a Unix platform we found that compressing source-system files with GNU gzip, then piping the compressed data through zcat into a SAS view that filters and projects them (a DATA step view with INPUT statements, queried by an SQL SELECT statement with a WHERE clause), worked much faster than reading the uncompressed files. This strategy works well when the process subsets both rows and columns as it reads the input.
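The zcat-and-subset idea can be illustrated outside SAS. The Python below is only an analogue, not the SAS DATA step view the post describes: it decompresses a gzip-compressed CSV incrementally, keeps only rows that pass a predicate (playing the role of the WHERE clause), and keeps only the named columns (the projection), so the uncompressed file never has to exist on disk. The `read_subset` helper, the CSV layout, and the column names are hypothetical.

```python
import csv
import gzip
from typing import Callable, Iterator, Sequence


def read_subset(
    path: str,
    columns: Sequence[str],
    where: Callable[[dict], bool],
) -> Iterator[dict]:
    """Stream a gzip-compressed CSV, yielding only matching rows and columns.

    Decompression happens incrementally (like `zcat file.gz | ...`), so the
    uncompressed data are never written to disk.
    """
    with gzip.open(path, mode="rt", newline="") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            if where(row):                          # row subset (WHERE clause)
                yield {c: row[c] for c in columns}  # column subset (projection)
```

For example, `read_subset("big.csv.gz", ["id", "amount"], lambda r: r["state"] == "GA")` would yield just two fields from the Georgia rows; because it is a generator, downstream steps consume records one at a time, much as a SAS view feeds a query.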

A similar strategy partitions source data into related subsets, which eliminates unrelated columns and repeated values. A related, useful method eliminates the space that empty text variables occupy in 'flat-file' databases.
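A rough illustration of both ideas, again in Python rather than SAS and under assumed data: rows from a flat file repeat household-level attributes on every person record, so those attributes move to a lookup table stored once per key, while padded text fields are trimmed so that empty ones shrink to zero-length strings. The `partition` helper and the example field names (`hh_id`, `address`) are hypothetical, not the method the poster used.

```python
from typing import Dict, List, Sequence, Tuple


def partition(
    rows: Sequence[dict],
    key: str,
    repeated_cols: Sequence[str],
) -> Tuple[Dict[str, dict], List[dict]]:
    """Split flat rows into a lookup table and a slim detail table.

    Assumes every column in `repeated_cols` depends only on `key` (and that
    `key` itself is not listed there), so those values can be stored once per
    key instead of once per row. Text values are stripped of padding, so
    blank fields become zero-length strings.
    """
    lookup: Dict[str, dict] = {}
    detail: List[dict] = []
    for row in rows:
        clean = {c: v.strip() if isinstance(v, str) else v for c, v in row.items()}
        k = clean[key]
        # Store the repeated attributes once per key (first occurrence wins).
        lookup.setdefault(k, {c: clean[c] for c in repeated_cols})
        # Keep the key plus only the columns that vary from row to row.
        detail.append({c: v for c, v in clean.items() if c not in repeated_cols})
    return lookup, detail
```

For example, `partition(rows, "hh_id", ["address"])` keeps each address once per household and leaves person-level fields in the detail table, joined back on `hh_id` when needed.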

Using these strategies, we have far fewer production bottlenecks. Even though 'fuzzy linkage' of very large volumes of data tends to make explosive demands on memory and disk space, only rarely do we have to test the limits of our servers.
Sig

-----Original Message-----
From: SAS(r) Discussion
To: SAS-L@LISTSERV.UGA.EDU
Sent: 2/2/2005 12:00 AM
Subject: Re: SAS Merge

Hi,

The SAS data set has only a few count variables, while the flat file holds the bulk of the data. The selects and models are run mostly in SAS. We also use some products from Group1 Software. We currently have a 20-million-record file of live data whose distributions are similar to those of the final 130-million-record file. We can use that for estimates. Thanks for the suggestion.

Amrita

In a message dated 2/1/2005 10:42:10 P.M. Eastern Standard Time, nospam@HOWLES.COM writes:

The last step creates both an external flat file and a SAS data set. Are you going to keep both (*two* 700-GB footprints)? If not, why create both?

Another way of getting at this: are the "selects and models" to be "run during the week" done with SAS, or something else, or a mix? What if any non-SAS products are involved here?

In any case, have you tried generating a 700-GB test file with fake data but somewhat realistic distributions to gauge the performance of your weekday jobs? If not, you may have some unpleasant surprises later.
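A minimal Python sketch of such a generator, assuming a simple three-field layout. The field names, the state frequencies, and the log-normal parameters are all hypothetical placeholders; in practice they would be fitted to the live data, and `n` would be scaled up until the file approaches the production size.

```python
import csv
import random
from typing import Dict, TextIO

# Hypothetical marginal distribution; in practice, estimate these
# frequencies from a live sample (e.g. the 20-million-record file).
STATE_WEIGHTS: Dict[str, float] = {
    "CA": 0.12, "TX": 0.09, "NY": 0.06, "GA": 0.03, "OTHER": 0.70,
}


def write_fake_records(out: TextIO, n: int, seed: int = 2005) -> None:
    """Write a header plus n synthetic records with assumed distributions."""
    rng = random.Random(seed)  # fixed seed makes test runs reproducible
    states = list(STATE_WEIGHTS)
    weights = list(STATE_WEIGHTS.values())
    writer = csv.writer(out)
    writer.writerow(["id", "state", "amount"])
    for i in range(n):
        # Categorical field drawn according to the assumed frequencies.
        state = rng.choices(states, weights=weights, k=1)[0]
        # Skewed positive amounts; a log-normal is a common stand-in.
        amount = round(rng.lognormvariate(3.0, 1.0), 2)
        writer.writerow([i, state, amount])
```

Passing a handle opened with `gzip.open(path, "wt", newline="")` instead of a plain file would produce compressed test data, so the same file could also exercise a zcat-style pipeline when timing the weekday jobs.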

