Date: Tue, 12 Dec 2006 16:49:56 -0800
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: which file format loads quickest?
In-Reply-To: <1165968504.103608.257800@j44g2000cwa.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"
Hi Jared -
20000 variables is a very wide dataset. I'd look into normalizing the
data if possible, especially repetitious character fields of longer
lengths.
I'd try to squeeze the data a bit, because I/O is a killer.
You can replace repetitious character data with short alphanumeric keys
and restore the long strings with formats when you need them.
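Here is a minimal sketch of the key-plus-format idea. The format name
$longtxt, the dataset name, and the labels are all made up for
illustration; none of them come from Jared's data.

```sas
/* Hypothetical lookup: one-byte keys mapped back to the long strings */
proc format;
  value $longtxt
    'A' = 'First long descriptive string that repeats on many rows'
    'B' = 'Second long descriptive string that repeats on many rows'
    other = 'Unknown';
run;

/* Store only the one-byte key in the dataset */
data small;
  length key $ 1;
  input key $;
  datalines;
A
B
A
;
run;

/* The stored value stays one byte; the format restores the text on output */
proc print data=small;
  format key $longtxt.;
run;

/* If the long string is needed as a real variable, rebuild it with PUT */
data expanded;
  set small;
  length long_text $ 60;
  long_text = put(key, $longtxt.);
run;
```

The saving comes from writing the long text once in the format catalog
instead of once per row.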
If you have categorical data stored as numbers such as integers 0-9,
which I'd guess might be true in your 20000 vars, storing them as single
byte character instead of the default 8-byte numeric reduces the space
for those variables by 7/8ths in the SAS data, although I can't say what
the impact is on a transport file.
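A rough sketch of that conversion, assuming made-up variables v1-v5 that
hold only the integers 0-9 (the real data would need the same loop over
whichever of the 20000 variables qualify):

```sas
/* Made-up input: five numeric variables holding categories 0-9 */
data wide;
  input v1-v5;
  datalines;
0 3 9 1 4
7 2 2 8 5
;
run;

data narrow;
  set wide;
  array num{*} v1-v5;            /* 8 bytes each by default */
  array chr{*} $ 1 c1-c5;        /* 1 byte each */
  do i = 1 to dim(num);
    chr{i} = put(num{i}, 1.);    /* 0-9 becomes '0'-'9' */
  end;
  drop v1-v5 i;
run;

/* Convert back to numeric only where arithmetic is needed */
data restored;
  set narrow;
  n1 = input(c1, 1.);
run;
```

Per observation that is 5 bytes instead of 40 for these five variables,
which is where the 7/8ths figure comes from.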
hth
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
jared_hellman@yahoo.com wrote:
> Hello all,
>
> I am working in a Unix environment and I have a dataset which I would
> like to save with dimensions:
> ~20000 vars
> ~25000 obs
> logical record length of ~800000
>
> Saving this file as a V8 transport file means I can expect to wait
> about 11-12 minutes for it to load using proc cimport. Is there
> another format I can save this file in that would allow me to access
> it with a quicker load time?
>
> Thanks!
> -Jared