Date: Sun, 17 Jun 2007 22:43:29 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: SAS data-set index size
In-Reply-To: <1181932401.416586.107540@d30g2000prg.googlegroups.com>
Content-Type: text/plain; format=flowed
auto208611@HUSHMAIL.COM wrote:
>
>Is a telling story if the size of the SAS index file (*.sas7bdnx) is
>75% the size
>of the SAS data-set itself?
>
>For instance, we have a data-set that is 533MB's and the index file
>is
>400MB's.
>
>Is this an indication of a poor data-set structure?
No, it is not an indication of a poor data structure. (You may *have*
a poor data structure, but the relative size of the index file is not
indicative.)
Since Mister Index himself has already chipped in, I'll just add a couple
of other points.
Think conceptually of the index file as being like another data set.
If you have one index, the index file needs a record for every record in
the data set, with all the variables that make up your index, plus an
extra 8-byte variable that serves as a pointer to the correct record in
the data set. If you have a very tall-and-thin data set to start with
and you build a composite index, you may have nearly as many
variables in your index as in your data set. So the index file can be
nearly as big as the file itself, without being a bad thing.
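
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It only encodes the conceptual model above (one index record per
data record, holding the key variables plus an 8-byte pointer) and ignores
page overhead and whatever SAS actually does internally. The function name
and the example widths are made up for illustration.

```python
POINTER_BYTES = 8  # the extra 8-byte record pointer in the conceptual model


def index_to_data_ratio(key_widths, record_width):
    """Estimate index size / data set size for a single index.

    key_widths   -- byte widths of the variables that make up the index
    record_width -- total byte width of one record in the data set
    """
    index_record_width = sum(key_widths) + POINTER_BYTES
    return index_record_width / record_width


# Hypothetical tall-and-thin data set: five 8-byte variables (40 bytes per
# record), with a composite index on three of them.
ratio = index_to_data_ratio([8, 8, 8], record_width=40)
print(f"{ratio:.2f}")  # 0.80
```

Under those assumed widths the index comes out at about 80% of the data
set, which is in the same neighborhood as the 400MB/533MB in the question.
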
Also, if your index is based on one or more very long variables in your
data file, then the index has to be large also. I used to design GRTS
samples this way, with an index on a string that was 8 bytes * the
number of levels in the hierarchical structure. So I sometimes had
an index that was more than half the size of the data set, because
the index variable was more than half of the width of the data record!
As Mikeeeeeee pointed out, if you have multiple indexes, *all* of them
sit in the same index file. So take the above blather, add more
variables, and the index records get wider still. For a
well-structured, highly normalized data set with several different
indexes, I can see the index file getting quite close to the original
data set in size.
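
The same rough arithmetic extends to several indexes sharing one index
file. Again, this is only a sketch of the conceptual model (each index
contributes its key variables plus an 8-byte pointer per data record), not
a description of the real file layout. The names and widths are
hypothetical.

```python
POINTER_BYTES = 8  # one 8-byte record pointer per index entry (conceptual)


def index_file_ratio(indexes, record_width):
    """Estimate index file size / data set size when several indexes
    live in the same index file.

    indexes      -- list of indexes, each a list of key-variable byte widths
    record_width -- total byte width of one record in the data set
    """
    bytes_per_data_record = sum(sum(keys) + POINTER_BYTES for keys in indexes)
    return bytes_per_data_record / record_width


# Hypothetical normalized table with 56-byte records and three indexes:
# a simple index on an 8-byte key, a composite index on two 8-byte keys,
# and a simple index on a 4-byte key.
ratio = index_file_ratio([[8], [8, 8], [4]], record_width=56)
print(f"{ratio:.2f}")  # 0.93
```
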
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330