LISTSERV at the University of Georgia
Date:         Sun, 17 Jun 2007 22:43:29 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: SAS data-set index size
In-Reply-To:  <1181932401.416586.107540@d30g2000prg.googlegroups.com>
Content-Type: text/plain; format=flowed

auto208611@HUSHMAIL.COM wrote:
>
>Is a telling story if the size of the SAS index file (*.sas7bdnx) is >75% the size
>of the SAS data-set itself?
>
>For instance, we have a data-set that is 533MB's and the index file
>is >400MB's.
>
>Is this an indication of a poor data-set structure?

No, it is not an indication of a poor data structure. (You may *have* a poor data structure, but the relative size of the index file is not indicative.)

Since Mister Index himself has already chipped in, I'll just add a couple of other points.

Think conceptually of the index file as being like another data set. If you have 1 index, the index file needs a record for every record in the data set, with all the variables that make up your index, plus an extra 8-byte variable that serves as a pointer to the correct record in the data set. If you have a very tall-and-thin data set to start with and you build a composite index, you may have nearly as many variables in your index as in your data set. So the index file can be nearly as big as the file itself, without being a bad thing.
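To put rough numbers on that mental model, here is a minimal Python sketch. It assumes nothing beyond the description above (one entry per record, holding the key variables plus an 8-byte pointer) and ignores page overhead and whatever internal structure the real index file has. The function name, the record count, and the variable widths are all made up for illustration.

```python
def estimate_index_bytes(n_records, key_widths, pointer_bytes=8):
    """Rough size of one index: one entry per record, keys plus a pointer."""
    entry_width = sum(key_widths) + pointer_bytes
    return n_records * entry_width

# Hypothetical tall-and-thin data set: 10 million records, 40 bytes wide,
# with a composite index on three variables of 8, 8 and 6 bytes.
n_records = 10_000_000
record_width = 40

data_bytes = n_records * record_width
index_bytes = estimate_index_bytes(n_records, [8, 8, 6])
ratio = index_bytes / data_bytes

print(f"index/data ratio: {ratio:.2f}")
```

With these made-up widths each index entry is 30 bytes against a 40-byte record, so the estimate lands at three quarters of the data set without anything being wrong with the design.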

Also, if your index is based on one or more very long variables in your data file, then the index has to be large also. I used to design GRTS samples this way, with an index on a string that was 8 bytes * the number of levels in the hierarchical structure. So I sometimes had an index that was more than half the size of the data set, because the index variable was more than half of the width of the data record!
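The same arithmetic for the long-key case, as a hedged Python sketch. The 8 bytes per level and the 8-byte pointer come from the description above; the number of levels and the width of the remaining variables are hypothetical, picked only to show how the key can dominate the record.

```python
POINTER_BYTES = 8    # pointer to the data set record, per the model above
BYTES_PER_LEVEL = 8  # key string uses 8 bytes per hierarchy level

def index_share_of_record(levels, other_bytes):
    """Approximate ratio of index entry width to data record width."""
    key_width = BYTES_PER_LEVEL * levels
    record_width = key_width + other_bytes
    return (key_width + POINTER_BYTES) / record_width

# Hypothetical design: 10 hierarchy levels, 60 bytes of other variables.
share = index_share_of_record(levels=10, other_bytes=60)
print(f"index entry is about {share:.0%} of the record width")
```

Here the key alone is 80 of the 140 bytes in each record, so an index on it cannot be small.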

As Mikeeeeeee pointed out, if you have multiple indices, *all* the indices sit in the index file. So take the above blather and add more variables and make the width of the record even worse. For a well-structured, highly normalized data set with several different indexes, I can see the index file getting quite close to the original data set in size.
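A rough Python sketch of how several indexes add up in the one index file, under the same simplified model (one entry per record per index, key variables plus an 8-byte pointer, no overhead). The index names, variable widths, and record count are invented for the example.

```python
def index_file_bytes(n_records, indexes, pointer_bytes=8):
    """Sum the rough size of every index stored in the single index file."""
    return sum(
        n_records * (sum(key_widths) + pointer_bytes)
        for key_widths in indexes.values()
    )

# Hypothetical narrow, normalized data set: 5 million records, 64 bytes wide.
n_records = 5_000_000
record_width = 64

# Two simple indexes and one composite index (widths in bytes).
indexes = {
    "cust_id": [8],
    "order_date": [8],
    "region_product": [4, 8],
}

total_index_bytes = index_file_bytes(n_records, indexes)
ratio = total_index_bytes / (n_records * record_width)
print(f"index file / data set: {ratio:.2f}")
```

Three modest indexes already cost 52 bytes per record against a 64-byte record, which is how the index file creeps up toward the size of the data set.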

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
