Date: Thu, 11 Sep 2008 18:42:14 -0400
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: KEEP DROP array variables
I haven't kept up with this thread and, upon trying to review it, quickly
saw that the underlying subject has changed at least once.
I'm responding to the issue of whether a short-wide file is better than a
tall-narrow one.
It was surprising to see the number of responses that addressed the
increased length of tall-narrow files as, unless I've missed something,
tall-narrow files have to be larger by definition.
But, as for utility, I have to agree with Mary. That is, why store your
data in a different format than the one you usually need? Enquiring minds
want to know.
Art
-------
On Thu, 11 Sep 2008 17:07:44 -0500, Mary <mlhoward@AVALON.NET> wrote:
>Yes, I halved the number of rows by using the two matching variables on the
>same row, but note that the varID variable has to be as large as the largest
>variable name, such as 30 characters, not 8 characters as in your
>calculation. I just looked at that variable by itself by creating a data
>set with just the varID variable, and even with half the rows it normally
>would have had, it was 100MB, which by itself is 3 times the size of the
>original file!
>
>-Mary
>----- Original Message -----
>From: Nordlund, Dan (DSHS/RDA)
>To: SAS-L@LISTSERV.UGA.EDU
>Sent: Thursday, September 11, 2008 4:48 PM
>Subject: Re: KEEP DROP array variables
>
>
>> > ----- Original Message -----
>> > From: Mary
>> > To: ./ ADD NAME=Data _null_, ; SAS-L@LISTSERV.UGA.EDU
>> > Sent: Thursday, September 11, 2008 9:12 AM
>> > Subject: Re: Re: KEEP DROP array variables
>> >
><<<snip>>>
>
>I am not going to comment on the appropriateness of wide vs. narrow. But
>the fact that Mary found a big increase in size when going to narrow does
>not surprise me. Let's use numbers "like" Mary is giving: 1000 rows, 6000
>data variables with, let's say, 1 additional ID variable (all numeric). This
>is 8 * 1000 * 6001 = 48008000 ~ 48MB.
>
>In the narrow file we will have an obsID, a varID variable, and a data Value
>variable. There will be 6000*1000 rows of 3 variables. This is
>8*3*6000*1000 = 1.44 X 10^8 = 144MB. It may be that Mary had extraneous
>variables that didn't need to be there, but the narrow file will be
>substantially larger than the same data stored in a wide format.
>
>Dan
>
>Daniel J. Nordlund
>Washington State Department of Social and Health Services
>Planning, Performance, and Accountability
>Research and Data Analysis Division
>Olympia, WA 98504-5204
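
The size estimates quoted above can be checked with a few lines of
arithmetic. What follows is only a rough sketch in Python, not SAS code
from the thread. It assumes uncompressed storage, 8 bytes per numeric
variable (the SAS default length), and no page or header overhead. The
function names and the 30-byte character varID case (taken from Mary's
remark about variable-name length) are illustrative assumptions.

```python
# Back-of-the-envelope data set sizes for wide vs. narrow layouts.
# Assumptions: uncompressed storage, no page/header overhead,
# 8 bytes per numeric variable (the default numeric length in SAS).
BYTES_PER_NUMERIC = 8


def wide_bytes(n_rows, n_data_vars, n_id_vars=1):
    """One row per observation: ID variable(s) plus every data variable."""
    return BYTES_PER_NUMERIC * n_rows * (n_data_vars + n_id_vars)


def narrow_bytes(n_rows, n_data_vars, var_id_width=BYTES_PER_NUMERIC):
    """One row per (observation, variable) pair: obsID, varID, Value."""
    n_narrow_rows = n_rows * n_data_vars
    row_width = BYTES_PER_NUMERIC + var_id_width + BYTES_PER_NUMERIC
    return n_narrow_rows * row_width


# Dan's example: 1000 rows, 6000 data variables, 1 ID variable.
wide = wide_bytes(1000, 6000)
narrow_numeric_id = narrow_bytes(1000, 6000)
# Hypothetical variant: varID stored as a 30-byte character variable.
narrow_char_id = narrow_bytes(1000, 6000, var_id_width=30)
# Mary's check: the varID column alone, at 30 bytes, with half the rows.
var_id_only_half_rows = (1000 * 6000 // 2) * 30

print(f"wide:                    {wide:>12,} bytes")
print(f"narrow (numeric varID):  {narrow_numeric_id:>12,} bytes")
print(f"narrow (30-char varID):  {narrow_char_id:>12,} bytes")
print(f"varID alone, half rows:  {var_id_only_half_rows:>12,} bytes")
```

Under these assumptions the narrow file with a numeric varID comes out at
almost exactly three times the wide file, and a 30-byte varID column by
itself, even at half the rows, is about 90 million bytes, which is in the
neighborhood of the figure Mary reported.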