Date: Wed, 5 Apr 2000 02:26:08 -0400
Reply-To: Paul Dorfman <sashole@MEDIAONE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@MEDIAONE.NET>
Subject: Re: Formatting CLASS values in PROC TABULATE
Content-Type: text/plain; charset=ISO-8859-1
On Tue, 4 Apr 2000 14:04:39 -0400, Ray Pass <raypass@WORLDNET.ATT.NET>
wrote:
Ray,
'Cause like formats've got the mind of their own. Their internal algorithm
(supposedly, AVL tree) is such that the input and output order may not
align. However, I see nothing contemptuous with replacing the infamous
formats with simple
data tv/view=tv;
set test;
if a = '1-1' then a = 'aaa';
else a = 'bbb';
if b = '1-1' then b = 'aaa';
else if b = '1-2' then b = 'bbb';
else if b = '1-3' then b = 'xxx';
else if b = '2-1' then b = 'ccc';
else if b = '2-2' then b = 'ddd';
else if b = '2-3' then b = 'xxx';
run;
and referring to that view TV in TABULATE - after which I bet the thingie
will work jest fahn. And with so many entries in the lookup table it will
work no slower than a format, guaranteed. If you had a couple of hundred of
thousand entries in the table, I'd venture to offer a direct-addressing
based solution with order preservation working three times as fast as a
format to boot.
Kind regards,
===================
Anyone
Jacksonville, Fl
===================
>SAS-L;
>
>I'm probably missing something (again), but can someone explain the
>following to me. I get the same results in V6 or V8.
>
>The following code (MUCH simplified):
>
>-----------------------------------------------
>data test;
> input @1 a $3.
> @5 b $3.
> @9 score 1.;
>cards;
>1-1 1-1 1
>1-1 1-1 1
>1-1 1-2 2
>1-1 1-2 2
>1-1 1-3 3
>1-1 1-3 3
>2-2 2-1 1
>2-2 2-1 1
>2-2 2-2 2
>2-2 2-2 2
>2-2 2-3 3
>2-2 2-3 3
>;
>
>proc format;
> value $afmt '1-1' = 'aaa'
> '2-2' = 'bbb';
>
> value $bfmt '1-1' = 'aaa'
> '1-2' = 'bbb'
> '1-3' = 'xxx'
> '2-1' = 'ccc'
> '2-2' = 'ddd'
> '2-3' = 'xxx';
>run;
>
>proc tabulate data=test format = 5.0 order=internal;
> class a b;
> var score;
> table a * b,
> score*sum;
> format a $afmt. b $bfmt.;
>run;
>----------------------------------------------------
>
>yields the following table:
>
> „ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒ†
> ‚ ‚score‚
> ‚ ‡ƒƒƒƒƒ‰
> ‚ ‚ Sum ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚a ‚b ‚ ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
> ‚aaa ‚aaa ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚bbb ‚ 4‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚xxx ‚ 6‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚bbb ‚xxx ‚ 6‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚ccc ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚ddd ‚ 4‚
> ƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒŒ
>
>Without the formats for a and b, I get the following:
>
> „ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒ†
> ‚ ‚score‚
> ‚ ‡ƒƒƒƒƒ‰
> ‚ ‚ Sum ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚a ‚b ‚ ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
> ‚1-1 ‚1-1 ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚1-2 ‚ 4‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚1-3 ‚ 6‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚2-2 ‚2-1 ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚2-2 ‚ 4‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚2-3 ‚ 6‚
> ƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒŒ
>
>This is what I expected. Why is the ordering of the b values screwed up in
>the first example (using the formats)? I was expecting to see:
>
> „ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒ†
> ‚ ‚score‚
> ‚ ‡ƒƒƒƒƒ‰
> ‚ ‚ Sum ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒ…ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚a ‚b ‚ ‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‰ ‚
> ‚aaa ‚aaa ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚bbb ‚ 4‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚xxx ‚ 6‚
> ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚bbb ‚ccc ‚ 2‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚ddd ‚ 4‚
> ‚ ‡ƒƒƒƒƒƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒ‰
> ‚ ‚xxx ‚ 6‚
> ƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒƒƒƒƒƒƒƒƒƒ‹ƒƒƒƒƒŒ
>
>Notice that I used ORDER=INTERNAL (in V8, you can use the ORDER= option on
>the CLASS statement as well as on the TABULATE statement; I tried both -
>same result.) I can't use ORDER=FORMATTED because in the real-life
>application, I want the data formatted by their internals values, not their
>formatted ones. I know it's got something to do with the fact that 'xxx'
>is a formatted value for two different internal values of the b variable,
>but they appear in two separate values of the a variable.
>
>Can someone please enlighten? Lauren? Little birdies? Anyone?
>
>TIA
>
>Ray
>
>*------------------------------------------------*
>| Ray Pass voice: (914) 693-5553 |
>| Ray Pass Consulting eFax: (914) 206-3780 |
>| 5 Sinclair Place |
>| Hartsdale, NY 10530 e-mail: raypass@att.net |
>*------------------------------------------------*
>
|