Date: Fri, 28 Apr 2000 17:56:40 GMT
Reply-To: sashole@mediaone.net
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject: Re: selecting values into a set of macro variables where the
number of distinct values is unknown
Content-Type: text/plain; format=flowed
Richard,
As Bernard has already pointed out, SQL only enables one to create a
concatenated list of distinct values separated by a specified character
(Bernard likes the comma, but another character might be used if the column
values contain commas themselves; I would say, the zero hex '00'x would be a
safer bet). Although SQL seems to be the perfect tool for the task - for it
makes both the unduplication and concatenation seamless - there are a couple
of drawbacks. First, the method will apparently fail if the length of the
concatenated string exceeds the limit imposed on the length of a macro
variable (say 32767). Second, a macro is needed to parse the SQL output. The
DATA step code below solves the problem in a single step assuming that both
the table the column are called A, the column is character and LE 8 bytes
long. If it is GT 8, increase the declared array item length accordingly.
%let mvprefix = m;
data _null_;
array h (0:35023) $8 _temporary_;
retain h null '00'x;
set a end=eof;
do x=mod(input(a,pib6.),35023) by 1 until (h(x)=null or h(x)=a);
if x > 35023 then x = 0;
end;
if h(x) = null then h(x) = a;
if eof then do x=0 to 35023;
if h(x) = null then continue;
n ++ 1;
call symput ("&mvprefix"||left(put(n,6.)),h(x));
end;
run;
This will create a set of macro variables m1, m2,... populated with the
distinct values of A. As you can see, I have already cannimalized on the
discovery made yesterday by Jim Groeneveld to initialize the temporary array
to low values without looping through it in the same RETAIN statement where
the comparison null variable is valued. If nothing else, it saves full 3
lines of code! The size of the array H (which, as everyone has undoubtedly
already guessed, is nothing else but the most primitive, i.e. linearly
probed, hash table) is determined _not_ by the size of the file A, but by
the number of distinct values of the variable A. Since I assume that in
practice, nobody would venture to create more than 30,000 such macro
variables for whatever purpose, I have chosen the prime upper bound to be
slightly larger. If my assumption is wrong, then memory permitting, you can
make the array as large as you wish; the only requirement being that the
upper bound is prime. Determining the lowest prime greater than a specified
number by hand may be tedious ;-), and the corresponding prime table may not
be readily available, so it is best to let SAS do the job, too:
*** finding first prime GT smth ***;
%let smth = 35000;
data _null_;
prime = &smth;
do until (j = u + 1);
prime ++ 1;
u = ceil(sqrt(prime));
do j=2 to u until (mod(prime,j) eq 0); end;
end;
put prime=;
run;
Of course, PRIME can be stored in a macro variable to size the array
dynamically. The only remaining question is, what if the variable A is
numeric? In this case, the code should be slightly modified, yet in
actuality it becomes simpler and faster:
data _null_;
array h (0:35023) _temporary_;
set a end=eof;
do x=mod(a,35023) by 1 until (h(x)=. or h(x)=a);
if x > 35023 then x = 0;
end;
if h(x) = . then h(x) = a;
if eof then do x=0 to 35023;
if h(x) = . then continue;
n ++ 1;
call symput ("&mvprefix"||left(put(n,6.)),put(h(x),best.));
end;
run;
Kind regards,
=====================
Paul M. Dorfman
Jacksonville, Fl
=====================
>From: "Walker, Richard M" <Richard.M.Walker@BRITISHAIRWAYS.COM>
>Reply-To: "Walker, Richard M" <Richard.M.Walker@BRITISHAIRWAYS.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: selecting values into a set of macro variables where the number
> of distinct values is unknown
>Date: Fri, 28 Apr 2000 13:11:15 Z
>
>---------------------- Forwarded by Richard M Walker/HEATHROW/BRITISH
>AIRWAYS/GB on 28/04/2000 14:08 ---------------------------
>
> Richard M Walker
> 28/04/2000 12:25:42
>
>
>Tel: 0208-56-28248
>Mobile: 0385-934634
>OceanWave SAS Developer
>TBC 10 South Wing Bay 4B - S120
>
>To: sas-l
>cc:
>bcc:
>Subject: selecting values into a set of macro variables where the number of
>distinct values is unknown
>
>Help,
>
>I recall seeing an e-mail sometime ago that indicated that using proc sql
>it is possible to select all the distinct values of a column into a set of
>macro variables.
>
>Can anyone show me an example of this technique?
>
>Regards
>Richard
________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com
|