Date: Wed, 25 Aug 1999 11:32:35 -0400
Reply-To: "Fehd, Ronald J." <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J." <rjf2@CDC.GOV>
Subject: Re: array element count
From: Joshua Muscat [mailto:jmuscat@EARTHLINK.NET]
/Can someone describe how to return the most frequent value of the
/elements in an array?
/var1 var2 var3 var4
/1 2 2 3
/How do you return the value 2?
Why?
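data ARRAY(keep = ID Value);
   array Var{4};
   do until(EndoFile);
      set WHATEVER end = EndoFile;
      *stack the four values into one variable;
      do I = 1 to dim(Var);
         Value = Var{I};
         output;
      end;
   end; *do until;
run;
PROC FREQ data = ARRAY;
   tables ID * Value / list missing noprint out = FREQ;
run;
*order=freq ranks levels by overall frequency of each variable,
 so sort explicitly to put the highest count first within each ID;
proc SORT data = FREQ;
   by ID descending Count;
run;
DATA LIST;
   set FREQ;
   by ID;
   if first.ID;
run;
proc PRINT data = LIST;
run;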
As always we need a clear statement of the task.
The task as I read it is to find the most frequently occurring value in an
array of variables.
well, of course, we are going to do this for every observation, right?
1. change the structure: stack the values into one variable
2. use the tools: proc FREQ counts each Id * Value combination. note that for a
two-way table the option order=freq orders the levels of each variable by its
overall frequency, not by the cell count within each Id, so sort the output
by Id and descending Count between the proc and the subset
3. subset the data to choose only the first -- highest count -- observation
within each Id.
4. what to do with this data set?
print it?
merge it with original?
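if the answer is "merge it with original", a minimal sketch could look like the step below. it assumes LIST holds one row per ID with the variables Value and Count, as in the subset step above; the data set names SORTED, MODES and COMBINED and the variable names Mode and ModeCount are made up for illustration.

```sas
* sort copies of both data sets by the merge key;
proc sort data = WHATEVER out = SORTED;
   by ID;
run;
* rename so the merged columns cannot collide with variables in WHATEVER;
proc sort data = LIST
          out  = MODES(keep = ID Value Count
                       rename = (Value = Mode Count = ModeCount));
   by ID;
run;
* one row per ID with the original variables plus the most frequent value;
data COMBINED;
   merge SORTED MODES;
   by ID;
run;
```

sorting into new data sets leaves WHATEVER and LIST untouched, at the cost of two extra copies.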
there are ways to do this within an observation.
I would suggest examining the ordinal function which could be used to sort
the values.
problems with this approach would be knowing how many values there are to
count.
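a rough sketch of the within-observation idea, under the assumption that WHATEVER has ID and Var1 thru Var4: ordinal(K, ...) returns the K-th smallest argument, so walking K from 1 to dim(Var) visits the values in sorted order and equal values sit next to each other. the names MODE, Mode, MaxCount, Streak, Current and Previous are hypothetical.

```sas
data MODE(keep = ID Mode MaxCount);
   set WHATEVER;
   array Var{4};
   MaxCount = 0;
   Mode = .;
   Streak = 0;
   do K = 1 to dim(Var);
      * K-th smallest value in the array, missing values sort first;
      Current = ordinal(K, of Var{*});
      * extend the streak while the sorted value repeats;
      if K > 1 and Current = Previous then Streak = Streak + 1;
      else Streak = 1;
      * ignore missing values and keep the longest streak seen so far;
      if Current ne . and Streak > MaxCount then do;
         MaxCount = Streak;
         Mode = Current;
      end;
      Previous = Current;
   end;
run;
```

for the 1 2 2 3 example this leaves Mode = 2 and MaxCount = 2. on ties the smallest value wins, because only a strictly longer streak replaces the current one.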
one could have a separate array of counters, but that approach will only
work easily with values that are integers.
for instance:
array Var(4);
array Counts(0:9);*ASSUME Var(*) in zero thru nine;
do I = 1 to dim(Var);
   *sum() treats the initially missing count as zero;
   Counts(Var(I)) = sum(Counts(Var(I)), 1);
end;
*neat trick: whatever value is in var, increment the count for that value
in the array Counts;
then we have to look thru Counts to find the largest count; the index of
that largest count is the most frequently occurring value, got that?
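put together, the counting and the look-up might be sketched as follows. this is only an illustration: it assumes the values are integers in zero thru nine and that WHATEVER carries ID and Var1 thru Var4; the names MODE, Mode, MaxCount and J are invented here.

```sas
data MODE(keep = ID Mode MaxCount);
   set WHATEVER;
   array Var{4};
   array Counts{0:9} _temporary_;
   * temporary arrays are retained, so reset the counters for every observation;
   do J = lbound(Counts) to hbound(Counts);
      Counts{J} = 0;
   end;
   * count only integer values inside the assumed range, which also skips missing;
   do I = 1 to dim(Var);
      if 0 <= Var{I} <= 9 and Var{I} = int(Var{I})
         then Counts{Var{I}} = Counts{Var{I}} + 1;
   end;
   * the index of the largest count is the most frequent value;
   MaxCount = 0;
   Mode = .;
   do J = lbound(Counts) to hbound(Counts);
      if Counts{J} > MaxCount then do;
         MaxCount = Counts{J};
         Mode = J;
      end;
   end;
run;
```

the range check keeps an unexpected value from producing a subscript out of range error, and ties go to the smallest value.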
obviously the problems are
1. determining the dimension of the Counts array.
2. discontinuous series wastes space.
again: what exactly was the task? and why?
Ron Fehd the data structure and array maven CDC Atlanta GA