Date: Thu, 9 Jul 2009 11:17:20 -0400
Reply-To: "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject: Re: Grouping similar codes occuring across different states
In-Reply-To: <5aa1e850-8b7d-4357-8cdb-5c7b2a0d85af@k1g2000yqf.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"
Chumba says:
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> chumba
> Sent: Thursday, July 09, 2009 9:37 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Grouping similar codes occuring across different states
>
> Hi,
>
> I have a variable state and a variable code in my data,
>
> each state can have different codes and vice versa such that codes
> like AA.AAA,TR,AS,UC,FA, etc. can occur
> for multiple states.
>
> I need to re configure my data in such a way that all the similar
> occuring codes are grouped according to their
> corresponding states in descending order of their frequencies.
>
> Could anyone please help me out with this,
>
> Thanks in advance.
Chumba:
Assuming data set HAVE has only one record for each STATE/CODE combination, the
code below should do what you want.
It does the following:
1. Reads in every record and puts it in a hash table HAVE keyed
by N (the incoming record number). While reading data, it tracks
the frequency of each code and maintains it in a hash table CF,
keyed by CODE.
2. Declare a new hash table FINAL keyed by descending
CODE_FREQ CODE STATE.
3. Read each item from HAVE, get corresponding CODE_FREQ from CF,
and add it to the properly ordered hash table FINAL.
In this process I multiply CODE_FREQ by -1. This allows
declaring FINAL in ascending order, yet starting with the
highest observed frequency and proceeding to the lowest frequency,
and also yielding STATEs in alphabetic order for each CODE.
If I didn't multiple CODE_FREQ by -1, I'd have to put FINAL
in descending order, forcing a reverse alphabetic list of
STATE for each CODE (and reverse alpha list of CODE for each
CODE_FREQ).
Code follows:
data _null_;
set have (obs=1) ; ** Gets lengths of all variables for hash object HAVE **;
declare hash have ();
have.definekey('n');
have.definedata('code','state','otherdata');
have.definedone();
length code_freq 8;
declare hash cf (hashexp:4);
cf.definekey('code');
cf.definedata('code_freq');
cf.definedone();
do n=1 by 1 until (end_of_have);
set have end=end_of_have;
rc=have.add();
if cf.find()=0 then code_freq=code_freq+1;
else code_freq=1;
rc=cf.replace();
end;
nrecs=n;
declare hash final(hashexp:4,ordered:'ascending');
final.definekey('neg_code_freq','code','state');
final.definedata('code_freq','code','state','otherdata');
final.definedone();
do n=1 to nrecs;
rc=have.find();
rc2=cf.find();
neg_code_freq = -1*code_freq ;
rc3=final.add();
end;
rc=final.output(dataset:'want');
run;
Regards,
Mark