Date: Tue, 13 Aug 2002 12:16:42 -0400
Reply-To: "Miller, Jeremy T." <zyp9@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Miller, Jeremy T." <zyp9@CDC.GOV>
Subject: Aggregating data across several variables
Content-Type: text/plain
SAS-Lers,
I have a major programming problem concerning CMS data--specifically, the
MDS. I know a few of you have dealt with this data in the past.
For some background: these data are records on nursing home patients all
across the country. Each record is an assessment taken on an individual.
Each assessment is of a different type (initial, annual, quarterly,
reentry). Depending on the asssessment type, various fields are missing,
but some fields have been updated since the last assessment.
My task is to create a record that has the most up-to-date information with
as few missing values as possible.
My initial plan is to rank each assessment.
Sort the data by resident and rank.
If a value is present, that value will go into my aggregate record, if
missing, the value of the next non-missing record will "float up" to fill
the cell--this will be the aggregate value for that field, and so on.
My problems are: 1. The MDS has over 700 variables, although all will not
be used, a great many will. Should I use an array here?
2. I'm not sure how to begin this "bubble-sort" type of routine.
Here is an example:
data a;
input id $ asmt $ a1 a2 a3 a4 a5 ;
if asmt = '01' then rank = 4;
else if asmt = '02' then rank = 2;
else if asmt = '05' then rank = 3;
else rank =1;
cards;
VA-001 01 1 2 3 4 5
VA-001 02 2 . 1 3 .
VA-001 05 1 . . . 1
VA-001 09 . . . 2 .
GA-453 01 3 6 4 4 5
GA-453 02 3 6 . 5 .
GA-453 05 . . 3 . 5
;
run;
proc sort data=a;
by id rank;
run;
In the end, my single record for each individual above should be:
ID a1 a2 a3 a4 a5
GA-453 3 6 3 5 5
VA-001 2 2 1 2 1
Any tips would be greatly appreciated.
Jeremy Miller, DynCorp
jtmiller@cdc.gov
(404) 639-1883