Date: Wed, 27 Jun 2007 17:32:45 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Creating a unique dataset
On Mon, 25 Jun 2007 17:46:13 +0530, N, Raghupathi <raghupathi.n@GENPACT.COM>
wrote:
>
>
>Hi all,
>
>
>
>I have a data set which consists of Patient_id, quarter and
>corresponding events.
>
>
>
>Patient_id
>
>Quarter
>
>Event_1
>
>Event_2
>
>Event_3
>
>Event_4
>
>Event_5
>
>21
>
>1
>
>1
[snip]
>0
>
>
>
>
>
>My problem is I have to recreate a dataset which need patient per
>quarter per events ie. All the events come under one row.
>
>
>
>Patient_id
>
>Quarter
>
>Event_1
>
>Event_2
>
>Event_3
>
>Event_4
>
>Event_5
>
>21
>
>1
>
>1
>
>1.2
>
>1.6 or 2.8
>
>2.4
>
>1.5 or 3
>
[snip]
>
>3
>
>2.5 or 1.1
>
>1
>
>
>
>Here or implies any one of the event not (both or sum or division or
>aggregate or subtraction)
>
>
>
>Thanks in advance,
>
>
>
>Raghupathi
First, take a look at your post http://tinyurl.com/36lb3u and notice that
the example is a mess, with all headings and values folded into a single
column. I suspect that is why the question has received no attention.
A better presentation of the given data set:
data have;
input
ID Q E_1 E_2 E_3 E_4 E_5
; cards;
21 1 1 0 2.8 2.4 3
21 1 0 1.2 1.6 0 1.5
21 2 0 2.5 0 0 1.6
32 1 1 0 3 2.5 1
32 1 0 2.4 0 1.1 0
;
What is needed:
ID Q E_1 E_2 E_3 E_4 E_5
21 1 1 1.2 1.6 or 2.8 2.4 1.5 or 3
21 2 0 2.5 0 0 1.6
32 1 1 2.4 3 2.5 or 1.1 1
The task can be broken into steps. First, generate line numbers so that the
observations can be referenced individually:
data numbered;
num + 1;
set have;
run;
Then, normalize:
proc transpose data=numbered out=long;
by num id q;
var e_ : ;
run;
Now sort:
proc sort data=long out=ordered(drop = num);
by id q _name_ num;
run;
The sort order presumes that the values connected by the word "or" should
appesr in the order of their original appearance. Another possibility is to
have them appear in order of magnitude by using COL! as the most minor sort
key. The example has instances of each, so I'm not sure.
Now combine values which must share a cell:
data rollup(drop = col1);
do until (last._name_);
set ordered;
by id q _name_;
length rollup $ 12;
if col1 then rollup = catx ( ' or ' , rollup , col1 );
end;
rollup = coalescec ( rollup , '0' );
run;
Finally, de-normalize back to a wide structure>
proc transpose data=rollup out=need(drop = _name_);
by id q;
var rollup;
run;
Output:
ID Q E_1 E_2 E_3 E_4 E_5
21 1 1 1.2 2.8 or 1.6 2.4 3 or 1.5
21 2 0 2.5 0 0 1.6
32 1 1 2.4 3 2.5 or 1.1 1