|
Andrew Kramer wrote:
> I have a dataset with 100,000 observations. Each observation might get
> predictions for hospital death and hospital length of stay (LOS),
> respectively if they don't meet certain exclusion criteria. Here is an
> example of how the data set is strcutured:
>
> ID DEATHexc1 DEATHexc2 LOSexc1 LOSexc2
> 1 0 0 0 0
> 2 1 0 0 0
> 3 0 1 0 0
> 4 1 1 0 0
> 5 0 0 1 0
> 6 1 0 0 1
>
> Patient #1 has all exclsuion varaibles = 0 so he gets both a
> mortality and an LOS prediction. Patient #2 has one of the two
> mortality exclusions so she only get an LOS prediction. Likewise
> patient #3 and patient #4 only get an LOS prediction. Patient #5 gets
> only a mortality prediction and patient #6 gets no prediction.
>
> Here's what I would like to end up with:
>
> ID DeathPrediction LOSPrediction
> 1 call macro death predict call macro LOS predict
> 2 . call macro LOS predict
> 3 . call macro LOS predict
> 4 . call macro LOS predict
> 5 call macro death predict .
> 6 . .
>
> I've made this problem simpler than it really is as there are more
> than two types of predictions. However if someone can propose a
> solution to the above then I can extrapolate it out to the full set
> of predictions.
Andrew:
Use the SUM (of {varname-prefix}:) construct to total the binary flags of
each group. Implement your dispatching ruleset based on the SUM being 0
(all flags of a group are zero) and >0 (one or more flags of a group are
non-zero). A group is presumed to be an implicit array (varnames end in a
number) per your example.
The example uses _dispatches so as to not clash with d1-d2. Your array
names are not so simplistic (DEATHexc:) so they are unlikely to clash with
any 'worker' variables you use in your actual DATA Step. (I.E. A clash --
sum(of D:) would try to use a worker variable named Dispatch in the sum, as
well as d1 and d2))
Example
--------------
data foo;
do rowid = 1 to 100;
array easy a1-a6 b1-b5 c1-c7 d1-d2 e1-e9 f1-f5;
do over easy;
easy = ranuni(123) < 0.15;
end;
output;
end;
format _numeric_ 1.;
format rowid 12.;
run;
data foo_dispatches;
set foo;
length _dispatch $2000;
* group rules for dispatch;
if sum(of a:) = 0 and sum(of b:) > 0 then
_dispatch = 'predictA ' || _dispatch;
if sum(of a:) = 0 and sum(of c:) = 0 then
_dispatch = 'predictA predictC ' || _dispatch;
if sum(of a:) > 0 and sum(of d:) = 0 then
_dispatch = 'predictD ' || _dispatch;
* etc.... ;
* will need to remove duplicate predicts in dispatch
* before actually invoking them using what ever mechanism you decide on;
run;
--
Richard A. DeVenezia
http://www.devenezia.com/
|