Date: Wed, 22 Aug 2007 16:41:14 -0400
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: SAS data step compute COUNTs and Percentages
In-Reply-To: <20070822190444.4DA90478106@ws1-5.us4.outblaze.com>
Content-Type: text/plain; charset=ISO-8859-1
Tom,
If ID is a character of length 1, then ID = 10 is not possible.
Muthia Kachirayan
On 8/22/07, Tom White <tw2@mail.com> wrote:
>
> Hello SAS-L,
>
> I have a data set containing about 7M records (and about 30 more fields
> besides the two fields, ID & CODE) I show here.
>
> ID is one of the fields in this data set. It stands for a patient ID.
> It is a character field of length 1.
>
> CODE is another 5-digit character field. It stands for a medical
> procedure.
> This field takes on thousands of values. There is, however, a subset of
> these
> values I am interested in. This subset contains the CODE values
>
> NULL 59510 J2505 59400 J1745 J3490
> 27447
> A0431 FACIL J9355 J9310 E0784
>
> Here is sample dataset containing 34 records.
>
> ID CODE
> 1 J3490
> 1 NULL
> 1 36415
> 1 88142
> 1 59510
> 2 99213
> 2 80053
> 2 J2505
> 3 FACIL
> 3 J3490
> 4 99232
> 5 80053
> 5 E0784
> 5 E0784
> 5 J9310
> 6 A0431
> 7 59400
> 7 59400
> 8 NULL
> 8 NULL
> 8 NULL
> 8 NULL
> 9 27447
> 9 J9355
> 10 E0784
> 10 NULL
> 10 E0784
> 10 NULL
> 10 J2505
> 10 59510
> 10 NULL
> 10 NULL
> 10 E0784
> 10 59510
>
>
> I would like to create additional fields to augment the above data set for
> modeling.
>
>
>
> *****************************************************************************************************
> I would like to say in words what I am trying to compute.
>
> Maybe you can find an easier way to get the results I show below.
>
> For each patient (ID) in my sample dataset, create new data columnns to
> compute
> the total number of medical services (CODE) they had within any of the top
> medical services (CODEs)
> shown below:
>
> NULL 59510 J2505 59400 J1745 J3490
> 27447
> A0431 FACIL J9355 J9310 E0784
>
> The above CODEs are the medical procedures, out of many thousands of
> medical procedures,
> I have determined to be of interest to me.
>
> Therefore I would like to know, given the thousands of procedures, and
> given my 12 proceduers shwon above,
> the total number of procedures (CODEs) a patientb (ID) had within any of
> the 12 proceduers shown above.
>
> I would like to create counts and percentages.
>
> *****************************************************************************************************
>
>
>
> (1) I would like to know, for each ID, how many CODEs are found in
> the subset of codes mentioned above (I repeat them here):
> NULL 59510 J2505 59400 J1745 J3490
> 27447
> A0431 FACIL J9355 J9310 E0784
>
> For example, create a new field, call it, say, SUM_COUNT, to count
> the total number of codes per ID:
>
> ID CODE SUM_COUNT
> 1 J3490 1
> 1 NULL 1
> 1 36415 0
> 1 88142 0
> 1 59510 1
> 2 99213 0
> 2 80053 0
> 2 J2505 1
> 3 FACIL 1
> 3 J3490 1
> 4 99232 0
> 5 80053 0
> 5 E0784 2
> 5 E0784 2
> 5 J9310 1
> 6 A0431 1
> 7 59400 2
> 7 59400 2
> 8 NULL 4
> 8 NULL 4
> 8 NULL 4
> 8 NULL 4
> 9 27447 1
> 9 J9355 1
> 10 E0784 3
> 10 NULL 4
> 10 E0784 3
> 10 NULL 4
> 10 J2505 1
> 10 59510 2
> 10 NULL 4
> 10 NULL 4
> 10 E0784 3
> 10 59510 2
>
> (2) Now that I have counts, I would like to compute WITHIN ID percentages.
>
> For example, within ID=10, we have 10 records. Of those 10 records, we
> have:
>
> NULL=4 records or 4/10=40%
> E0784=3 records or 3/10=30%
> J2505=1 records or 1/10=10%
> 59510=2 records or 2/10=20%
>
> My data set now will look like:
>
> ID CODE SUM_COUNT WITHIN_PCT
> 1 J3490 1 0.20
> 1 NULL 1 0.20
> 1 36415 0 0.00
> 1 88142 0 0.00
> 1 59510 1 0.20
>
> 2 99213 0 0.00
> 2 80053 0 0.00
> 2 J2505 1 0.33
>
> 3 FACIL 1 0.50
> 3 J3490 1 0.50
>
> 4 99232 0 0.00
>
> 5 80053 0 0.00
> 5 E0784 2 0.50
> 5 E0784 2 0.50
> 5 J9310 1 0.25
>
> 6 A0431 1 1.00
>
> 7 59400 2 1.00
> 7 59400 2 1.00
>
> 8 NULL 4 1.00
> 8 NULL 4 1.00
> 8 NULL 4 1.00
> 8 NULL 4 1.00
>
> 9 27447 1 0.50
> 9 J9355 1 0.50
>
> 10 E0784 3 0.30
> 10 NULL 4 0.40
> 10 E0784 3 0.30
> 10 NULL 4 0.40
> 10 J2505 1 0.10
> 10 59510 2 0.20
> 10 NULL 4 0.40
> 10 NULL 4 0.40
> 10 E0784 3 0.30
> 10 59510 2 0.20
>
>
> (3) Finally, I woul like to have an OVERALL percentage.
>
> For example, lets take ID=1. (We have a TOTAL of 34 records.)
>
> I would like a new field, call it, OVERALL_PCT.
>
> For ID=1 and CODE='J3490' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or
> 2.94%).
> For ID=1 and CODE='NULL' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or
> 2.94%).
> For ID=1 and CODE='36415' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or
> 0.00%).
> For ID=1 and CODE='88142' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or
> 0.0%).
> For ID=1 and CODE='59510' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or
> 2.94%).
>
> ID CODE SUM_COUNT OVERALL_PCT
> 1 J3490 1 0.02941
> 1 NULL 1 0.02941
> 1 36415 0 0.00000
> 1 88142 0 0.00000
> 1 59510 1 0.02941
>
>
> Therefore, my final desired data set will look like:
>
> ID CODE SUM_COUNT WITHIN_PCT OVERALL_PCT
> 1 J3490 1 0.20 0.02941
> 1 NULL 1 0.20 0.02941
> 1 36415 0 0.00 0.00000
> 1 88142 0 0.00 0.00000
> 1 59510 1 0.20 0.02941
> 2 99213 0 0.00 0.00000
> 2 80053 0 0.00 0.00000
> 2 J2505 1 0.33 0.02941
> 3 FACIL 1 0.50 0.02941
> 3 J3490 1 0.50 0.02941
> 4 99232 0 0.00 0.00000
> 5 80053 0 0.00 0.00000
> 5 E0784 2 0.50 0.05882
> 5 E0784 2 0.50 0.05882
> 5 J9310 1 0.25 0.02941
> 6 A0431 1 1.00 0.02941
> 7 59400 2 1.00 0.05882
> 7 59400 2 1.00 0.05882
> 8 NULL 4 1.00 0.12500
> 8 NULL 4 1.00 0.12500
> 8 NULL 4 1.00 0.12500
> 8 NULL 4 1.00 0.12500
> 9 27447 1 0.50 0.02941
> 9 J9355 1 0.50 0.02941
> 10 E0784 3 0.30 0.08823
> 10 NULL 4 0.40 0.11765
> 10 E0784 3 0.30 0.08823
> 10 NULL 4 0.40 0.11765
> 10 J2505 1 0.10 0.02941
> 10 59510 2 0.20 0.05882
> 10 NULL 4 0.40 0.11765
> 10 NULL 4 0.40 0.11765
> 10 E0784 3 0.30 0.08823
> 10 59510 2 0.20 0.05882
>
> Thank so much!!!
> tom
>
> --
> We've Got Your Name at http://www.mail.com !
> Get a FREE E-mail Account Today - Choose From 100+ Domains
>
|