LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2007, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 22 Aug 2007 16:41:14 -0400
Reply-To:     Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:      Re: SAS data step compute COUNTs and Percentages
In-Reply-To:  <20070822190444.4DA90478106@ws1-5.us4.outblaze.com>
Content-Type: text/plain; charset=ISO-8859-1

Tom,

If ID is a character of length 1, then ID = 10 is not possible.

Muthia Kachirayan

On 8/22/07, Tom White <tw2@mail.com> wrote: > > Hello SAS-L, > > I have a data set containing about 7M records (and about 30 more fields > besides the two fields, ID & CODE) I show here. > > ID is one of the fields in this data set. It stands for a patient ID. > It is a character field of length 1. > > CODE is another 5-digit character field. It stands for a medical > procedure. > This field takes on thousands of values. There is, however, a subset of > these > values I am interested in. This subset contains the CODE values > > NULL 59510 J2505 59400 J1745 J3490 > 27447 > A0431 FACIL J9355 J9310 E0784 > > Here is sample dataset containing 34 records. > > ID CODE > 1 J3490 > 1 NULL > 1 36415 > 1 88142 > 1 59510 > 2 99213 > 2 80053 > 2 J2505 > 3 FACIL > 3 J3490 > 4 99232 > 5 80053 > 5 E0784 > 5 E0784 > 5 J9310 > 6 A0431 > 7 59400 > 7 59400 > 8 NULL > 8 NULL > 8 NULL > 8 NULL > 9 27447 > 9 J9355 > 10 E0784 > 10 NULL > 10 E0784 > 10 NULL > 10 J2505 > 10 59510 > 10 NULL > 10 NULL > 10 E0784 > 10 59510 > > > I would like to create additional fields to augment the above data set for > modeling. > > > > ***************************************************************************************************** > I would like to say in words what I am trying to compute. > > Maybe you can find an easier way to get the results I show below. > > For each patient (ID) in my sample dataset, create new data columnns to > compute > the total number of medical services (CODE) they had within any of the top > medical services (CODEs) > shown below: > > NULL 59510 J2505 59400 J1745 J3490 > 27447 > A0431 FACIL J9355 J9310 E0784 > > The above CODEs are the medical procedures, out of many thousands of > medical procedures, > I have determined to be of interest to me. > > Therefore I would like to know, given the thousands of procedures, and > given my 12 proceduers shwon above, > the total number of procedures (CODEs) a patientb (ID) had within any of > the 12 proceduers shown above. > > I would like to create counts and percentages. > > ***************************************************************************************************** > > > > (1) I would like to know, for each ID, how many CODEs are found in > the subset of codes mentioned above (I repeat them here): > NULL 59510 J2505 59400 J1745 J3490 > 27447 > A0431 FACIL J9355 J9310 E0784 > > For example, create a new field, call it, say, SUM_COUNT, to count > the total number of codes per ID: > > ID CODE SUM_COUNT > 1 J3490 1 > 1 NULL 1 > 1 36415 0 > 1 88142 0 > 1 59510 1 > 2 99213 0 > 2 80053 0 > 2 J2505 1 > 3 FACIL 1 > 3 J3490 1 > 4 99232 0 > 5 80053 0 > 5 E0784 2 > 5 E0784 2 > 5 J9310 1 > 6 A0431 1 > 7 59400 2 > 7 59400 2 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 8 NULL 4 > 9 27447 1 > 9 J9355 1 > 10 E0784 3 > 10 NULL 4 > 10 E0784 3 > 10 NULL 4 > 10 J2505 1 > 10 59510 2 > 10 NULL 4 > 10 NULL 4 > 10 E0784 3 > 10 59510 2 > > (2) Now that I have counts, I would like to compute WITHIN ID percentages. > > For example, within ID=10, we have 10 records. Of those 10 records, we > have: > > NULL=4 records or 4/10=40% > E0784=3 records or 3/10=30% > J2505=1 records or 1/10=10% > 59510=2 records or 2/10=20% > > My data set now will look like: > > ID CODE SUM_COUNT WITHIN_PCT > 1 J3490 1 0.20 > 1 NULL 1 0.20 > 1 36415 0 0.00 > 1 88142 0 0.00 > 1 59510 1 0.20 > > 2 99213 0 0.00 > 2 80053 0 0.00 > 2 J2505 1 0.33 > > 3 FACIL 1 0.50 > 3 J3490 1 0.50 > > 4 99232 0 0.00 > > 5 80053 0 0.00 > 5 E0784 2 0.50 > 5 E0784 2 0.50 > 5 J9310 1 0.25 > > 6 A0431 1 1.00 > > 7 59400 2 1.00 > 7 59400 2 1.00 > > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > 8 NULL 4 1.00 > > 9 27447 1 0.50 > 9 J9355 1 0.50 > > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 NULL 4 0.40 > 10 J2505 1 0.10 > 10 59510 2 0.20 > 10 NULL 4 0.40 > 10 NULL 4 0.40 > 10 E0784 3 0.30 > 10 59510 2 0.20 > > > (3) Finally, I woul like to have an OVERALL percentage. > > For example, lets take ID=1. (We have a TOTAL of 34 records.) > > I would like a new field, call it, OVERALL_PCT. > > For ID=1 and CODE='J3490' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or > 2.94%). > For ID=1 and CODE='NULL' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or > 2.94%). > For ID=1 and CODE='36415' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or > 0.00%). > For ID=1 and CODE='88142' we will have SUM_COUNT/34 = 0/34 = 0.00000 (or > 0.0%). > For ID=1 and CODE='59510' we will have SUM_COUNT/34 = 1/34 = 0.02941 (or > 2.94%). > > ID CODE SUM_COUNT OVERALL_PCT > 1 J3490 1 0.02941 > 1 NULL 1 0.02941 > 1 36415 0 0.00000 > 1 88142 0 0.00000 > 1 59510 1 0.02941 > > > Therefore, my final desired data set will look like: > > ID CODE SUM_COUNT WITHIN_PCT OVERALL_PCT > 1 J3490 1 0.20 0.02941 > 1 NULL 1 0.20 0.02941 > 1 36415 0 0.00 0.00000 > 1 88142 0 0.00 0.00000 > 1 59510 1 0.20 0.02941 > 2 99213 0 0.00 0.00000 > 2 80053 0 0.00 0.00000 > 2 J2505 1 0.33 0.02941 > 3 FACIL 1 0.50 0.02941 > 3 J3490 1 0.50 0.02941 > 4 99232 0 0.00 0.00000 > 5 80053 0 0.00 0.00000 > 5 E0784 2 0.50 0.05882 > 5 E0784 2 0.50 0.05882 > 5 J9310 1 0.25 0.02941 > 6 A0431 1 1.00 0.02941 > 7 59400 2 1.00 0.05882 > 7 59400 2 1.00 0.05882 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 8 NULL 4 1.00 0.12500 > 9 27447 1 0.50 0.02941 > 9 J9355 1 0.50 0.02941 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 NULL 4 0.40 0.11765 > 10 J2505 1 0.10 0.02941 > 10 59510 2 0.20 0.05882 > 10 NULL 4 0.40 0.11765 > 10 NULL 4 0.40 0.11765 > 10 E0784 3 0.30 0.08823 > 10 59510 2 0.20 0.05882 > > Thank so much!!! > tom > > -- > We've Got Your Name at http://www.mail.com ! > Get a FREE E-mail Account Today - Choose From 100+ Domains >


Back to: Top of message | Previous page | Main SAS-L page