Date: Wed, 22 Mar 2006 11:46:30 -0500
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Efficiently counting things other than records,
by several levels of aggregation
On Wed, 22 Mar 2006 09:42:29 -0500, Peter Crawford
<peter.crawford@BLUEYONDER.CO.UK> wrote:
>On Wed, 22 Mar 2006 05:36:09 -0800, Pardee, Roy <pardee.r@GHC.ORG> wrote:
>
>>Hey All,
>>
>>I've got some pharmacy fill data that I need to generate counts on. The
>>data are in 2 dsets. One is a list of 'national drug codes', with some
>>extra descriptor fields, like so:
>>
>>data pretend_ndcs ;
>> input ndc $ 1-12 general_class $ 14-30 specific_class $ 32-50 ;
>>datalines ;
>>11111111111 Analgesics Non-narcotic
>>22222222222 Analgesics Narcotic
>>33333333333 Antiinfectives Penicillins
>>44444444444 Antiinfectives Tetracyclines
>>55555555555 Antiinfectives Tetracyclines
>>;
>>
>>The other is a list of individual drug fills, again w/some extra
>>descriptor fields:
>>
>>data pretend_fills ;
>> input mrn $ 1-10 ndc $ 12-23 gender $ 25-25 age_group $ 27-33 ;
>>datalines ;
>>Roy 11111111111 M 0to17
>>Roy 11111111111 M 0to17
>>Roy 33333333333 M 0to17
>>Roy 55555555555 M 0to17
>>Bill 11111111111 M 18to44
>>Mary 11111111111 F 0to17
>>Joan 11111111111 F 0to17
>>Tim 11111111111 M 45to64
>>Tim 22222222222 M 45to64
>>Diane 11111111111 F 0to17
>>Diane 22222222222 F 0to17
>>Laurel 11111111111 F 18to44
>>Laurel 22222222222 F 18to44
>>;
>>
>>I want to generate datasets containing counts of the distinct mrn's
>>(that is, the # of unique people who got the fills):
>>
>> by general_class
>> by general_class and gender
>> by general_class and gender and age_group
>> by specific_class
>> by specific_class and gender
>> by specific_class and gender and age_group
>>
>>Now SQL has the very nice count(distinct <<varname>>) syntax, so the
>>following will work to make the first 3 datasets.
>>
>>proc sql ;
>> * Mash the 2 dsets together so all the descriptors are on a single
>>dataset. ;
>> create table gnu as
>> select f.*, general_class, specific_class
>> from pretend_fills as f inner join
>> pretend_ndcs as n
>> on f.ndc = n.ndc
>> ;
>>
>> title "Counts by general class" ;
>> select general_class, count(distinct mrn) as num_people, count(*) as
>>num_fills
>> from gnu
>> group by general_class
>> ;
>>
>> title "Counts by general class and gender" ;
>> select general_class, gender, count(distinct mrn) as num_people,
>>count(*) as num_fills
>> from gnu
>> group by general_class, gender
>> ;
>>
>> title "Counts by general class and gender and age group" ;
>> select general_class, gender, age_group, count(distinct mrn) as
>>num_people, count(*) as num_fills
>> from gnu
>> group by general_class, gender, age_group
>> ;
>>
>>quit ;
>>
>>But I don't want to have to pass through my real dataset six times (and
>>in reality I've got an extra NDC descriptor so it'd be nine times) to
>>generate these things. In real life the datasets are Way Too Big for
>>that sort of thing.
>>
>>Is there a PROC that will generate these counts for me? The obvious
>>candidates (to me) are FREQ and MEANS. I couldn't get MEANS to do
>>anything for me, as I don't have any numeric vars. I was able to coax
>>FREQ to count *fills* really easily, but counting *people* has so far
>>eluded me.
>>
>>I'd also consider datastep solutions, so long as they were reasonably
>>simple to implement. My datastep skills continue to atrophy...
>>
>>Thanks!
>>
>>-Roy
>
>Roy
>
>from a join on ndc code to collect the general- and specific-class, you'll
>have enough content to accumulate the counters you need.
>
>Proc means will work without analysis variables to generate counts within
>class combinations.
>Like below. Note that you have made my job easy by defining the class
>combinations you are interested in.
>
>proc means noprint data= joined ;
> class mrn general_class specific_class gender age_group ;
> types
>mrn* general_class
>mrn* general_class* gender
>mrn* general_class* gender* age_group
>mrn*specific_class
>mrn*specific_class* gender
>mrn*specific_class* gender* age_group
> ;
> output out= first_counters ;
>run;
>***
>However, this is not the final answer.
>You get observation counts from proc means, and you want distinct mrn
>counts ;
> * so follow that with a count within each of these "types"
>;
>proc sql ;
> create table second_counters as
> select count( distinct mrn ) as n_individuals,
> _type_, general_class, specific_class, gender, age_group
> from first_counters
> group by _type_, general_class, specific_class, gender, age_group
> ;
>quit;
It is also possible to use PROC MEANS for this second stage. The tricky
part will involve renaming _TYPE_ (from the first stage) to avoid a
collision with _TYPE_ (from the second stage).
>The actual value of _type_ is a little complex to describe, but when
>you look at the results, it might become obvious
>
>24 mrn* general_class
>26 mrn* general_class* gender
>27 mrn* general_class* gender* age_group
>20 mrn* specific_class
>22 mrn* specific_class* gender
>23 mrn* specific_class* gender* age_group
>
>It's something about counting in binary
>Ignore that if my description isn't clear...... ,
>and just look at non-blank class columns to see what's relevant
>
>Goood Luck
>
>Peter Crawford
|