LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (March 2006, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 22 Mar 2006 11:46:30 -0500
Reply-To:     "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject:      Re: Efficiently counting things other than records,
              by several levels of aggregation

On Wed, 22 Mar 2006 09:42:29 -0500, Peter Crawford <peter.crawford@BLUEYONDER.CO.UK> wrote:

>On Wed, 22 Mar 2006 05:36:09 -0800, Pardee, Roy <pardee.r@GHC.ORG> wrote: > >>Hey All, >> >>I've got some pharmacy fill data that I need to generate counts on. The >>data are in 2 dsets. One is a list of 'national drug codes', with some >>extra descriptor fields, like so: >> >>data pretend_ndcs ; >> input ndc $ 1-12 general_class $ 14-30 specific_class $ 32-50 ; >>datalines ; >>11111111111 Analgesics Non-narcotic >>22222222222 Analgesics Narcotic >>33333333333 Antiinfectives Penicillins >>44444444444 Antiinfectives Tetracyclines >>55555555555 Antiinfectives Tetracyclines >>; >> >>The other is a list of individual drug fills, again w/some extra >>descriptor fields: >> >>data pretend_fills ; >> input mrn $ 1-10 ndc $ 12-23 gender $ 25-25 age_group $ 27-33 ; >>datalines ; >>Roy 11111111111 M 0to17 >>Roy 11111111111 M 0to17 >>Roy 33333333333 M 0to17 >>Roy 55555555555 M 0to17 >>Bill 11111111111 M 18to44 >>Mary 11111111111 F 0to17 >>Joan 11111111111 F 0to17 >>Tim 11111111111 M 45to64 >>Tim 22222222222 M 45to64 >>Diane 11111111111 F 0to17 >>Diane 22222222222 F 0to17 >>Laurel 11111111111 F 18to44 >>Laurel 22222222222 F 18to44 >>; >> >>I want to generate datasets containing counts of the distinct mrn's >>(that is, the # of unique people who got the fills): >> >> by general_class >> by general_class and gender >> by general_class and gender and age_group >> by specific_class >> by specific_class and gender >> by specific_class and gender and age_group >> >>Now SQL has the very nice count(distinct <<varname>>) syntax, so the >>following will work to make the first 3 datasets. >> >>proc sql ; >> * Mash the 2 dsets together so all the descriptors are on a single >>dataset. ; >> create table gnu as >> select f.*, general_class, specific_class >> from pretend_fills as f inner join >> pretend_ndcs as n >> on f.ndc = n.ndc >> ; >> >> title "Counts by general class" ; >> select general_class, count(distinct mrn) as num_people, count(*) as >>num_fills >> from gnu >> group by general_class >> ; >> >> title "Counts by general class and gender" ; >> select general_class, gender, count(distinct mrn) as num_people, >>count(*) as num_fills >> from gnu >> group by general_class, gender >> ; >> >> title "Counts by general class and gender and age group" ; >> select general_class, gender, age_group, count(distinct mrn) as >>num_people, count(*) as num_fills >> from gnu >> group by general_class, gender, age_group >> ; >> >>quit ; >> >>But I don't want to have to pass through my real dataset six times (and >>in reality I've got an extra NDC descriptor so it'd be nine times) to >>generate these things. In real life the datasets are Way Too Big for >>that sort of thing. >> >>Is there a PROC that will generate these counts for me? The obvious >>candidates (to me) are FREQ and MEANS. I couldn't get MEANS to do >>anything for me, as I don't have any numeric vars. I was able to coax >>FREQ to count *fills* really easily, but counting *people* has so far >>eluded me. >> >>I'd also consider datastep solutions, so long as they were reasonably >>simple to implement. My datastep skills continue to atrophy... >> >>Thanks! >> >>-Roy > >Roy > >from a join on ndc code to collect the general- and specific-class, you'll >have enough content to accumulate the counters you need. > >Proc means will work without analysis variables to generate counts within >class combinations. >Like below. Note that you have made my job easy by defining the class >combinations you are interested in. > >proc means noprint data= joined ; > class mrn general_class specific_class gender age_group ; > types >mrn* general_class >mrn* general_class* gender >mrn* general_class* gender* age_group >mrn*specific_class >mrn*specific_class* gender >mrn*specific_class* gender* age_group > ; > output out= first_counters ; >run; >*** >However, this is not the final answer. >You get observation counts from proc means, and you want distinct mrn >counts ; > * so follow that with a count within each of these "types" >; >proc sql ; > create table second_counters as > select count( distinct mrn ) as n_individuals, > _type_, general_class, specific_class, gender, age_group > from first_counters > group by _type_, general_class, specific_class, gender, age_group > ; >quit;

It is also possible to use PROC MEANS for this second stage. The tricky part will involve renaming _TYPE_ (from the first stage) to avoid a collision with _TYPE_ (from the second stage).

>The actual value of _type_ is a little complex to describe, but when >you look at the results, it might become obvious > >24 mrn* general_class >26 mrn* general_class* gender >27 mrn* general_class* gender* age_group >20 mrn* specific_class >22 mrn* specific_class* gender >23 mrn* specific_class* gender* age_group > >It's something about counting in binary >Ignore that if my description isn't clear...... , >and just look at non-blank class columns to see what's relevant > >Goood Luck > >Peter Crawford


Back to: Top of message | Previous page | Main SAS-L page