Date: Wed, 1 Aug 2007 20:12:31 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: nway unique counts
In-Reply-To: <200708012349.l71L1473005865@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Kenneth:
Check out the NWAY option of PROC SUMMARY or PROC MEANS. It limits the
summary to distinct sets of values of the variables. The DISTINCT
keyword in a SQL SELECT statement works much the same way if you add a
COUNT(*) as n and a GROUP BY clause to that lists the same class of
variables as in the SELECT (minus the COUNT(*)).
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Kenneth Karan
Sent: Wednesday, August 01, 2007 7:50 PM
To: SAS-L@LISTSERV.UGA.EDU
Cc: Kenneth Karan
Subject: nway unique counts
I already have a way to do this that uses macros, proc sort, and first.
by processing. This is horrendously inefficient. The dataset I need to
use this on is way too large. Can someone recommend an approach that
would be more efficient? Would indexes help? Data step hash objects?
I could load this into Oracle and use the "count(distinct(SSN))"
functions with the "group by cube" statement but I am trying to avoid
involving IT.
Thanks in advance.
~ Ken
Given the dataset:
Obs State Gender Mon SSN
1 NY M Jan 123456789
2 NY M Feb 123456789
3 NY M Mar 123456789
4 NY M Jan 987654321
5 NY M Feb 987654321
6 FL M Mar 987654321
7 NY F Jan 234567890
8 NY F Feb 234567890
9 FL F Jan 345678901
I would like to produce the dataset:
Obs State Gender Mon _type_ People
1 0 4
2 Feb 1 3
3 Jan 1 4
4 Mar 1 2
5 F 2 2
6 M 2 2
7 F Feb 3 1
8 F Jan 3 2
9 M Feb 3 2
10 M Jan 3 2
11 M Mar 3 2
12 FL 4 2
13 NY 4 3
14 FL Jan 5 1
15 FL Mar 5 1
16 NY Feb 5 3
17 NY Jan 5 3
18 NY Mar 5 1
19 FL F 6 1
20 FL M 6 1
21 NY F 6 1
22 NY M 6 2
23 FL F Jan 7 1
24 FL M Mar 7 1
25 NY F Feb 7 1
26 NY F Jan 7 1
27 NY M Feb 7 2
28 NY M Jan 7 2
29 NY M Mar 7 1