Date: Fri, 1 Aug 2003 10:01:20 -0400
Reply-To: peter <pedennis@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: peter <pedennis@HOTMAIL.COM>
Subject: Big data set summary report
Hello SAS-Lers;
I need some ideas for the most efficient way of doing the following.
I have a dataset that I will try to describe below.
Each record in the dataset represents a unique individual. The variables
s1-sn are attributes: a value of "y" means that the individual has the
attribute and "n" means he does not. The variables l1-ln are list flags:
the individual is on list 1 to list n if the value is "s" or "e", and not
on the list if the value is "-".
The input files can be very big: as many as 100 million records with up to
40 variables. I am trying to make my report generation engine as dynamic as
possible, so I am using the VARNAME function to get the variable names
and then passing the data set into PROC SUMMARY.
I am still unsure about how to do report 2.
I hope I have explained enough to elicit some ideas.
INPUT DATASET
cnum  s1 s2 s3 s4 s5 s6 ... sn  l1 l2 l3 l4 ... ln
1     y  n  n  y  y  n      n   e  e  s  -      -
2     n  n  y  n  n  y      y   s  s  e  s      s
3     n  y  y  n  y  n      n   -  -  -  -      s
4     n  n  n  n  n  n      n   s  -  -  -      -
5     y  y  y  y  y  y      y   s  s  s  -      s
...
K     ...
I need two reports.
Report 1 should look like the following, where Count is the number of y's
for each attribute sn within each list ln.
REPORT 1
L    S    Count
l1   s1   n1
     s2   n2
     s3   n3
     s4   n4
     s5   n5
     sn   nn
l2   s1   n1
     s2   n2
     s3   n3
     s4   n4
     sn   nn
l3   s1   n1
     s2   n2
     s3   n3
     s4   n4
     sn   nn
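
To make the Report 1 definition concrete, here is a minimal sketch of the
counting logic, written in Python rather than SAS purely to pin down the
intended numbers on the five sample rows. It assumes that "on the list"
means a value of s or e, and the names raw, parse and report1 are made up
for illustration; this is not a proposed implementation for 100 million
records.

```python
from collections import Counter

# Hypothetical stand-ins for the variable names; "sn" and "ln" play the
# role of the last attribute and the last list in the sample layout.
s_vars = ["s1", "s2", "s3", "s4", "s5", "s6", "sn"]
l_vars = ["l1", "l2", "l3", "l4", "ln"]

# The five sample records from the INPUT DATASET listing.
raw = """\
1 y n n y y n n e e s - -
2 n n y n n y y s s e s s
3 n y y n y n n - - - - s
4 n n n n n n n s - - - -
5 y y y y y y y s s s - s
"""

def parse(text):
    """Turn whitespace-separated lines into one dict per record."""
    names = ["cnum"] + s_vars + l_vars
    return [dict(zip(names, line.split())) for line in text.splitlines()]

def report1(rows):
    """Count y's per (list, attribute) in a single pass over the records."""
    counts = Counter()
    for row in rows:
        for l in l_vars:
            if row[l] in ("s", "e"):      # assumption: s or e = on the list
                for s in s_vars:
                    if row[s] == "y":
                        counts[(l, s)] += 1
    return counts

rows = parse(raw)
counts = report1(rows)
for l in l_vars:
    for s in s_vars:
        print(l, s, counts[(l, s)])
```

The point of the sketch is that every (L, S) cell can be filled from one
read of the data, since each record contributes to at most n lists times
n attributes.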
REPORT 2
l1 summary for s
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
l1 summary for e
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
l1 summary for -
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
l2 summary for s
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
l2 summary for e
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
l2 summary for -
s1 s2 s3 s4 s5 s6 ... sn  count
y  n  n  y  y  n      n   #
n  n  y  n  n  y      y   #
n  y  y  n  y  n      n   #
n  n  n  n  n  n      n   #
y  y  y  y  y  y      y   #
...
... and so on for each list through ln.
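
As a way of stating what Report 2 asks for, here is a minimal sketch in
Python (again not SAS, and only meant to fix the intended result on the
five sample rows). It treats each record's full string of s values as one
pattern and counts records per (list, list value, pattern). The names raw
and report2 are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins for the variable names in the sample layout.
s_vars = ["s1", "s2", "s3", "s4", "s5", "s6", "sn"]
l_vars = ["l1", "l2", "l3", "l4", "ln"]

# The five sample records from the INPUT DATASET listing.
raw = """\
1 y n n y y n n e e s - -
2 n n y n n y y s s e s s
3 n y y n y n n - - - - s
4 n n n n n n n s - - - -
5 y y y y y y y s s s - s
"""

def report2(text):
    """Count records per (list, list value, s-pattern) in one pass."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        s_vals = tuple(fields[1:1 + len(s_vars)])   # the y/n pattern
        l_vals = fields[1 + len(s_vars):]           # the s/e/- flags
        for l, v in zip(l_vars, l_vals):
            counts[(l, v, s_vals)] += 1
    return counts

counts = report2(raw)
for (l, v, pattern), n in sorted(counts.items()):
    print(l, "summary for", v, "|", " ".join(pattern), n)
```

With n attributes there can be up to 2^n distinct patterns, though never
more than the number of records, so the size of this report depends on how
many distinct patterns actually occur in the data.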