Date: Mon, 2 Feb 2009 17:26:24 -0600
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Help with summarizing data
In-Reply-To: <200902022314.n12LPq01023822@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
Ah, and here I was proud of myself that I knew how to do a group summary
function without looking it up ... :) Never knew that UNION dropped
nonunique rows, obviously something very important to know. Thanks Howard
and Akshaya!
-Joe
On Mon, Feb 2, 2009 at 5:14 PM, Howard Schreier <hs AT dc-sug DOT org> <
schreier.junk.mail@gmail.com> wrote:
> On Mon, 2 Feb 2009 15:24:22 -0600, Joe Matise <snoopy369@GMAIL.COM> wrote:
>
> >Just for reference, if you wanted to use SQL this would do it (and matches
> >PROC SUMMARY results):
> >
> >
> >proc sql;
> >create table allqs_sum_sql as
> > select sum(a), sum(b), sum(c) from
> > ((select * from q1)
> >union
> > (select * from q2)
> >union
> > (select * from q3)
> >union
> > (select * from q4))
> > group by state;
> >quit;
> >
> >The 'group by' makes sure that each state is summed properly. It does not
> >give you the counts as neatly as proc summary/means would, though you
> could
> >ask for count(a) I think if you want that information.
> >
> >-Joe
>
> Include STATE in the SELECT.
>
> There are also a couple of subtle issues which could cause incorrect
> results
> with no other symptoms. (1) If it happens that a state has the same values
> for A, B, and C in two quarters, the UNION operator will discard one of
> those rows; the ALL option can prevent this. (2) If the columns are not in
> the same order in all of the tables, the sums will not properly segregate
> the values; use the CORRESPONDING option to prevent that.
>
> >
> >On Mon, Feb 2, 2009 at 3:22 PM, Kathleen Santos <flowerpotcat@gmail.com
> >wrote:
> >
> >> Yeah, I kinda hit send before I finished typing.
> >> Thanks. That's a really good suggestion.
> >>
> >> On Mon, Feb 2, 2009 at 3:19 PM, Joe Matise <snoopy369@gmail.com> wrote:
> >>
> >>> I think the PROC SQL got missed.
> >>>
> >>> If you set them together:
> >>>
> >>> data allQs;
> >>> set q1 q2 q3 q4;
> >>> run;
> >>>
> >>> perhaps with a quarter identifier if that is interesting to you
> >>>
> >>> and then use PROC SUMMARY or PROC MEANS, with state as the class, it
> will
> >>> take into account only the present records and not count the missing
> records
> >>> against it.
> >>>
> >>> For example:
> >>> data q1;
> >>> q=1;
> >>> do state = 1 to 51;
> >>> a = 100*round(ranuni(0),0.001);
> >>> b = 100*round(ranuni(0),0.001);
> >>> c = 100*round(ranuni(0),0.001);
> >>> output;
> >>> end;
> >>> run;
> >>>
> >>> data q2;
> >>> q=2;
> >>> do state = 1 to 51;
> >>> a = 100*round(ranuni(0),0.001);
> >>> b = 100*round(ranuni(0),0.001);
> >>> c = 100*round(ranuni(0),0.001);
> >>> output;
> >>> end;
> >>> run;
> >>>
> >>> data q3;
> >>> q=3;
> >>> do state = 1 to 50;
> >>> a = 100*round(ranuni(0),0.001);
> >>> b = 100*round(ranuni(0),0.001);
> >>> c = 100*round(ranuni(0),0.001);
> >>> output;
> >>> end;
> >>> run;
> >>> data q4;
> >>> q=4;
> >>> do state = 13 to 29;
> >>> a = 100*round(ranuni(0),0.001);
> >>> b = 100*round(ranuni(0),0.001);
> >>> c = 100*round(ranuni(0),0.001);
> >>> output;
> >>> end;
> >>> run;
> >>>
> >>> data allQs;
> >>> set q1 q2 q3 q4;
> >>> run;
> >>>
> >>> proc summary data=allqs noprint nway;
> >>> class state;
> >>> var a b c;
> >>> output out=allqs_sum
> >>> sum(a)=sum_a sum(b)=sum_b sum(c)=sum_c;
> >>> run;
> >>>
> >>>
> >>> and then allqs_sum has the sum for each state over each variable. If
> you
> >>> have more than a few variables, look into procedurally generating code
> with
> >>> PROC SQL select into :, or use the : operator and /autoname.
> >>>
> >>> -Joe
> >>>
> >>> On Mon, Feb 2, 2009 at 3:04 PM, Kathleen Santos <
> flowerpotcat@gmail.com
> >>> > wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I need help summarizing my data.
> >>>>
> >>>> I have four data sets,say Q1, Q2, Q3, and Q4. These four data sets
> have
> >>>> the
> >>>> same variables, say A B and C. The cases are states, so one record per
> >>>> state. However, not all states are present in all four. In Q1 and Q2,
> the
> >>>> 51
> >>>> states are present. In Q3, 50 states are present, but in Q4, only 17
> >>>> states
> >>>> are in the data set.
> >>>>
> >>>> What I would like to do is to sum A (then B, then C) across all
> datasets
> >>>> per
> >>>> state. So I will have this kind of data set:
> >>>>
> >>>> State New variable
> >>>> State1 A_q1+A_q2+A_q3+A_q4
> >>>> State2 A_q1+A_q2+A_q3+A_q4
> >>>> |
> >>>> State17 A_q1+A_q2+A_q3+A_q4
> >>>> State18 A_q1+A_q2+A_q3
> >>>> |
> >>>> State50 A_q1+A_q2+A_q3
> >>>> State51 A_q1+A_q2
> >>>>
> >>>> I tried to do this using PROC SQL using this code:
> >>>>
> >>>>
> >>>> *
> >>>>
> >>>> *
> >>>>
> >>>
> >>>
> >>
>
|