**Date:** Fri, 7 Apr 2000 16:58:41 -0400
**Reply-To:** Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
**Sender:** "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
**From:** Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
**Subject:** Re: summary output dataset
**Content-Type:** text/plain; charset=US-ASCII
This is a follow-up to my earlier post (quoted below). My original
solution had a couple of rough edges: the variables were not in the
desired order, and zeroes were not imputed for the missing frequency
counts.

Here is some additional code to take care of those things. It replaces
the final DATA step in the earlier post.

Another useful subset of STATS is the ordered list of the values found
in the original data set. This can be extracted and transposed into a
data set containing a row of zeroes in variables VALUE1-VALUEn.

data zerolong;
set stats(where=(_type_=1));
retain zero 0;
keep col1 zero;
run;

proc transpose data=zerolong
out=zerowide(drop=_name_) prefix=value;
id col1;
var zero;
run;

Now combine the zeroes with the identifier variables to create a frame.

data frame;
set freqs(keep = group var);
if _n_=1 then set zerowide;
run;

Fill in the frame with non-missing frequency counts.

data freqs2;
update frame freqs;
by group var;
run;

Finally bring in the standard deviations.

data all;
merge freqs2 stats(where=(_type_=6));
by group var;
drop _type_ _freq_ col1;
run;

The result:

GROUP VAR VALUE1 VALUE2 VALUE3 VALUE4 VALUE5 STD

1 V1 1 2 0 1 0 1.25831
1 V2 0 2 2 0 0 0.57735
1 V3 1 0 1 2 0 1.41421
1 V4 1 1 0 2 0 1.50000
2 V1 0 1 1 1 0 1.00000
2 V2 0 0 3 0 0 0.00000
2 V3 0 3 0 0 0 0.00000
2 V4 0 0 0 0 3 0.00000

> Date: Fri, 31 Mar 2000 16:34:51 -0500
> From: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>

> You can get pretty close using built-in functionality. First load the
> example:

> data test;
> input group v1 v2 v3 v4;
> cards;
> 1 1 2 3 4
> 1 2 3 4 1
> 1 4 3 4 2
> 1 2 2 1 4
> 2 2 3 2 5
> 2 4 3 2 5
> 2 3 3 2 5
> ;

> Since there are no unique identifiers, temporarily add sequence numbers
> (1,2,3,...):

> data withseq; set test; seq++1; run;

> Transpose so that all of the data values are in one column. Its name by
> default is COL1.

> proc transpose data=withseq
> out=long(rename=(_name_=var) drop=seq);
> by group seq;
> var v: ;
> run;

> One PROC SUMMARY can generate all the stats. But it's kind of unusual,
> in that COL1 is used as both a stratifier (for the frequency counts) and
> as a measure (for the standard deviations):

> proc summary data=long;
> class group var col1;
> var col1;
> output out=stats std(col1)=std;
> run;

> The output data set has a lot of extraneous stuff in it. The trick is
> to do the subsetting (of both observations and variables) which
> will extract what's wanted. First do the frequencies, which have
> to be transposed to get the required arrangement.

> proc transpose data=stats(where=(_type_=7))
> out=freqs(drop=_name_) prefix=value;
> by group var;
> id col1;
> var _freq_;
> run;

> Finally combine with the subset of STATS which has the standard
> deviations:

> data all;
> merge freqs stats(where=(_type_=6));
> by group var;
> drop _type_ _freq_ col1;
> run;

> The result is pretty close:

> GROUP VAR VALUE1 VALUE2 VALUE4 VALUE3 VALUE5 STD

> 1 V1 1 2 1 . . 1.25831
> 1 V2 . 2 . 2 . 0.57735
> 1 V3 1 . 2 1 . 1.41421
> 1 V4 1 1 2 . . 1.50000
> 2 V1 . 1 1 1 . 1.00000
> 2 V2 . . . 3 . 0.00000
> 2 V3 . 3 . . . 0.00000
> 2 V4 . . . . 3 0.00000

> >From: Greg Brown <GBrown@PRCONLINE.COM>
> >
> >I am new to sas code and am trying to create a summary dataset with the
> >frequency of the values and the standard deviation tied to 1 group
> >record. I have been reading the emails about creating a summary of all
> >categories and am still struggling with creating the desired output. I
> >would like to take an initial dataset in the following format
> >
> >group v1 v2 v3 v4
> >1 1 2 3 4
> >1 2 3 4 1
> >1 4 3 4 2
> >1 2 2 1 4
> >2 2 3 2 5
> >2 4 3 2 5
> >2 3 3 2 5
> >
> >and get an output dataset
> >
> >group var value1 value2 value3 value4 value5 std
> >1 v1 1 2 0 1 0
> >1 v2 0 2 2 0 0
> >1 v3 1 0 1 2 0
> >1 v4 1 1 0 2 0
> >2 v1 0 1 1 1 0
> >2 v2 0 0 3 0 0
> >...
> >
> >the values in the columns (value1-value5) represent the frequency/sum of
> >the values for the variable by group. I have been able to create a
> >dataset with only the specifc value and then running a proc means to get
> >the n and finally transposing the data to get the group & var as rows
> >and values as columns. It looks like I would have to create multiple
> >datasets and then merge these datasets to achieve the final output - the
> >values in the initial dataset generally range from 1-10 and the number
> >of variables range from 20-40. Any thoughts or suggestions would be
> >greatly appreciated. Thanks.
> >
> >Greg