I have a utility script that runs "proc summary" on a specified
dataset for a variable or list of variables (the script is at the end
of this post). It runs directly in a Unix directory, and I deliberately
do not assign any user formats because I want to look at the raw
values. However, when the summary runs I do want it to honor the normal
SAS numeric formats such as DATE7. This mostly works very well, but
sometimes a numeric variable has a user format assigned, something like
VTPCD., and the raw numeric value has a decimal place that then does
not appear in the output. Here is an actual example of the script run
on a dataset in a directory.
$ summary 'vtpcd vtp' vit
Obs    VTPCD    VTP                             _FREQ_
  1        .                                        60
  2        0    PRE-STUDY PERIOD                    64
  3        1    PRE-TREATMENT PERIOD/BASELINE      160
  4        2    TREATMENT PERIOD, PERIOD 1        1280
  5        2    TREATMENT PERIOD, PERIOD 2        1312
The problem is that the numeric variable VTPCD has the format VTPCD.
assigned, and since that user format is not available when this utility
runs, the values are not shown to one decimal place. The last two
values should be "2.1" and "2.2", not both "2". If I cancel the numeric
formats with "format _numeric_;" in the "proc summary" step, I lose the
normal numeric formats such as DATE7. as well, and I cannot hard-code
anything because this is a utility that could be run on any dataset and
any list of variables.
Is there a simple way round this? Ideally there would be a default
format I could specify for use where a user-defined numeric format is
not found.
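One partial workaround, short of a true default format, would be to let
the caller name an explicit format for the awkward variable and pass it
through to the "proc summary" step as a FORMAT statement. The following
is only a minimal sketch of the shell side. The helper name format_stmt
and the fourth positional parameter are assumptions made for
illustration and are not part of the script shown further down.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): build an optional
# SAS FORMAT statement from a string such as "vtpcd 4.1".
format_stmt() {
  if [ -n "$1" ]; then
    printf 'format %s;\n' "$1"
  fi
}

# Assumed 4th parameter, e.g.: summary 'vtpcd vtp' vit '' 'vtpcd 4.1'
fmt=$(format_stmt "${4:-}")

# Print the step that would be written into the generated SAS program.
# The dataset and class variables are hard-coded here for illustration.
cat << END
proc summary nway missing data=here.vit;
class vtpcd vtp;
$fmt
output out=summary(drop=_type_);
run;
END
```

With 'vtpcd 4.1' supplied, the generated step would carry
"format vtpcd 4.1;", which should display the values to one decimal
place. It is not the automatic fallback asked for above, since the
format still has to be given by hand for each affected variable.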
Here is the script I am using, for what it is worth.
#!/bin/bash
# Script     : summary
# Version    : 1.0
# Author     : Roland Rashleigh-Berry
# Date       : 23-Aug-2004
# Purpose    : To run "proc summary" on a dataset and display the output
#              data set (uses SAS).
# SubScripts : none
# Notes      : SAS dataset file name extension .sas7bdat will be ignored. You
#              can subset the output dataset. See usage notes.
# Usage      : summary sex acct
#              summary 'sex agesub' acct
#              summary 'sex agesub' 'acct(where=(fascd=1))'
#              summary 'sex agesub' 'acct(where=(fascd=1))' 'where _freq_>20'
#================================================================================
# PARAMETERS:
#-pos- -------------------------------description--------------------------------
#  1   Variable(s) to summarise. If more than one then enclose in single quotes
#      and separate with spaces. See usage notes.
#  2   Input dataset. To add a "where" clause put in quotes. See usage notes.
#  3   Subset clause on output dataset if required. See usage notes.
#================================================================================
# AMENDMENT HISTORY:
# init --date-- mod-id ----------------------description-------------------------
#
#================================================================================
# Put out a usage message if not enough parameters supplied
if [ $# -lt 2 ] ; then
  echo "Usage: summary 'var1 var2' dataset 'where _freq_ GT 1'" 1>&2
  exit 1
fi
# Check for an existing SAS program in the home directory
if [ -f "$HOME/summary.sas" ] ; then
  echo "SAS program summary already exists in your home directory. You need to check" 1>&2
  echo "if you need it and delete it if not. This utility will not overwrite it and" 1>&2
  echo "will now exit." 1>&2
  exit 1
fi
# Strip a trailing .sas7bdat extension from the dataset name, if present
dset=$(echo "$2" | sed 's%\.sas7bdat$%%')
# Write SAS code out to a temporary file
cat > "$HOME/summary.sas" << END
options validvarname=any nofmterr nocenter nodate nonumber;
libname here './' access=readonly;
filename _outfile "$HOME/summary.tmp";
proc printto print=_outfile;
run;
proc summary nway missing data=here.$dset;
class $1;
output out=summary(drop=_type_);
run;
title;
proc print data=summary;
$3;
run;
END
# Run the SAS code
sas -noautoexec -sasuser work -log "$HOME" -sysin "$HOME/summary.sas"
# Delete the temporary SAS code and optionally the log
rm -f "$HOME/summary.sas" "$HOME/summary.log"
# If the output file exists then cat it and delete it
if [ -f "$HOME/summary.tmp" ] ; then
  cat "$HOME/summary.tmp"
  rm -f "$HOME/summary.tmp"
fi
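
The fixed file names in the home directory are the reason the script
has to refuse to run when summary.sas already exists. A possible
alternative, assuming mktemp is available on the system, is to work in
a private scratch directory that is removed on exit. This is a sketch
only: the variable name tmpdir is hypothetical, the SAS code is cut
down to a placeholder, and the sas call is shown as a comment.

```shell
#!/bin/bash
# Create a private scratch directory (assumes mktemp is installed).
tmpdir=$(mktemp -d) || exit 1
# Remove the directory and everything in it when the script exits.
trap 'rm -rf "$tmpdir"' EXIT

# Write the SAS code into the scratch directory instead of the home
# directory (placeholder code, not the full program).
cat > "$tmpdir/summary.sas" << END
proc print data=summary;
run;
END

# The SAS call would then point at the scratch directory, for example:
# sas -noautoexec -sasuser work -log "$tmpdir" -sysin "$tmpdir/summary.sas"
echo "SAS code written to $tmpdir/summary.sas"
```

With this layout no existence check is needed, because each run gets
its own directory and cannot overwrite a user's own summary.sas.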