Date: Wed, 6 Aug 2003 10:08:00 +0100
Reply-To: Roland <roland@RASHLEIGH-BERRY.FSNET.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Roland <roland@RASHLEIGH-BERRY.FSNET.CO.UK>
Organization: Universe Monitors
Subject: How I use Unix while Clinical reporting
If you are a clinical trials programmer using SAS on Unix then maybe you
come from a VAX background. You have learnt the Unix equivalents of the
VAX commands you are used to and are now happily working away in the Unix
environment, finding it as easy as you did the VAX. But what you may not
realise is that Unix offers possibilities for speed and efficiency, and for
speeding up QC work, that you cannot imagine.
Here is an extract from a page on my web site.
http://www.datasavantconsulting.com/roland/unix.html
How I use Unix with SAS
I have written a number of Unix utilities and many of these execute SAS
within them. I use some of these utilities every day at work. I can't
remember a day, now, in the past year, when I have not used at least one of
them. I'll give you a list of the SAS/Unix utilities I use often as well as
some of the pure Unix ones that still relate to the work I do with SAS. I'd
like you to imagine what it could be like working in the following way that
I will illustrate. Put yourself in my place and imagine doing the same
thing.
contents
I create a dataset and place it in a library. I want to check that I have
assigned labels to all the variables. I make that directory the current
directory and type in the command contents demog at the Unix prompt. I see
the contents of the demog dataset displayed on the screen. If I want to see
more details then I type in contentsl demog instead (a longer version of my
contents utility) and see the length, variable type and formats as well. I
will soon see if I have missed off a label. Maybe I want to see the contents
of all the datasets in that library. I just type the command contents and
there it is for every dataset. If I want to route what I see to a file then
I type in something like contents > cont instead and can browse the file
cont later. Suppose I want to know what datasets contain the variable SESS
then I can pipe the output of the contents command to grep like this
contents | grep SESS and there, on the screen, are all the datasets that
contain the variable SESS. Do you see how SAS and Unix can work together? Do
you see how simple it is? There's more to come.
allmiss
I've created a dataset in the output library. I've used contents to check
the labels. All is well. But have I populated all the variables? I just type
in the command allmiss demog and it will tell me if I have any all-missing
character or numeric variables. I type in misscnt demog if I want a count of
the number of missing values for each variable rather than be informed about
the all-missing variables. Suppose I want to check the whole library for
all-missing variables. I just type in allmiss with no parameters.
printalln
I come across a strange subject whose data just doesn't add up. I need to
look at all the data I have for that subject and piece together what is
going wrong by cross-referencing the information in the various datasets. I
do this nearly every day. I type in the command printalln subject=1234 >
subj1234 and then all the data for subject 1234 in that library is put in
the file subj1234 where I can browse it. If I were interested in data for an
unexpected session then I could type in something like printalln sess=99 >
sess99 and go look at it in the file sess99. It's as easy as that. Yes, it
is running SAS behind the scenes. Of course it is. But you won't find any
SAS code or logs being left behind in those directories. It just does its
work and then disappears. It is just like a native Unix command except that
you have SAS working for you instead.
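
For readers wondering how a utility such as contents or printalln can run
SAS and leave nothing behind, here is a minimal sketch of the general idea.
It is not the code of the utilities described here. The function name
run_sas_quietly and the SAS_CMD variable are made up for illustration, and
it assumes that the SAS executable is called "sas", is on the PATH, and
accepts the -sysin, -log and -print invocation options.

```shell
#!/bin/sh
# Hypothetical sketch: run SAS code from the shell, show the listing on
# stdout and leave no .sas, .log or .lst files in the current directory.
# SAS_CMD is an assumed hook so that a differently named SAS executable
# (or a stand-in for testing) can be used.
SAS_CMD=${SAS_CMD:-sas}

run_sas_quietly() {
    # Work in a private scratch directory, not in the study area.
    tmpdir=$(mktemp -d) || return 1

    # The SAS code to run arrives on standard input.
    cat > "$tmpdir/job.sas"

    # -sysin names the program, -log the log file, -print the listing file.
    "$SAS_CMD" -sysin "$tmpdir/job.sas" \
               -log   "$tmpdir/job.log" \
               -print "$tmpdir/job.lst"
    rc=$?

    # Show the listing if SAS produced one, then remove every trace.
    if [ -f "$tmpdir/job.lst" ]; then
        cat "$tmpdir/job.lst"
    fi
    rm -rf "$tmpdir"
    return "$rc"
}

# Example use (needs a working SAS installation):
#   run_sas_quietly <<'EOF'
#   libname here '.';
#   proc contents data=here.demog; run;
#   EOF
```

Because the listing goes to standard output, the result can be redirected
to a file or piped to grep like any native Unix command.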
intitlesnoprogs
I have a tight deadline. Time is running short. There is a "titles" dataset
somewhere with all the titles and footnotes in it for all the code that
produces output. Is meeting this deadline going to be possible? How many
reporting programs haven't been written yet? Well, it's easy for me to find
out. I just type in the command intitlesnoprogs in any relevant study
directory and I get a list up on the screen of all the missing reporting
programs. The utility has read the titles dataset, searched the program
directory (or perhaps all the program directories for that study area),
got a list of the entries there and told me which SAS programs haven't been
written yet. This is SAS and Unix working directly together to provide you
with useful SAS project information.
clash
This is a simple one I wrote years ago. I have created a library of SAS
datasets and I want to know where there are discrepancies of label, length,
format or whatever among identically-named variables throughout that
library. I just type in the command clash and then I see the discrepancies
listed. If there are a number of them I might repeat the command but direct
it to a file where I can mull over the discrepancies at my leisure like this
clash > clash.
scanlogs
This should be made compulsory for QC'ing, in my opinion. A suite of
programs has run. Have all the error messages and warning messages been
checked? What about the important note statements put out in the log? I can
just type in the command scanlogs for a directory and it will scan all the
logs for important messages that programmers need to check out. I could pick
a single log, if I wanted to, or a specific group of logs like this scanlogs
d3p*.log.
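
As a rough illustration of the idea, and not the scanlogs utility itself,
a log scanner can be little more than grep with a carefully chosen set of
patterns. The function name scan_logs and the particular NOTE messages
picked out below are assumptions for this sketch; each site will have its
own list of notes that must be checked.

```shell
#!/bin/sh
# Hypothetical sketch of a SAS log scanner built on grep.
scan_logs() {
    # With no arguments, scan every log in the current directory.
    if [ "$#" -eq 0 ]; then
        set -- *.log
    fi

    # Report ERROR and WARNING lines plus a few NOTEs that usually deserve
    # a second look. /dev/null is added as an extra file so that grep
    # always prefixes each hit with the name of the log it came from.
    # The exit status is 1 when nothing was found, as with grep itself.
    grep -n -E \
        -e '^(ERROR|WARNING)' \
        -e '^NOTE:.*(uninitialized|repeats of BY values|Missing values were generated|have been converted to)' \
        "$@" /dev/null
}

# Example use:
#   scan_logs              (all logs in the directory)
#   scan_logs d3p*.log     (a specific group of logs)
```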
rescue
I once managed to delete all the programs in a directory by using the Unix
command rm *.sas when I meant to type in rm *.log. I was tired that day.
Since that day I create and maintain a backup sub-directory in all program
directories and would advise others to do the same. This was a small
disaster but I still had all the logs from the programs, so I wrote a
utility called rescue. It gets back the code from the program logs. It can
be implemented using SAS talking to Unix or using pure Unix utilities (awk
or nawk to be precise). That's got to be better than making a fool of
yourself to Unix support and waiting three days for them to bring back your
backed-up SAS programs. Especially if you have to deliver your reports the
next day in any case.
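
To give a feel for the pure Unix approach, here is a minimal awk sketch.
It is not the rescue utility itself. It assumes that the log echoes each
source line as a line number followed by blanks and then the code, and
that page headers carry the default title "The SAS System". The function
name rescue_code is made up for illustration. Indentation is lost, and
anything else in the log that happens to start with a number would be
picked up too, so the result needs checking by eye.

```shell
#!/bin/sh
# Hypothetical sketch: pull the echoed source lines back out of SAS logs.
rescue_code() {
    awk '
        # Page headers also begin with a number (the page number). Skip
        # them, assuming the default title is in use.
        /^[0-9]+ +The SAS System/ { next }

        # Echoed source lines: a line number, then blanks, then the code.
        /^[0-9]+( |$)/ {
            sub(/^[0-9]+ */, "")    # drop the number and the padding
            print
        }
    ' "$@"
}

# Example use:
#   rescue_code demog.log > demog.sas
```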
hdr
If I create a new program, I use a script I wrote called hdr, something like
this: hdr newprog. It prompts me for a program purpose and creates the SAS
member with all sorts of useful information filled in, including the project
and study identity it has pulled out of the directory name. It also fills in
my name as the program author and puts in the date. If it is a macro,
and I want more in the header, then I use the command mhdr instead. If it is
a Unix shell script then I use shdr instead. Documentation of code is a
pain, but this helps. It almost makes documentation a fun thing to do. And
when your study documentation is good and easy then things are on the up and
up, programming wise.
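
A header generator along these lines can be sketched in a few lines of
shell. This is only an illustration and not the hdr script: the function
name make_header, the header layout, and above all the directory layout
(.../project/study/programs) are assumptions, and the purpose is passed as
an argument here rather than prompted for.

```shell
#!/bin/sh
# Hypothetical sketch: create a new SAS program with a filled-in header.
make_header() {
    # $1 = program name without extension, $2 = one-line purpose.
    prog=$1
    purpose=$2

    # Assumed layout: .../<project>/<study>/<current directory>.
    here=$(pwd)
    study=$(basename "$(dirname "$here")")
    project=$(basename "$(dirname "$(dirname "$here")")")

    # Refuse to overwrite a program that already exists.
    if [ -e "$prog.sas" ]; then
        echo "$prog.sas already exists" >&2
        return 1
    fi

    # id -un gives the login name, date gives something like 06Aug2003.
    cat > "$prog.sas" <<EOF
/*--------------------------------------------------------------------
  Program : $prog.sas
  Project : $project
  Study   : $study
  Author  : $(id -un)
  Date    : $(date '+%d%b%Y')
  Purpose : $purpose
--------------------------------------------------------------------*/
EOF
}

# Example use:
#   printf 'Purpose: '; read purpose
#   make_header newprog "$purpose"
```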
ddiff
You may or may not have heard about the diff utility that is native to Unix.
You use it to compare two files. These files could be report output files.
You know that sometimes you need to do a complete rerun due to a couple of
data changes and you want to make sure that the outputs have only changed in
the way you want. And you are not interested in seeing a listing of
differences in who ran what and when, such as lines like:
"userid:/sas/programs/thisprog.sas 23JUL03 15:13 Page 2 of 88".
You just want to see real differences in figures that you are expecting.
Well, I wrote one to do that based on the native diff utility. You go to a
subdirectory of your output directory where you have stored all your
previous outputs and type in the command ddiff and you will see a list of
all differences between the outputs in the subdirectory compared to those in
the parent directory. This only takes seconds to run and for you to check
that the outputs match what you expected. Better than re-QC'ing the whole
lot. But in order for this to work well you will need to know a bit about
something called "pattern matching" so you can spot and blank out those
lines you are not interested in.
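
The blanking-out step can be sketched as follows. This is an illustration
built on sed and the native diff, not the ddiff utility itself. The
function names and the pattern are assumptions: the pattern is written to
match header lines shaped like the example above, and the outputs are
assumed to have the extension .lst.

```shell
#!/bin/sh
# Hypothetical sketch: compare old and new outputs, ignoring header lines.

# Blank out lines like "userid:/sas/programs/x.sas 23JUL03 15:13 Page 2 of 88".
# Blanking (rather than deleting) keeps diff's line numbers meaningful.
blank_headers() {
    sed 's|^[^ ]*:/.*Page [0-9][0-9]* of [0-9][0-9]*.*$||' "$1"
}

# Run from the subdirectory holding the previous outputs. The fresh
# outputs are expected in the parent directory.
ddiff_dir() {
    status=0
    for old in *.lst; do
        [ -f "$old" ] || continue
        new=../$old
        if [ ! -f "$new" ]; then
            echo "== $old: no counterpart in parent directory"
            status=1
            continue
        fi
        tmp_old=$(mktemp) || return 2
        tmp_new=$(mktemp) || return 2
        blank_headers "$old" > "$tmp_old"
        blank_headers "$new" > "$tmp_new"
        # Only report files where something other than a header differs.
        if ! diff "$tmp_old" "$tmp_new" > /dev/null; then
            echo "== $old"
            diff "$tmp_old" "$tmp_new"
            status=1
        fi
        rm -f "$tmp_old" "$tmp_new"
    done
    return "$status"
}
```

Lines marked "<" come from the previous output and lines marked ">" from
the rerun, as with plain diff.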
listempty and pages
More on report outputs. I can check to see if any reports don't contain
anything by just typing in the command listempty *.lst. Once I am satisfied
that all the reports contain something then I might want to print out pages
2 to 4 only for all the reports and write them to a file. I can do this with
pages 2 4 *.lst > pages2to4.
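
Both of these are pure Unix jobs. The sketch below shows one way they
might be done and is not the code of the utilities themselves. It treats
"empty" as a file of zero bytes, and it assumes that the pages of a
listing are separated by form-feed characters. If your listings also
begin with a form feed, the page numbering shifts by one. The function
name list_empty is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketches of an empty-report check and a page extractor.

# Print the names of the files that have no content at all.
list_empty() {
    for f in "$@"; do
        # -s is true only for files that exist and are larger than zero.
        [ -s "$f" ] || echo "$f"
    done
}

# pages FIRST LAST file... : print pages FIRST to LAST of each file.
pages() {
    first=$1
    last=$2
    shift 2
    for f in "$@"; do
        # Treat the form feed as the record separator so that each awk
        # record is one page, then print the wanted range of records.
        awk -v first="$first" -v last="$last" '
            BEGIN { RS = "\f" }
            NR >= first && NR <= last { printf "%s\f", $0 }
        ' "$f"
    done
}

# Example use:
#   list_empty *.lst
#   pages 2 4 *.lst > pages2to4
```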
sasunixskeleton
These Unix utilities that I have listed above that run SAS (or sometimes
not) have the generic name of "Unix shell scripts". They have a language
syntax that is quite different to SAS (actually there are a few different
types each with their own peculiarities of syntax) and it might put you off
trying to learn them. So you might feel that you will never be able to write
your own utility that calls SAS and interacts with Unix like the ones I have
listed. I thought about that and the problems that SAS programmers might
face making a start on this so I wrote a utility that writes utilities. Yes,
you read that right. It's a utility that writes utilities! You call this
utility, named sasunixskeleton, and it asks you what you want to call your
new utility and what it does, and it writes the shell script for you. All you
have to do is add a bit of code where it says EDIT to supply a usage message
and your SAS code. The rest of the shell script you leave alone and it will
work, even if you haven't got a clue what it is supposed to be doing.
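
To make the idea of a script that writes scripts concrete, here is a
minimal sketch. It is not sasunixskeleton. The function name
make_skeleton, the @NAME@ and @DESC@ placeholders and the shape of the
generated script are all assumptions, and the generated script assumes a
SAS executable called "sas" that accepts -sysin, -log and -print. The
name and description must not contain the characters | or & in this
simple version.

```shell
#!/bin/sh
# Hypothetical sketch: generate a shell script that wraps some SAS code.
make_skeleton() {
    # $1 = name of the new utility, $2 = one-line description.
    name=$1
    desc=$2

    # The quoted 'EOF' stops this shell from expanding anything in the
    # template. sed then fills in the two placeholders.
    sed -e "s|@NAME@|$name|g" -e "s|@DESC@|$desc|g" > "$name" <<'EOF'
#!/bin/sh
# @NAME@ : @DESC@

# EDIT: usage message
if [ "$#" -lt 1 ]; then
    echo "Usage: @NAME@ dataset" >&2
    exit 1
fi

# Scratch directory that is removed when the script exits.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

cat > "$tmpdir/job.sas" <<SASCODE
* EDIT: your SAS code goes here;
libname here '.';
proc print data=here.$1; run;
SASCODE

sas -sysin "$tmpdir/job.sas" -log "$tmpdir/job.log" -print "$tmpdir/job.lst"
if [ -f "$tmpdir/job.lst" ]; then
    cat "$tmpdir/job.lst"
fi
EOF
    chmod +x "$name"
}

# Example use:
#   make_skeleton myprint "print a dataset in the current directory"
```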
(Nearly all these utilities are already available on my web site)