Date: Thu, 20 May 2010 10:40:37 -0600
Reply-To: Alan Churchill <alan.churchill@SAVIAN.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Alan Churchill <alan.churchill@SAVIAN.NET>
Subject: Re: Tools to visualize dataset dependencies?
Content-Type: text/plain; charset="us-ascii"
Cool idea. Hard as hell to do.
I started work on it a few years ago and realized the complexity and also
the sheer uselessness of it. SAS programs tend to be linear so all of the
processes were coming out as columns of one step after another. There is
also no consistent way to output a SAS dataset from one proc to another.
This is an exercise in parsing and the SAS language is very, very difficult
to parse. You lay out a simple example but it doesn't look that way in the
real world.
The log, btw, is a better place to tackle this issue IMO since the parsing
has already occurred.
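A rough sketch of the log approach in Python, to show what I mean. It assumes the log contains the usual "observations read from the data set", "The data set ... has ... observations" and "... used" NOTE lines, each on a single line. The function names (log_to_edges, edges_to_dot) are made up for illustration, and it ignores wrapped NOTE lines, views, PROC SQL and anything else that does not write those notes.

```python
import re

# Assumed shapes of the NOTE lines; check them against your own logs.
READ_RE = re.compile(r"NOTE: There were \d+ observations read from the data set ([\w.]+?)\.\s*$")
WRITE_RE = re.compile(r"NOTE: The data set ([\w.]+) has \d+ observations")
STEP_RE = re.compile(r"NOTE: (DATA statement|PROCEDURE (\w+)) used")


def log_to_edges(log_text):
    """Return (source, target, label) tuples, one per input/output pair per step."""
    edges = []
    inputs, outputs = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = READ_RE.search(line)
        if m:
            inputs.append(m.group(1).lower())
            continue
        m = WRITE_RE.search(line)
        if m:
            outputs.append(m.group(1).lower())
            continue
        m = STEP_RE.search(line)
        if m:
            # The "used" note closes a step: connect every input to every output.
            if m.group(1).startswith("DATA"):
                label = "data step"
            else:
                label = "proc " + m.group(2).lower()
            for src in inputs:
                for dst in outputs:
                    edges.append((src, dst, label))
            inputs, outputs = [], []
    return edges


def edges_to_dot(edges):
    """Render the edges as a dot digraph; names are quoted because of the libref dot."""
    lines = ["digraph G {"]
    for src, dst, label in edges:
        lines.append('    "%s" -> "%s" [label="%s"];' % (src, dst, label))
    lines.append("}")
    return "\n".join(lines)
```

Read the log file into a string, pass it to log_to_edges, and feed the output of edges_to_dot to dot. Dataset names come out with their libref (for example work.a), which is arguably what you want in a dependency graph anyway.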
Alan
Alan Churchill
Savian
-----Original Message-----
From: W. Matthew Wilson [mailto:matt@TPLUS1.COM]
Sent: Thursday, May 20, 2010 8:32 AM
Subject: Tools to visualize dataset dependencies?
I inherited some REALLY long SAS programs that use lots and lots of data
steps and I'm having a hard time keeping it all in my brain.
I'm a big fan of dot (http://graphviz.org) and I would like to use it to
graph the dependencies. Has anyone done anything like this?
For example, I want to translate the SAS code below:
data b;
set a;
/* skip lots of variable assignments here */
run;
proc summary data=b;
/* skip various options here */
output out=c;
run;
data e;
merge c d;
run;
Into something like this dot syntax:
digraph G {
a -> b [label="data step"];
b -> c [label="proc summary"];
c -> e [label="data step"];
d -> e [label="data step"];
};
And then dot will make a purty picture, like this one:
http://scratch.tplus1.com/scratch.png
When I look at that picture, it is obvious to me that the two input datasets
that must already exist for this code are a and d. That fact is NOT obvious
when I read the code, especially since I really have > 50 intermediate data
steps in this program and at least a dozen prerequisite datasets.
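Once the edges exist as a list, finding the prerequisite datasets is mechanical: they are the ones that get read but never written. A minimal sketch in Python, using the edges from my small example (the function name is just something I made up):

```python
def prerequisite_datasets(edges):
    """Return datasets that are read but never written, in first-seen order."""
    produced = {dst for _, dst in edges}
    roots = []
    for src, _ in edges:
        if src not in produced and src not in roots:
            roots.append(src)
    return roots


# Edges from the example above, as (input, output) pairs.
example_edges = [("a", "b"), ("b", "c"), ("c", "e"), ("d", "e")]
print(prerequisite_datasets(example_edges))  # prints ['a', 'd']
```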
Is there already a tool to visualize dependencies like this? Does anyone
have any other ideas for how to attack this problem?
Thanks in advance.
--
W. Matthew Wilson
http://tplus1.com