Date: Mon, 27 Oct 2008 20:11:52 -0500
Reply-To: Lou <lpogoda@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lou <lpogoda@HOTMAIL.COM>
Organization: Aioe.org NNTP Server
Subject: Re: Dataset from Dictionary?
"Ken Barz" <Ken.Barz@CPCMED.ORG> wrote in message
> Well, yeah, I was just using airports and dictionary.columns as an
> example of a similar dictionary. In the CDISC implementation guide,
> they give the tables, variables, labels, , type, formats, ... in a data
> dictionary format. So, I don't have an actual dataset that I can use
> 'like' with.
> Since there's a whole slew of these things that need creating (and
> across many different studies), I was hoping to pull the dictionary in
> using Excel and have some way to use that metadata to create the actual
> working dataset.
The copy of the implementation guide I have came in PDF format - I've tried
various methods to get those "dictionaries" out of the PDF into something
more usuable - a text file, spreadsheet, Word document, etc. I haven't
found anything reliable.
If you find something that can eport/import this stuff to a spreadsheet,
please let me know. If you have a copy of the guide in some format more
usuable than PDF, please let me know where you got it.
That aside, once you have your data dictionary in a spreadsheet, using it to
create a dataset with those variables with those attributes isn't a big
deal - I suppose someone could write a PROC, but I don't know of one already
out there. My place of business doesn't currently license access to PC file
formats, so we save the spreadsheet to a tab delimited text file, or
alternatively as an XML file that can be read using the excel tagset.
Anyway, we write the code to produce whatever variables are called for in
the domain, but don't pay any attention to the attributes or the presence of
extraneous variables. This dataset will also contain variables needed for
things like supplementary qualifier datasets and/or comment datatsets, etc.
Once all the necessary variables are generated with their values, we read
the spreadsheet surrogate like so (if the surrogate is a tab delimited text
data map (keep = col1 col2 col4);
retain keeper 0;
infile map lrecl = 1000 pad dlm = '09'x dsd;
length col1 - col10 $ 100;
input col1 - col10;
if lowcase(compress(col1)) = 'variablename' then do;
keeper = 1;
if keeper then do;
col3 = lowcase(col3);
col4 = compress(col4, '$');
if col3 = 'num' then col4 = '8';
if col3 = 'char' then col4 = '$ ' || col4;
if col1 ^= ' ' then output;
The "keeper" variable lets the program discard useless-to-the-program header
information that clients seem to insist be included in the map but get in
the way of the program. If you license access to pc file formats, you could
use PROC IMPORT to read the spreadsheet directly, but you still would want
to process the result to get rid of extraneous rows. To set the variable
attributes, we write an include file (to a TEMP file) from the MAP dataset
generated above and use it like this:
set map end = done;
if _n_ = 1 then put 'attrib ';
put col1 'length = ' col4 'label = "' col2 +(-1) '"';
if done then put ';';
To write the final dataset, we use SQL like so:
proc sql noprint;
select trim(col1) into :names separated by ', '
create table your-domain-name (label = 'your-domain-label') as
order by whatever-your-sort-variables-are-here;
If you need supplementary qualifier datasets, comment datasets, etc. they
can all be generated from the your-dataset-full-of-variables dataset in the
I admit this is kind of wordy - it would be possible to collapse this into
fewer steps, the attrib statement could be a macro variable instead of an
%include file, etc. - but I'm staffed with relatively young, inexperienced
programmers, and they can follow the explict steps more easily. If you must
have a macro, wrapping the above in %macro and %mend statements with a few
parameters would be the work of a few minutes.