LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2008, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 27 Oct 2008 20:11:52 -0500
Reply-To:     Lou <lpogoda@HOTMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Lou <lpogoda@HOTMAIL.COM>
Organization: NNTP Server
Subject:      Re: Dataset from Dictionary?
Comments: To:

"Ken Barz" <Ken.Barz@CPCMED.ORG> wrote in message news:92BD54B69F8AA446B8446927AAADE7B403A0645A@cpc-1.Colorado.local... > Well, yeah, I was just using airports and dictionary.columns as an > example of a similar dictionary. In the CDISC implementation guide, > they give the tables, variables, labels, , type, formats, ... in a data > dictionary format. So, I don't have an actual dataset that I can use > 'like' with. > > Since there's a whole slew of these things that need creating (and > across many different studies), I was hoping to pull the dictionary in > using Excel and have some way to use that metadata to create the actual > working dataset.

The copy of the implementation guide I have came in PDF format - I've tried various methods to get those "dictionaries" out of the PDF into something more usuable - a text file, spreadsheet, Word document, etc. I haven't found anything reliable.

If you find something that can eport/import this stuff to a spreadsheet, please let me know. If you have a copy of the guide in some format more usuable than PDF, please let me know where you got it.

That aside, once you have your data dictionary in a spreadsheet, using it to create a dataset with those variables with those attributes isn't a big deal - I suppose someone could write a PROC, but I don't know of one already out there. My place of business doesn't currently license access to PC file formats, so we save the spreadsheet to a tab delimited text file, or alternatively as an XML file that can be read using the excel tagset.

Anyway, we write the code to produce whatever variables are called for in the domain, but don't pay any attention to the attributes or the presence of extraneous variables. This dataset will also contain variables needed for things like supplementary qualifier datasets and/or comment datatsets, etc. Once all the necessary variables are generated with their values, we read the spreadsheet surrogate like so (if the surrogate is a tab delimited text file):

data map (keep = col1 col2 col4); retain keeper 0; infile map lrecl = 1000 pad dlm = '09'x dsd; length col1 - col10 $ 100; input col1 - col10; if lowcase(compress(col1)) = 'variablename' then do; keeper = 1; return; end; if keeper then do; col3 = lowcase(col3); col4 = compress(col4, '$'); if col3 = 'num' then col4 = '8'; if col3 = 'char' then col4 = '$ ' || col4; if col1 ^= ' ' then output; end; run;

The "keeper" variable lets the program discard useless-to-the-program header information that clients seem to insist be included in the map but get in the way of the program. If you license access to pc file formats, you could use PROC IMPORT to read the spreadsheet directly, but you still would want to process the result to get rid of extraneous rows. To set the variable attributes, we write an include file (to a TEMP file) from the MAP dataset generated above and use it like this:

data _null_; set map end = done; file include; if _n_ = 1 then put 'attrib '; put col1 'length = ' col4 'label = "' col2 +(-1) '"'; if done then put ';'; data your-dataset-name-here; %include include; set your-dataset-full-of-variables-here; run;

To write the final dataset, we use SQL like so:

proc sql noprint; select trim(col1) into :names separated by ', ' from map; create table your-domain-name (label = 'your-domain-label') as select &names from your-dataset-name-here order by whatever-your-sort-variables-are-here; quit;

If you need supplementary qualifier datasets, comment datasets, etc. they can all be generated from the your-dataset-full-of-variables dataset in the same way.

I admit this is kind of wordy - it would be possible to collapse this into fewer steps, the attrib statement could be a macro variable instead of an %include file, etc. - but I'm staffed with relatively young, inexperienced programmers, and they can follow the explict steps more easily. If you must have a macro, wrapping the above in %macro and %mend statements with a few parameters would be the work of a few minutes.

Back to: Top of message | Previous page | Main SAS-L page