Date: Thu, 4 Dec 2003 11:28:31 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Debugging SAS with many variables (newbie)
Been there, done that. What a pain!
First, more than thirty variables per dataset (table) virtually guarantees
bad database design. You may want to start restructuring/reshaping your data
into linked datasets. The SAS INPUT statement supports restructuring: the
column pointers (@n) read fields from any position in the record, in any
order, so each DATA step can pick up only the variables it needs. For example,
  input
    @1   key1    $char10.
    @44  key2    $char7.
    @296 date    mmddyy10.
    @305 value   8.
    @222 outcome 8.
  ;
If you separate your data into sets of directly related variables, you'll
find it easier to find data type errors (the obvious problems). You will
also find less obvious problems in data integrity. Putting sufficient keys
in each dataset makes combining data a SAS MERGE or JOIN problem that SAS
handles very precisely and efficiently.
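As a rough sketch of the idea above, the same raw file can be read twice, each
pass keeping the keys plus one group of related variables, and the pieces
recombined with MERGE. The file name, the dataset names (visits, results), and
the LRECL value are assumptions made for illustration; the field positions are
borrowed from the example above.

```sas
/* Hypothetical path; LRECL is an assumed record length that covers column 305. */
filename raw 'messy.dat' lrecl=400;

/* First linked dataset: keys plus the date. */
data visits;
  infile raw truncover;
  input @1   key1 $char10.
        @44  key2 $char7.
        @296 date mmddyy10.;
  format date mmddyy10.;
run;

/* Second linked dataset: the same keys plus the outcome. */
data results;
  infile raw truncover;
  input @1   key1    $char10.
        @44  key2    $char7.
        @222 outcome 8.;
run;

/* MERGE with BY needs both datasets sorted by the keys. */
proc sort data=visits;  by key1 key2; run;
proc sort data=results; by key1 key2; run;

/* Keep only the key combinations present in both datasets. */
data combined;
  merge visits (in=in_v) results (in=in_r);
  by key1 key2;
  if in_v and in_r;
run;
```

Each small DATA step is easier to debug than one long INPUT statement, and the
IN= flags make it easy to spot keys that appear in one dataset but not the
other. This assumes key1 and key2 together identify a record; if they do not,
the MERGE will pair rows in ways you probably do not want.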
Repeating groups of variables and comments tend to multiply the number of
variables in a dataset and make data difficult to normalize. Try writing the
INPUT specifications for the first group with a suffix of 1 or 0 in the
variable names, test that, and then copy the specifications for the first
group and change the suffix to 2, 3, ... A SAS macro can generate the
specifications for repeating groups automatically, but may prove to be too
complicated for infrequent use. Use PROC TRANSPOSE to normalize repeating
groups of variables.
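A minimal sketch of that approach, using made-up data: the variable names (id,
score1-score3), the column positions, and the output names (long, source,
score) are all hypothetical. The specification for score1 is written and
tested first, then copied with the suffix changed to 2 and 3.

```sas
/* Wide layout: one record per id with three repeats of the same field. */
data wide;
  infile datalines truncover;
  input @1  id     $char4.
        @6  score1 3.      /* first group: write and test this one first */
        @10 score2 3.      /* copies of the first group with a new suffix */
        @14 score3 3.;
datalines;
A001  90  85  77
A002  60      72
;
run;

proc sort data=wide; by id; run;

/* Normalize: one row per id per repeat. SOURCE holds the original variable name. */
proc transpose data=wide
               out=long (rename=(col1=score))
               name=source;
  by id;
  var score1-score3;
run;

proc print data=long; run;
```

The transposed dataset has one score per row, so a bad value shows up as a
single row that is easy to find, and the repeat number can be recovered from
SOURCE if you need it.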
Second, when I receive a really large and messy system file, I use DBMS/COPY
to view the data and specify field positions, lengths, types, etc. SAS now
offers DBMS/COPY as a separate product. If you have to capture data from
input files, DBMS/COPY helps speed up the process.
From: Carl Kyonka [mailto:Carl.Kyonka@ENBRIDGE.COM]
Sent: Thursday, December 04, 2003 10:39 AM
Subject: Debugging SAS with many variables (newbie)
I have a file with mixed numeric and alphabetic variables. I am slogging my
way through writing an INPUT statement to read these records. I get errors
with a somewhat helpful dump of the input record and the variables as they
stand when an error occurred. My difficulty is that there are lots of
variables and finding the ones I want is awkward. I can use the find command
to locate each variable in the sequence they are listed in the INPUT
statement, but that is time-consuming. This may be well documented, but I do
not know what to call this record dump, so I have not found it in the online
documentation. What I would like is an option to sort the variables being
dumped either alphabetically or in the INPUT sequence, or a tactic to help
debug the INPUT statement.
Thanks for the time,
Carl Kyonka