Cates, Randall C wrote:
>
> To all SAS-Landers;
>
> I am working on a presentation for my local user group (GAUSS)
> about good SAS coding practices and thought I'd ask here for some new
> ideas. Here are some of my preferences so far:
>
> Indenting within datasteps, procs, sql statements and macros.
> One semicolon (that is, one statement) per line.
> Create a good, consistent header for each program with:
> program name,
> author,
> description,
> expected results(?),
> modifications with dates.
> Use "Run;" statements and extra lines to separate datasteps,
> procs, etc. and "Quit;" statements for proc sql statements.
> Good descriptive (but brief) comments between datasteps where
> necessary.
> K.I.S.S.(Keep It Simple SAS). Others are going to be managing
> your SAS code later. Don't try for the ultimate in complex SAS code.
> For example, preferably use single or double dimension arrays and do-end
> blocks rather than triple or greater.
>
> Any other ideas?
>
Randy,
I am glad to see you raise this issue. I fully agree with your
comments above and would like to offer some of my own:
(1) Indent code so as to reflect logic - for example:
    if a gt b then
        do;
            x = uniform (0);
            if x**2 le 0.5 then
                hits + 1;
            else
                miss + 1;
        end;
(2) Make do-end blocks obvious (as above) rather than aligning them
with the "if" instruction or having them appear at the end of lines
(such as after "then" or the last instruction of the do-end block)
(3) One of the great powers of SAS is the functions ... use them! I
have seen people write lengthy and laborious code to determine the
number of days in a month or year, for example, when the INTNX function
could have been used within a single-line instruction.
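A minimal sketch of the days-in-a-month case, assuming a hypothetical date value; the variable names are made up for illustration. It subtracts the first day of the current month from the first day of the next month, both obtained from INTNX:

```sas
data _null_;
    date = '15FEB1996'd;                    /* hypothetical input date */
    /* first day of next month minus first day of this month */
    days_in_month = intnx('month', date, 1) - intnx('month', date, 0);
    put days_in_month=;                     /* writes days_in_month=29 to the log */
run;
```

Swapping 'month' for 'year' gives the number of days in the year the same way.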
(4) The SAS procedures are there to be used. Why have a data step and
a counter, for example, when a PROC MEANS or FREQ could be used?
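As a rough illustration, assuming a hypothetical data set WORK.ORDERS with variables REGION and AMOUNT, the counting and summing that a hand-written data step would do can be left to the procedures:

```sas
/* WORK.ORDERS, REGION and AMOUNT are hypothetical names */
proc freq data=work.orders;
    tables region;              /* one row per region with its count */
run;

proc means data=work.orders n sum mean;
    class region;               /* statistics within each region */
    var amount;
run;
```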
(5) Formats are very compact and powerful. If there is a need to
recode data, consider formats as a possibility (depending on the amount
of recoding).
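A sketch of recoding with a format, assuming a hypothetical data set WORK.PEOPLE with a numeric variable AGE; the format name and the cut-offs are invented for the example:

```sas
proc format;
    value agegrp                /* hypothetical format name and ranges */
        low -< 18 = 'Minor'
        18  -< 65 = 'Adult'
        65 - high = 'Senior';
run;

data work.people_grouped;
    set work.people;            /* hypothetical input data set */
    length age_group $6;
    age_group = put(age, agegrp.);
run;
```

If the groups only matter for a report, a FORMAT statement in the reporting procedure does the same job without creating a new variable at all.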
(6) If evaluating multiple conditions of a single variable, consider
using a SELECT statement rather than multiple if-then-else statements.
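For instance, assuming a hypothetical data set WORK.RESPONSES with a numeric variable SCORE coded 1 to 5, a SELECT group might look like this:

```sas
data work.scored;
    set work.responses;         /* hypothetical input data set */
    length rating $8;
    select (score);
        when (1, 2) rating = 'Low';
        when (3)    rating = 'Medium';
        when (4, 5) rating = 'High';
        otherwise   rating = 'Unknown';   /* catches anything unexpected */
    end;
run;
```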
(7) On a related issue, if conditions are indeed mutually exclusive,
do use if-then-else statements (or a SELECT statement). I wish I had a
dollar for each time I have seen simple IF statements where if-then-else
was warranted.
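A small sketch of the difference, assuming a hypothetical data set WORK.PEOPLE with a numeric variable AGE. With ELSE, the later conditions are skipped as soon as one is true, and each test can rely on the earlier ones having failed:

```sas
data work.banded;
    set work.people;            /* hypothetical input data set */
    length band $5;
    if age lt 18 then band = 'young';
    else if age lt 65 then band = 'adult';   /* only reached when age ge 18 */
    else band = 'older';
run;
```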
(8) Use logical/business names. It is very easy to name datasets
TEMP1, TEMP2, JUNK etc etc ... in fact I have seen people rely on
default SAS data set names! Ditto for variables.
(9) Do not use steps which rely on or assume some particular condition
(the example below will process the most recently created data set ... if
someone were to add a new step right before the PROC SORT step, we
could have a bug introduced)
    proc sort;
        by var1 var2;
    run;
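A safer form names the input explicitly, so that steps added earlier in the program cannot change what gets sorted. WORK.CLAIMS and WORK.CLAIMS_SORTED are hypothetical names:

```sas
proc sort data=work.claims out=work.claims_sorted;
    by var1 var2;
run;
```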
(10) In SCL, for example, try to color-code text. It makes for very
easy tracing when browsing/scrolling pages of SCL ... I code portions of
SAS/BASE in cyan, function calls in orange, constants (strings and
numbers) in white, comments in pink and labels and link statements in
green. My "text" color is yellow and background is black. It makes for a
very nice "interface" ....
(11) In SCL, the "entry" and length statements should be commented ..
    entry view     $001    /* Edit or Browse                   */
          libref   $008    /* remote lib ref                   */
          dsn      $008    /* data set to open                 */
          hidelist $200;   /* var list to suppress in data tbl */
(12) Align redundant pieces of information such as data types,
lengths, comments etc as it makes for easier reading ... in lieu of
entry view $1 /* Edit or Browse */
libref $8 /* remote lib ref */
dsn $8 /* data set to open */
hidelist $200; /* var list to suppress in data tbl */
(13) Use PROC DATASETS to drop intermediate and obsolete data sets
from SASWORK ... it also reinforces what is and what is not important as
you traverse a program's listing.
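A minimal sketch, assuming two hypothetical intermediate data sets STAGE1 and STAGE2 in the WORK library:

```sas
proc datasets library=work nolist;   /* NOLIST suppresses the directory listing */
    delete stage1 stage2;            /* hypothetical intermediate data sets */
quit;
```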
(14) Add a "Change History" to the program/entry documentation at
the top of the source so that bugs can be identified much more easily
and a chronology is also maintained. If a user calls saying that the
program was fine last week but not this week, I just go to the change
history and look for change(s) during the past few days.
(15) On a related note, document changes in the code as well. For
example,
/* if a gt b then */ /* AA - Nov 4/97 */
if a ge b then /* AA - Nov 4/97 */
(16) RUN and QUIT statements as you mention are almost VITAL as they
make code easier to read and define "logical" break points
(17) Minimize hard-coding! When I see it, I go nuts!
    array nums _numeric_;
    index = dim (nums);
    do i = 1 to index;
        .......
    end;
rather than
    array nums _numeric_;
    do i=1 to 17;
        .......
    end;
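(18) Do not add unnecessary processing ... keep work that gives the
same answer on every pass out of the loop body. (Using the previous
example, the loop header itself is not the culprit: SAS evaluates the
start and stop values of an iterative DO once, before the first pass, so
    do i=1 to dim(nums);
costs no more than the two-line version.) Again, if I could have a
dollar for each such similar carelessness ....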
(19) Use DROP and KEEP statements. Using the previous example, I
seriously doubt that I would want the index variable "i" in my data set
at the conclusion of the data step.
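A sketch of this with the array example, assuming a hypothetical input data set WORK.RAW; replacing missing values with zero is only a stand-in for the real processing:

```sas
data work.cleaned;
    set work.raw;                   /* hypothetical input data set */
    array nums {*} _numeric_;       /* numeric variables read from WORK.RAW */
    index = dim(nums);
    do i = 1 to index;
        if nums{i} = . then nums{i} = 0;
    end;
    drop i index;                   /* keep the work variables out of the output */
run;
```

Because I and INDEX are first mentioned after the ARRAY statement, they are not members of the array themselves.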
I think I will let others add items of their own .... comments on
the above are welcome.
Anthony.