Date: Tue, 17 Aug 2004 12:19:58 -0400
Reply-To: harbourcharles@JOHNDEERE.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Charles Harbour <harbourcharles@JOHNDEERE.COM>
Subject: Re: Open Discussion: To what extent do you implement error
checking/handling?
Well, of course it depends! Here are some of my criteria for determining
what it depends on (and the various levels you speak of):
1) For data quality, what is the impact of not cleansing your data? Will
it simply mean a typo that's easily understood and corrected? Or, do you
use this particular variable for more downstream processing/calculations,
such that if there's an error at the source, it will multiply upon
aggregation? How important are those aggregate numbers?
2) For data processing, how important is clean data to the downstream
processing? If your customers don't mind having bad data, then what's the
big deal? OTOH, if it's imperative that your customers have correct data
(think of, say, a credit analysis that will determine whether your customer
is approved for a car or home loan), then perhaps you should stop the
process (or at least kick the questionable record out of processing for
examination later) until the data can be cleansed. This depends somewhat
on whether you're performing batch or online processing. There's no point
in holding up an entire batch for one bad record (just throw it off to the
side for someone to look at later), nor in holding up an entire online
process because the current record is bad.
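To make the "throw it off to the side" idea concrete, here is a minimal
sketch, written in Python only to keep it short and language-neutral. The
field name "income", the checks in find_problem, and the names BatchResult
and process_batch are all hypothetical, made up for illustration. In SAS
the same pattern would typically be a data step that writes to two output
data sets.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    clean: list = field(default_factory=list)    # records safe to process
    rejects: list = field(default_factory=list)  # (record, reason) pairs


def find_problem(record):
    """Return a reason string if the record is questionable, else None.

    The checks here are hypothetical; real rules depend on the application.
    """
    income = record.get("income")
    if income is None:
        return "missing income"
    if not isinstance(income, (int, float)) or income < 0:
        return "invalid income"
    return None


def process_batch(records):
    """Split a batch into clean records and rejects without stopping."""
    result = BatchResult()
    for record in records:
        reason = find_problem(record)
        if reason is None:
            result.clean.append(record)
        else:
            # Set the bad record aside for later review; keep going.
            result.rejects.append((record, reason))
    return result
```

The batch keeps running, and each reject carries the reason it was set
aside, so whoever looks at it later does not have to rediscover the
problem.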
From personal experience, the ratio of time spent cleansing data up front
to time spent cleaning it on the back end is roughly 1 to 10: an hour at
the source saves about ten later. It will save you many, many headaches to
get your data clean as close to the source as possible, where you can make
more intelligent inferences about how to fix your data, and not wait until
you're several steps removed and wondering how (and with what
qualifications) you will repair your bad apples.
On with the discussion!
CH
On Tue, 17 Aug 2004 10:26:41 -0400, M N <iced_phoenix_news@YAHOO.COM> wrote:
>Dear SAS-L,
>
>I would like to discuss the extent to which each of
>you employs error checking/handling in your code, and
>for what classes of errors. I am particularly
>interested in how you employ error checking in
>general-use macros (i.e. macros that may be used by
>other programmers).
>
>For instance:
>
>* To what extent is the correctness of parameters the
>responsibility of the caller, and to what extent is it
>the responsibility of the macro? If I have a DATA=
>parameter, the macro could:
>
>1.) Simply check that all characters in the variable
>are alpha or a '.'
>2.) Verify that the variable is a valid SAS data set
>name
>3.) Verify that the table actually exists
>4.) Verify that the contents of the table meet the
>specifications of the macro
>5.) Leave one or more of these checks to the caller,
>and let SAS produce an error message when the macro
>tries to read the table in a data step
>
>* To what extent do you check the incoming contents of
>your input data sets (i.e. as each obs is read by SET
>in the data step loop) to match macro/program
>specifications?
>
>* To what extent do you utilize SAS error/return code
>macro vars such as the SYS* family, or the sysmsg()
>function, etc?
>
>* Do any of you use SCL I/O functions rather than the
>normal Base SAS statements to open, read, and process
>data sets so that you get return codes at each
>statement?
>
>* To what extent do you simply let SAS find
>non-application specific errors (such as the
>by-variables in a macro parameter not actually being
>present in the data set) and print Errors/Warnings to
>the log? Do you parse the log for such errors in the
>program itself? Or do it as a separate step after the
>program terminates?
>
>I realize that the answer to all of these questions is
>"it depends", but I'd like to get some idea of what
>other SAS programmers do in various production-quality
>code situations. If you have other error handling
>issues that you'd like to mention outside of the
>questions that I posed, that's great--I'd simply like
>a general discussion of these matters.
>
>Thanks,
>Matt