Date: Thu, 10 Nov 2005 19:07:36 -0500
Reply-To: Arthur Tabachneck <art297@NETSCAPE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@NETSCAPE.NET>
Subject: Re: about data quality
You've already gotten quite a bit of good advice and, not knowing whether
you are working as an analyst, researcher, programmer or administrator, I'm
not sure if my additional two cents' worth will add anything.
I work in the insurance industry. Our data, and how it is processed, is
extremely well defined and documented. Those who collect the data are
trained via that documentation regarding what to enter, when, and when to
submit the information.
However, the data entry staff don't always conform perfectly to the rules,
so we maintain a vast series of error-checking routines which mirror the
documentation and return any submitted data that doesn't conform to the rules.
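To make "routines which mirror the documentation" concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not our actual checks (those aren't shown here, and in a SAS shop they would be SAS programs). The record fields (policy_id, model_year, vin), the three rules, and the model-year bounds are all assumptions. Each documented rule becomes one named predicate, and any record that fails at least one of them is set aside to go back to the submitter along with the names of the rules it broke.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # True when the record conforms to this rule

# Hypothetical rules; in practice each would mirror a line of the data documentation.
RULES = [
    Rule("policy_id present", lambda r: bool(r.get("policy_id"))),
    Rule("model_year in range",
         lambda r: isinstance(r.get("model_year"), int) and 1900 <= r["model_year"] <= 2006),
    Rule("vin is 17 characters",
         lambda r: isinstance(r.get("vin"), str) and len(r["vin"]) == 17),
]

def validate(records, rules=RULES):
    """Split records into those passing every rule and those to send back."""
    accepted, returned = [], []
    for rec in records:
        failed = [rule.name for rule in rules if not rule.check(rec)]
        if failed:
            returned.append((rec, failed))  # goes back to the submitter with the reasons
        else:
            accepted.append(rec)
    return accepted, returned
```

Keeping every rule as a named predicate means the list of checks can be reviewed line by line against the documentation it is supposed to mirror.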
But is identifying, documenting and cross-checking all of the rules
sufficient? No! Useful, yes, maybe even essential if you really want or
need accurate information. But only by knowing your data can you
eventually reach the point of knowing what else you need to look at. In my
own work, we have been able to go beyond the rules once we attained a decent
understanding of the data. Since the data I work with consists of annual
information that we have been collecting for about the past thirty years,
we have been able to identify unexplainable year-to-year changes that
simply didn't make sense.
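A year-to-year check of this kind can be sketched as follows. This is a hypothetical Python illustration, not the method described above: it assumes one non-negative annual figure per reporting unit (a count or a premium total, say), compares each reported year with the previous reported year, and flags pairs where the larger value exceeds the smaller by more than max_ratio (an arbitrary 1.5 here), as well as any jump to or from zero.

```python
def flag_year_to_year(series, max_ratio=1.5):
    """series: dict mapping year -> non-negative value for one reporting unit.

    Returns (prev_year, year, prev_value, value) for each pair of consecutive
    reported years whose values differ by more than a factor of max_ratio.
    """
    flags = []
    years = sorted(series)
    for prev, curr in zip(years, years[1:]):
        a, b = series[prev], series[curr]
        lo, hi = min(a, b), max(a, b)
        if hi == 0:
            continue  # both years are zero: no change to explain
        # A move to or from zero is an unbounded relative change, so always flag it.
        if lo == 0 or hi / lo > max_ratio:
            flags.append((prev, curr, a, b))
    return flags
```

A flagged pair is a question to take back to the data provider, not proof of an error; some real changes (a merger, a new line of business) will legitimately trip it.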
What I'm getting at is that data providers are people too and exhibit
people-like behavior, sometimes behavior that conflicts with the notion of
quality data. Yes, given all of the rules and checks, they will reliably
enter data that meets the rules. Not necessarily valid data, but extremely
consistent data. A seventeen-character Vehicle Identification Number (VIN) is a
good example. It, too, has all kinds of documented rules, but if data entry
staff can't complete their task without entering a valid number, and don't
have the real number available, they might just submit their own VIN.
Hey, it meets all of the rules and won't get bounced back as an error, but it
is clearly invalid data.
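The VIN case shows why rule checks alone aren't enough. The Python sketch below is an illustrative assumption, not our production logic. It implements the North American VIN check digit: letters are transliterated to numbers, each position is weighted, the sum is taken modulo 11, and a remainder of 10 is written as X in position 9. It then adds one consistency check that goes beyond the rules, flagging a rule-passing VIN that shows up on several distinct policies. The record layout (policy_id, vin) and the min_policies threshold are hypothetical. Note that seventeen 1s passes every rule here.

```python
from collections import defaultdict

# North American VIN transliteration (letters I, O and Q are not allowed).
_TRANSLIT = {**{str(d): d for d in range(10)},
             "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
             "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
             "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9}
# Position weights; position 9 (the check digit itself) has weight 0.
_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def vin_passes_rules(vin):
    """Length, allowed characters, and the check digit in position 9."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in _TRANSLIT for c in vin):
        return False
    total = sum(_TRANSLIT[c] * w for c, w in zip(vin, _WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected

def repeated_vins(records, min_policies=2):
    """Rule-passing VINs that appear on at least min_policies distinct policies."""
    policies = defaultdict(set)
    for rec in records:
        if vin_passes_rules(rec["vin"]):
            policies[rec["vin"].upper()].add(rec["policy_id"])
    return {vin: sorted(ids) for vin, ids in policies.items() if len(ids) >= min_policies}
```

The repeated-VIN report is only one signal. A staffer's own VIN entered once will still slip through, which is why the broader comparisons matter.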
As such, we look for indefensible year-to-year, company-to-company,
season-to-season, etc., changes in ALL of our data. I consider such
continual studies essential to improving data quality, but would be the
first to agree that we still aren't capturing all errors.
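For company-to-company comparisons, one common robust approach, swapped in here purely for illustration since the method isn't described above, is the modified z-score. It is built on the median and the median absolute deviation (MAD), so a single wild value can't drag the baseline around the way it can with a mean and standard deviation. The input layout (one value per company for a given year) is hypothetical. The 3.5 cutoff is a commonly cited threshold for this score, not a rule.

```python
from statistics import median

def peer_outliers(values, threshold=3.5):
    """values: dict mapping company -> value for one period.

    Flags companies whose modified z-score, 0.6745 * (x - median) / MAD,
    exceeds threshold in absolute value.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # Most companies report exactly the median; treat any departure as worth a look.
        return sorted(c for c, v in values.items() if v != med)
    return sorted(c for c, v in values.items()
                  if abs(0.6745 * (v - med) / mad) > threshold)
```

The same function works season-to-season or region-to-region by changing what the dictionary keys represent.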
<firstname.lastname@example.org> wrote in message
> Recently, I made some mistakes regarding data quality: sometimes because I
> did not understand the data well, sometimes because I did not know SAS
> well, and sometimes because I was careless. So could anyone share their
> experience of how to control data quality?
> Thanks in advance