LISTSERV at the University of Georgia
Date:         Thu, 10 Nov 2005 19:07:36 -0500
Reply-To:     Arthur Tabachneck <art297@NETSCAPE.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@NETSCAPE.NET>
Subject:      Re: about data quality
Comments: To: sas-l@uga.edu

Smartie,

You've already gotten quite a bit of good advice and, not knowing whether you are working as an analyst, researcher, programmer or administrator, I'm not sure if my additional two cents' worth will add anything.

I work in the insurance industry. Our data, and how it is processed, is extremely well defined and documented. Those who collect the data are trained via that documentation regarding what to enter, when, and when to submit the information.

However, the data entry staff don't always perfectly conform to the rules, thus we maintain a vast series of error-checking routines which mirror the documentation and return any submitted data that doesn't conform to the stated rules.
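To illustrate the idea of error-checking routines that mirror the documentation, here is a minimal sketch in Python (the routines described above are not shown in this message, so everything here is an assumption). The field names `policy_year`, `premium` and `state`, and the three rules themselves, are hypothetical stand-ins for whatever the real documentation specifies. Each documented rule becomes one named check, and any record that fails a check is returned to the submitter with the list of rules it broke.

```python
# Hypothetical rules: each entry pairs the documented rule's wording
# with a function that returns True when a record satisfies it.
RULES = {
    "policy_year is a plausible 4-digit year":
        lambda r: isinstance(r.get("policy_year"), int)
        and 1900 <= r["policy_year"] <= 2100,
    "premium is a non-negative number":
        lambda r: isinstance(r.get("premium"), (int, float))
        and r["premium"] >= 0,
    "state is a 2-letter code":
        lambda r: isinstance(r.get("state"), str)
        and len(r["state"]) == 2 and r["state"].isalpha(),
}


def check_record(record, rules=RULES):
    """Return the names of all rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


def split_submission(records, rules=RULES):
    """Separate a submission into accepted records and returned records.

    Returned records carry the list of violated rules so the submitter
    can see exactly which documented rule was broken.
    """
    accepted, returned = [], []
    for record in records:
        errors = check_record(record, rules)
        if errors:
            returned.append((record, errors))
        else:
            accepted.append(record)
    return accepted, returned
```

Keeping the rule text next to the check makes it easier to confirm that the routines and the documentation still agree when either one changes.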

But is identifying, documenting and cross-checking all of the rules sufficient? No! Useful, yes, maybe even essential if you really want or need accurate information. But, only by knowing your data can you eventually get to a point of knowing what more you need to look at. In my own work, we have been able to go beyond the rules once we attained a decent understanding of the data. Since the data I work with constitutes annual information which we have been collecting for about the past thirty years or so, we have been able to identify unexplainable year-to-year changes which, simply, didn't appear to make sense.

What I'm getting at is that data providers are people too and exhibit people-like behavior, sometimes behavior that conflicts with the notion of quality data. Yes, they will (given all of the rules and checks) reliably enter data which meets the rules. Not necessarily valid data, but extremely consistent. A seventeen-character Vehicle Identification Number (VIN) is a good example. It, too, has all kinds of documented rules, but if a data entry person can't complete their task without entering a valid number, and doesn't have the real number available, they might just submit their own VIN. Hey, meets all of the rules, won't get bounced back as being an error, but is clearly invalid data.
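The VIN case can be sketched in Python as follows. The first function applies the check-digit rule used for North American VINs (position 9 is computed from the other sixteen characters), which is one example of a documented rule. The second function is purely an assumption about how one might look beyond the rules: it counts how often the same VIN shows up within a single batch of submissions, on the theory that a borrowed VIN tends to be reused. The name `suspicious_repeats` and the `max_uses` threshold are hypothetical.

```python
from collections import Counter

# Transliteration table for the check-digit calculation.
# The letters I, O and Q are not allowed in a VIN, so they are absent.
_TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}

# Positional weights; position 9 (the check digit itself) has weight 0.
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_passes_rules(vin):
    """True if the VIN has 17 legal characters and a correct check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in _TRANSLIT for c in vin):
        return False
    remainder = sum(_TRANSLIT[c] * w for c, w in zip(vin, _WEIGHTS)) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def suspicious_repeats(vins, max_uses=1):
    """Return {vin: count} for VINs used more than max_uses times.

    Every VIN returned here may still pass vin_passes_rules; the
    repetition, not the format, is what makes it worth a second look.
    """
    counts = Counter(v.upper() for v in vins)
    return {vin: n for vin, n in counts.items() if n > max_uses}
```

A VIN that passes the first function and is flagged by the second is exactly the kind of entry described above: consistent with every rule, yet probably not the vehicle's real number.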

As such, we look for indefensible year-to-year, company-to-company, season-to-season, etc., changes in ALL of our data. I consider such continual studies essential to improving data quality, but would be the first to agree that we still aren't capturing all errors.
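A minimal Python sketch of a year-to-year screen is shown here. It is not the procedure described in this message, only one simple way to surface changes that deserve an explanation. The function name `flag_year_to_year` and the 50% default threshold are hypothetical; a sensible threshold depends on how volatile the measure normally is. The same comparison could be run per company or per season by building one series for each.

```python
def flag_year_to_year(series, max_rel_change=0.5):
    """Flag consecutive-year changes larger than max_rel_change.

    series maps year -> reported value. Returns a list of
    (year, previous_value, current_value, relative_change) tuples
    for each year whose value moved by more than the threshold
    relative to the preceding year in the series.
    """
    flagged = []
    years = sorted(series)
    for prev_year, year in zip(years, years[1:]):
        prev, curr = series[prev_year], series[year]
        if prev == 0:
            # Relative change is undefined from a zero base;
            # such cases need a separate review.
            continue
        change = (curr - prev) / abs(prev)
        if abs(change) > max_rel_change:
            flagged.append((year, prev, curr, change))
    return flagged
```

A flagged year is only a prompt for a question to the data provider. Whether the change is defensible still has to be judged by someone who knows the data.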

Art

--------------
<smartie_zhuo@hotmail.com> wrote in message news:1131653188.751377.75790@g44g2000cwa.googlegroups.com...
> Hi,
> Recently,I made some mistakes about data quality,some times because I
> did not understand data well,some times because I did not know SAS
> well,some times because I was careless.So could anyone tell me your
> experience how to control data quality.
> Thanks in advance
>

