Date: Mon, 28 Nov 2005 09:28:56 -0500
Reply-To: "Fehd, Ronald J" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J" <rjf2@CDC.GOV>
Subject: Re: Data Cleaning Books
> From: toby dunn
>
> Does anyone have any favorite data cleaning or Data Quality
> Management books
> other than Ron Cody's book that they would like to recommend?
> I think I have started going way beyond Ron's book.
Toby:
What is the scope of your questions?
* how to identify bad values?
* what to do with them once they are found?
* how to update them in our data sets?
In my own work I resolved 80% of my interminable questions by having
* the data collection form
* the data dictionary
* and a freq of all variables
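
For readers outside SAS, here is a rough sketch of what "a freq of all variables" gives you, written in Python rather than SAS. The rows, variable names, and the helper freq_all are all made up for illustration; this is not code from my own programs. The point is only that a one-way count of every value of every variable, blanks included, puts the oddities where you can see them and compare them to the form and the dictionary.

```python
from collections import Counter

# Hypothetical sample rows; the variable names and values are invented.
rows = [
    {"sex": "M", "state": "GA"},
    {"sex": "F", "state": "GA"},
    {"sex": "m", "state": "NY"},
    {"sex": "",  "state": "ga"},
]

def freq_all(rows):
    """Count every distinct value of every variable, blanks included."""
    tables = {}
    for row in rows:
        for name, value in row.items():
            tables.setdefault(name, Counter())[value] += 1
    return tables

# Print one small frequency table per variable.
for name, table in freq_all(rows).items():
    print(name)
    for value, count in sorted(table.items()):
        print(f"  {value!r:8} {count}")
```

In this toy data the lower-case "m", the blank, and "ga" each show up as their own row, which is exactly the kind of thing to check against the data dictionary.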
See the quote below, which is the summary, or head-slap,
of my decade of data cleansing.
Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov
Your task is simple: remove the difference
between how things should be
and how they really are.
-- Ashleigh Brilliant pot-shot #4247
got user-defined formats?
then 80% of -your- job is done.
80% of -somebody- else's job is to review the reports.
%INVALID: a data review macro
using proc FORMAT option other=INVALID to identify and list outliers
http://www.pace.edu/nesug/proceedings/nesug01/at/At1008.pdf
PharmaSUG 2004
http://www.lexjansen.com/pharmasug/2004/DataManagement/DM06.pdf
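
The idea behind other=INVALID can be sketched outside SAS as a lookup table with a catch-all label. The Python below is only an analogue under that assumption, not the %INVALID macro from the papers above; SEX_FORMAT, apply_format, and review are hypothetical names, and the data values are invented.

```python
from collections import Counter

# Hypothetical lookup table standing in for a user-defined format:
# each valid code maps to its label; anything else is labelled INVALID.
SEX_FORMAT = {"F": "Female", "M": "Male"}

def apply_format(value, fmt, other="INVALID"):
    """Return the label for a valid value, or `other` for anything else."""
    return fmt.get(value, other)

def review(values, fmt):
    """Tabulate the labels and list the raw values that fell through."""
    labels = Counter(apply_format(v, fmt) for v in values)
    invalid = sorted({v for v in values if v not in fmt})
    return labels, invalid

labels, invalid = review(["M", "F", "m", "", "F", "X"], SEX_FORMAT)
print(labels)    # counts per label, with the catch-all label included
print(invalid)   # the distinct raw values that need review
```

The table of valid codes is the part you write once from the data dictionary; the list of values that fell through is the report somebody else reviews.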