LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2010, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 5 Jan 2010 19:07:10 -0500
Reply-To:     Francois van der Walt <francoisw@GJI.COM.AU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Francois van der Walt <francoisw@GJI.COM.AU>
Subject:      Re: Data Validation/Cleansing Tool Query
Comments: To: Jonathan Goldberg <jgoldberg@BIOMEDSYS.COM>

Dear Jonathan and SAS-L (ers)

Data cleaning is certainly important and the value of clean data is often underestimated. Interestingly it is for us (GJI) often the easiest service to sell.

The biggest bang for buck that we use in the data cleaning process and that I can recommend as an excellent starting point is Characterise under Enterprise Guide. I am sure it use to be macro's developed and some SAS-L ers will be able to refer you to it. (If you do not have Enterprise Guide available let me know and I will provide you with an extract of the macro's)

Characterise provides a frequency analysis for all alpha fields (top 30 by default) that we use to quickly identify problems like blank fields or lots of "N/A", "TBA", "TEST", "HJKL" etc in fields. We ask business owners to identify the valid versus invalid values in a extracted spreadsheet. We also use it to generate a translation table that for example translate the Australian state "Victoria", "VIC.", "V.I.C." etc to a consistent "VIC".

For numeric fields Characterise provides number of missings averages, maximums, minimums etc.

Kind Regards Francois (Brisbane)

On Tue, 5 Jan 2010 12:36:14 -0500, Jonathan Goldberg <jgoldberg@BIOMEDSYS.COM> wrote:

>Well, I can't complain that I didn't get responses to my query. True, >most of them were a bit snippy... > >The messages mostly were about how big an investment a piledriver is when >all I'm looking for is a hammer. Our situation is relatively simple; we >don't have compliciated normalization schemes or most other possible >complications. > >The idea of a tool is to put data cleaning as much as possible *in the >hands and under the control of the people who know the data*. Who are >also the people who will deal with any problems found. The need to >involve programmers slows down the projects and increases costs. Also, >there is no way whatsoever that a programmer could do data cleaning on >his/her own. I think trying to streamline that process is a quite >reasonable thing to do. > >Besides, my management wants me to do it, with something we write >ourselves (which will involve non-SAS programming for the front end) or >with a third party product. So, reasonable or not, here we go. > >My thanks for the pointers to Dataflux, SDD (no, I'm not familiar with all >the SI vertical-market products), and SAS Data Quality Solution. > >Michael, I hope that when you finish your evaluation you will post your >conclusions here. Inquiring minds whan to know! And thanks again for the >screen cleaner. > >Jonathan > >ps >It's true my cubicle is somewhat messy... :-) > >pps. >"(believe it or not SAS programmers are not necessarily the highest paid >employees in some organizations)." > >In my case I guarentee it.


Back to: Top of message | Previous page | Main SAS-L page