Date:         Wed, 18 Aug 2010 19:07:19 -0400
From:         Nat Wooding <nathani@VERIZON.NET>
Subject:      Re: OCR (was RE: BASE SAS CERTIFICATION)
My experience with OCR is now a bit old, but I can say that success will depend on the quality of the images and on whether the lettering was smeared during the process. The poorer the quality of the text, the poorer the result.

One thing that you might do to check for spelling errors is to use SAS's Proc Spell. Barbara Okerson has presented a couple of papers on obsolete SAS procs, and Spell is included. She shows how to create your own word lists in case there are valid spellings that are not found in the SAS list. One of her sets of slides can be found at

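To make the word-list idea concrete, here is a rough sketch in Python rather than SAS. It is not Proc Spell and does not reproduce its syntax; it only illustrates checking OCR output against a base word list plus your own list of valid terms. The function name, the tiny word lists, and the sample text are all made up for illustration.

```python
import re

def find_unknown_words(text, dictionary, custom_words=()):
    """Return the sorted, lowercased words in text found in neither word list."""
    # Combine the base dictionary with the user-supplied list of valid terms.
    known = {w.lower() for w in dictionary} | {w.lower() for w in custom_words}
    # Treat runs of letters (and apostrophes) as words; everything else is a separator.
    words = re.findall(r"[A-Za-z']+", text)
    return sorted({w.lower() for w in words if w.lower() not in known})

# Hypothetical word lists: a general one and a site-specific one.
base_words = ["the", "data", "step", "reads", "one", "record", "into"]
site_words = ["pdv"]

# Hypothetical OCR output in which "reads" was misrecognized as "rcads".
ocr_text = "The DATA step rcads one record into the PDV"

print(find_unknown_words(ocr_text, base_words, site_words))
```

Without the site-specific list, a valid term such as "PDV" would be flagged along with the real OCR error, which is the reason for keeping your own word list.
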
Nat Wooding

From: Kevin Viel
Sent: Wednesday, August 18, 2010 5:10 PM
Subject: OCR (was RE: BASE SAS CERTIFICATION)

From: Michael Raithel
Sent: Wednesday, August 18, 2010 1:09 PM
Subject: Re: BASE SAS CERTIFICATION

> hi ... if one wants to go back a bit further to "genesis" for
> a discussion of what happens during a data step ...
>
> "The SAS Supervisor"
> Don Henderson & Merry Rabb
>
>
>
> (PDF created by scanning my copy of the NESUG '88 proceedings,
> so it's just an image and cannot be searched for text)

One can use an OCR program to enable this. I used Adobe, but one can find freeware. I would love to hear comments from anyone with experience using SAS to parse the resulting text. Our "eHR" consists of PDF-style documents. We have M4's and nurses, among other highly skilled staff, parsing this manually after searching for the respective file case-by-case :(

I think I am going to have a student or two tackle this, but the question is HOW to get an electronic copy that preserves the form of the original, especially when the document is only rendered and not *stored* as a PDF-like file.

IS said they might be able to provide me with the print spool resulting from their query.

Any comments are welcome. Wine is also warmly welcome.


