LISTSERV at the University of Georgia
Date:         Wed, 18 Aug 2010 19:07:19 -0400
Reply-To:     Nat Wooding <nathani@VERIZON.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Nat Wooding <nathani@VERIZON.NET>
Subject:      Re: OCR (was RE: BASE SAS CERTIFICATION)
In-Reply-To:  <>
Content-Type: text/plain; charset="US-ASCII"


My experience with OCR is now a bit old, but I can say that success will depend on the quality of the images and on whether the lettering was smeared during scanning. The poorer the quality of the text, the poorer the result.

One thing that you might do to check for spelling errors is to use SAS' Proc Spell. Barbara Okerson has presented a couple of papers on obsolete SAS procs, and Spell is included. She shows how to create your own word lists in case there are valid spellings that are not found in the SAS list. One of her sets of slides can be found at
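To illustrate the general idea of checking OCR output against a base word list plus your own list of valid extra terms, here is a minimal sketch in Python. It is not Proc Spell and does not reproduce its syntax or behavior; the function name, the word lists, and the sample text are all made up for illustration.

```python
import re


def find_unknown_words(text, known_words, extra_words=()):
    """Return tokens found in neither word list, in order of first appearance.

    Hypothetical helper for illustration only; not part of SAS or any library.
    """
    # Merge the base list with the user-supplied list, ignoring case.
    vocabulary = {w.lower() for w in known_words} | {w.lower() for w in extra_words}
    unknown = []
    seen = set()
    # Keep digits inside tokens so OCR slips such as "rec0rd" stay in one piece.
    for token in re.findall(r"[A-Za-z0-9]+", text):
        word = token.lower()
        if word not in vocabulary and word not in seen:
            seen.add(word)
            unknown.append(word)
    return unknown


# Made-up base list, custom list, and OCR-style text with two typical misreads.
base_words = ["the", "data", "step", "reads", "one", "record",
              "at", "a", "time", "into"]
custom_words = ["sas", "pdv"]  # valid terms missing from the base list
ocr_text = "The SAS data step reads one rec0rd at a tirne into the PDV."

suspects = find_unknown_words(ocr_text, base_words, custom_words)
print(suspects)
```

Without the custom list, "sas" and "pdv" would be flagged as well, which is the reason for keeping a supplementary word list of your own.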

Nat Wooding

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Viel, Kevin
Sent: Wednesday, August 18, 2010 5:10 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: OCR (was RE: BASE SAS CERTIFICATION)

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Michael Raithel
> Sent: Wednesday, August 18, 2010 1:09 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: BASE SAS CERTIFICATION
>
> Dear SAS-L-ers,
>
> Mike Zdeb posted the following:
>
> > hi ... if one wants to go back a bit further to "genesis" for
> > a discussion of what happens during a data step ...
> >
> > "The SAS Supervisor"
> > Don Henderson & Merry Rabb
> >
> >
> >
> > (PDF created by scanning my copy of the NESUG '88 proceedings,
> > so it's just an image and cannot be searched for text)

One can use an OCR program to enable this. I used Adobe, but one can find freeware. I would love to hear about experiences with this, particularly with using SAS to parse the resulting text. Our "eHR" consists of PDF-style documents. We have M4's and nurses, among other highly skilled staff, parsing this manually after searching for the respective file case-by-case :(

I think I am going to have a student or two tackle this, but the question is HOW to get an electronic copy that preserves the form of the original, especially when the document is only rendered and not *stored* as a PDF-like file.

IS said they might be able to provide me with the print spool resulting from their query.

Any comments are welcome. Wine is also warmly welcome.


Kevin Viel, PhD
Senior Research Statistician
Patient Safety & Quality
International College of Robotic Surgery
Saint Joseph's Translational Research Institute

Saint Joseph's Hospital
5671 Peachtree Dunwoody Road, NE, Suite 330
Atlanta, GA 30342

(678) 843-6076: Direct Phone
(678) 843-6153: Facsimile
(404) 558-1364: Mobile

Confidentiality Notice: This e-mail, including any attachments is the property of Catholic Health East and is intended for the sole use of the intended recipient(s). It may contain information that is privileged and confidential. Any unauthorized review, use, disclosure, or distribution is prohibited. If you are not the intended recipient, please delete this message, and reply to the sender regarding the error in a separate email.
