Date: Wed, 18 Aug 2010 23:26:10 -0400
Reply-To: Matt Curcio <matt.curcio.ri@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Matt Curcio <matt.curcio.ri@GMAIL.COM>
Subject: Re: OCR (was RE: BASE SAS CERTIFICATION)
Greetings All,
If you are looking for a decent OCR program, may I recommend 'pdfocr'? It is freeware for
Linux. pdfocr takes scanned PDFs and produces text-based PDFs from which you can
copy and paste text.
https://launchpad.net/~gezakovacs/+archive/pdfocr
While I am at it, may I recommend that you become familiar with Ubuntu.
http://www.ubuntu.com/
You might seriously consider installing it as a dual boot on your Windows computers.
You should not need more than 10 GB for the entire OS, plus more room for 'future'
software than you can shake a stick at.
BTW, I have used pdfocr on a number of occasions. Almost all OCR programs claim accuracy in
the 90+% range, but it is that last 1-10% that makes editing a pain in the bum. You
may need several students for that task; heck, get a dozen. I think they are cheaper
by the dozen, or was that just a movie?!