LISTSERV at the University of Georgia
Date:         Mon, 31 Aug 2009 11:30:38 -0500
Reply-To:     matt.pettis@THOMSONREUTERS.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Matthew Pettis <matt.pettis@THOMSONREUTERS.COM>
Subject:      Re: Reading a PDF File
In-Reply-To:  A<966B4B225F74914599E617416969BF1A4384CD8953@DOM-MBX02.mbu.ad.dominionnet.com>
Content-Type: text/plain; charset="us-ascii"

I've had success with pdftotext, a free command-line utility. It does what Nat says you need to do: convert a PDF to a text file, which you can then parse. Here is some info on it:

http://en.wikipedia.org/wiki/Pdftotext

You can download it here: http://www.foolabs.com/xpdf/download.html

For a sane output format, I recommend including the '-layout' switch on the command line, which keeps the text as close as possible to its physical layout on the page.
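As a rough illustration of the command line described above, here is a minimal Python sketch that builds and runs a pdftotext call with the '-layout' switch. The function names and the default output name are made up for this example, and it assumes pdftotext is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical layout of the page.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def convert_pdf_to_text(pdf_path, txt_path=None):
    """Convert one PDF to text and return the path of the text file."""
    pdf_path = Path(pdf_path)
    # Assumption: default to the same file name with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path
```

Calling convert_pdf_to_text("report.pdf"), where report.pdf is a placeholder name, would write report.txt next to the PDF, ready to be read with an ordinary INFILE statement or any other text parser.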

Hopefully, the Acrobat 'save as text' option works well for you, but if not, this might be a good backup plan. Either approach may require some manual cleanup of the output file.

HTH, Matt

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nathaniel Wooding
Sent: Monday, August 31, 2009 10:57 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Reading a PDF File

Roger

You will need to convert the file to a TXT file and then parse the data. Hopefully, it will have a simple standard layout.

Acrobat Reader 7 has a 'save as text' feature. I do not know whether this is available on Linux, but you should be able to do the conversion on a Windows box and then move the text file to Linux.

I have a couple of versions of a paper posted on the web, but they deal with reading a large number of PDFs, where a simple open and 'save as' was not practical.

The big issue for you will be dealing with the file layout. Depending on how long the file is, some manual editing may simplify the parsing process.
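To give a feel for the parsing step, here is a small Python sketch that splits layout-style text into columns. It assumes, purely for illustration, that columns are separated by two or more spaces and that single spaces occur only inside a value; a real file may well need different rules or some manual editing first. The function names are hypothetical.

```python
import re


def split_layout_line(line):
    """Split one line into column values on runs of two or more spaces."""
    return [field for field in re.split(r"\s{2,}", line.strip()) if field]


def parse_layout_text(text):
    """Return a list of rows, each a list of column values."""
    rows = []
    # str.splitlines() also breaks on form feeds, which converted
    # PDF text commonly contains at page boundaries.
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between blocks and pages
        rows.append(split_layout_line(line))
    return rows
```

Writing the resulting rows out as a delimited file would leave something SAS can read with a plain DATA step, which is easier than parsing the raw layout directly.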

Nat Wooding

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of NOMAIL Roger S. Clark
Sent: Monday, August 31, 2009 11:49 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Reading a PDF File

Hi, SAS-L Group;

I am programming Independent Verification and Validation of a product that my division will deliver to an internal customer.

I have just learned (about 45 minutes ago) that one of the files I will need to read into SAS is a file with a .pdf extension.

I've found considerable information in the online documentation for creating pdf output, but nothing regarding using a pdf file as input.

Is it possible? If so, could someone advise how it is done?

This program is in the planning stage, so I have no code developed to include in the E-mail.

My program will be running SAS 9.1.3 SP4 in a Red Hat LINUX system.

Thanx,
Roger S. Clark
Address Products Management Branch
763-9177 4H584U

CONFIDENTIALITY NOTICE: This electronic message contains information which may be legally confidential and or privileged and does not in any case represent a firm ENERGY COMMODITY bid or offer relating thereto which binds the sender without an additional express written confirmation to that effect. The information is intended solely for the individual or entity named above and access by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. Thank you.

