LISTSERV at the University of Georgia
Date:   Wed, 24 Jan 2001 08:04:41 -0800
Reply-To:   Nick Paszty <npaszty@ORGANIC.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Nick Paszty <npaszty@ORGANIC.COM>
Subject:   Re: How to process a HTML source code file in sas
Comments:   To: duckchai <duckchai@NETVIGATOR.COM>
In-Reply-To:   <94mgof$qff4@imsp212.netvigator.com>
Content-Type:   text/plain; charset="us-ascii"; format=flowed

Hello Isaac.

Yes, you could do this if you are using V8.x. The 200-character limit on character variables in V6.x would make it less likely to succeed. What you want to do is read the HTML file into SAS with the truncover option on the infile statement, since the records in your input file vary in length. You could do something like this:

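* parses each line into its constituent parts;
data test_html;
  /* lrecl= is raised here because the default record length may be shorter than 500 */
  infile "path\filename" truncover lrecl=500;
  input htmlline $500.;  * grab entire line;

  /* note: index is case-sensitive, so '<head>' will not match '<HEAD>' */
  if (index(htmlline,'<head>')>0) then do;
    /* something: your processing for matching lines goes here */
  end;
run;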

Using the index function, you could find records with certain text strings. I'm not sure what you mean by cleansing though - re-writing HTML? I think that would be tedious with SAS.
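
As a rough sketch of pulling a specific string out with index and substr: the step below assumes the <title> and </title> tags sit on the same line and that 500 characters per record is enough. The data set and variable names (titles, upline, startpos, endpos, title) are made up for illustration, and "path\filename" is still a placeholder.

```sas
/* Sketch only: extract the text between <title> and </title> on one line. */
data titles;
  infile "path\filename" truncover lrecl=500;
  input htmlline $500.;
  length title $200;

  /* Search an upper-cased copy so <title>, <TITLE>, <Title> all match. */
  upline   = upcase(htmlline);
  startpos = index(upline, '<TITLE>');
  endpos   = index(upline, '</TITLE>');

  /* '<TITLE>' is 7 characters, so the text starts at startpos + 7. */
  /* Require at least one character between the tags.               */
  if startpos > 0 and endpos > startpos + 7 then do;
    title = substr(htmlline, startpos + 7, endpos - startpos - 7);
    output;
  end;

  keep title;
run;
```

Tags that span several lines would need extra logic (for example, retaining text across records), which this sketch does not attempt.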

Hope this helps,

Nick

At 08:09 PM 1/21/01 +0800, duckchai wrote:
>Hi,
>I am having some task about data cleansing of HTML source code, i.e. to
>extract some specific string from a text file containing HTML source. I
>wonder if:
>
>1. It is possible to input a text file with contain, like homepage's source
>code, into sas?
>2. It is possible to manipulate the text file in DATA step to perform task
>like cleansing, e.g using substr(), index() or index().....etc?
>
>Thx
>
>Yours
>
>Isaac

