Yes, you could do this if you are using V8.x. The 200-character limit on
character variables in V6.x would make it less likely to succeed. What you
want to do is read the HTML file into SAS using the truncover option on the
infile statement, since the records in your input file vary in length. You
could do something like this:
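data _null_;  * parses each line into its constituent parts;
  infile "path\filename" truncover;
  input htmlline $500.;  * grab entire line;
  * action after THEN was missing in the original, so PUT to the log is only a placeholder;
  if (index(htmlline,'<head>')>0) then put 'Found <head>: ' htmlline;
run;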
Using the index function, you could find records with certain text
strings. I'm not sure what you mean by cleansing though - re-writing
HTML? I think that would be tedious with SAS.
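If by cleansing you just mean pulling a specific string out of the HTML, a rough sketch along these lines might work. It assumes the opening and closing tags sit on the same line and that the line fits in 500 characters. The data set name titles, the variables startpos, endpos and title, and the choice of the <title> tag are only for illustration, and "path\filename" is still a placeholder:

```sas
data titles;
  /* lrecl= raises the record length so long lines are not cut short */
  infile "path\filename" truncover lrecl=32767;
  length htmlline $500 title $200;
  input htmlline $500.;

  /* lowcase() makes the search insensitive to tag case */
  startpos = index(lowcase(htmlline), '<title>');
  endpos   = index(lowcase(htmlline), '</title>');

  /* 7 is the length of the <title> tag, so the text starts at startpos + 7 */
  if startpos > 0 and endpos > startpos + 7 then do;
    title = substr(htmlline, startpos + 7, endpos - startpos - 7);
    output;
  end;

  keep title;
run;
```

Tags that span more than one line would need extra logic (for example, retaining a flag across records), which is where this starts to get tedious.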
Hope this helps,
At 08:09 PM 1/21/01 +0800, duckchai wrote:
>I am having some task about data cleansing of HTML source code, i.e. to
>extract some specific string from a text file containing HTML source. I
>1. Is it possible to input a text file with content like a homepage's source
>code into SAS?
>2. Is it possible to manipulate the text file in a DATA step to perform tasks
>like cleansing, e.g. using substr(), index() or index()... etc.?