|
Hello Isaac.
Yes, you could do this if you are using V8.x. The 200-character limit on
character variables in V6.x would make it less likely to succeed. What you
want to do is read the HTML file into SAS using the truncover option on the
infile statement, since the records in your input file vary in length. You
could do something like this:
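* read each line of the HTML file into one character variable;
data test_html;
  * lrecl=500 keeps long lines from being cut at the default record length;
  infile "path\filename" truncover lrecl=500;
  input htmlline $500.; * grab entire line;
  if index(htmlline,'<head>') > 0 then
    do;
      * put your own processing for matching lines here;
    end;
run;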
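Using the index function, you could find records that contain certain text
strings. Keep in mind that index is case-sensitive, so '<head>' will not
match '<HEAD>' unless you apply upcase to the line first. I'm not sure what
you mean by cleansing though. Do you mean re-writing the HTML? I think that
would be tedious with SAS.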
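If by cleansing you mean pulling a specific string out of each line, here is
a rough sketch that combines index and substr. It is untested against your
file and rests on some assumptions of mine: the text you want sits between
<title> and </title>, both tags are on the same line, and the dataset and
variable names (titles, title, startpos, endpos) are just placeholders.
  * sketch only: extract the text between the title tags on one line;
  data titles;
    infile "path\filename" truncover lrecl=500;
    input htmlline $500.;
    length title $200;
    * upcase makes the search case-insensitive;
    startpos = index(upcase(htmlline), '<TITLE>');
    endpos = index(upcase(htmlline), '</TITLE>');
    * 7 is the length of the opening tag, so the text starts at startpos + 7;
    if startpos > 0 and endpos > startpos + 7 then
      do;
        title = substr(htmlline, startpos + 7, endpos - startpos - 7);
        output;
      end;
    keep title;
  run;
Lines without both tags, or with nothing between them, are skipped, so the
output dataset only holds the extracted strings. A tag that spans several
lines would need extra logic, for example a retain statement.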
Hope this helps,
Nick
At 08:09 PM 1/21/01 +0800, duckchai wrote:
>Hi,
>I have a task involving data cleansing of HTML source code, i.e.
>extracting some specific strings from a text file containing HTML source.
>I wonder if:
>
>1. It is possible to input a text file with content like a homepage's
>source code into SAS?
>2. It is possible to manipulate the text file in a DATA step to perform
>tasks like cleansing, e.g. using substr(), index(), etc.?
>
>Thx
>
>Yours
>
>Isaac
|