LISTSERV at the University of Georgia
Date:         Mon, 19 Dec 2005 09:31:13 -0600
Reply-To:     Kevin Myers <KMyers@PROCOMINC.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Kevin Myers <KMyers@PROCOMINC.NET>
Subject:      Re: Simple URL Filename Access Problem
Content-Type: text/plain; charset="iso-8859-1"

"David L Cassell" <davidlcassell@msn.com> wrote:

> KMyers@PROCOMINC.NET replied:
> >It turns out that the experimental fix mentioned in
> >http://support.sas.com/techsup/unotes/SN/011/011102.html is exactly what it
> >takes to fix the problem discussed in this thread. Furthermore, this
> >problem appears likely to occur when accessing data via URL on any web
> >servers that have been upgraded to Windows 2003. Apparently this issue is
> >corrected in SAS 9.1.2. It seems like a fix with this much potential impact
> >should be upgraded to an officially supported hotfix...
> >
> >Thanks very much to George Fernandez for providing additional information
> >regarding this issue!!!
>
> Bad news, Kevin.
>
> There are other things which can make your attempts with the URL engine of
> the FILENAME statement go astray. This is not the most robust tool SAS has
> built. It may not do all the IE/FireFox/Nyetscape/... tricks of adjusting
> the url if needed. It will not automatically handle ports for you if needed.
> It probably will not handle a page which has dynamic programming. It may not
> be able to get past a robot-rejector. It may not handle a proxy server
> properly.
>
> You may be a lot better off using a tool like curl in a pipe, so you can get
> the text fed into a data step, or at least read off the errors that get spit
> back. I like using Perl, preferably with something like the LWP::Simple
> module to handle simple stuff, or one of a dozen other modules for further
> trickiness. But you knew I was going to say the word 'Perl' in there
> somewhere...

Yes, I have used Perl for this kind of thing before. But using Perl is like pulling teeth for me. I find the structure and syntax of that language completely arcane. It is *SO* different from everything else that I have ever used. I use it very infrequently, and each time I do, it is almost like learning it from scratch.

After working through yesterday's difficulties I am much farther along in my HTTP learning curve. Garth Helf's paper was a big help once I finally came across it. I may end up using curl as you suggested, but am also considering SAS macros based on the socket access method similar to that in Garth's paper.
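For reference, the curl-in-a-pipe approach might look roughly like the sketch below. It assumes that curl is installed and on the PATH, that the session allows host commands (the XCMD option), and that 'http://myURL' is only a placeholder. The fileref name webpage is likewise made up for illustration.

```sas
/* Sketch only: assumes curl is on the PATH and XCMD is enabled.      */
/* -s suppresses the progress meter, -S still reports errors, and     */
/* 2>&1 folds curl's error messages into the stream that SAS reads.   */
filename webpage pipe 'curl -s -S "http://myURL" 2>&1';

data _null_;
  infile webpage lrecl=32767 truncover;
  input;            /* read one line of the response (or an error)    */
  put _infile_;     /* echo it to the SAS log                         */
run;

filename webpage clear;
```

In a real program the PUT statement would be replaced by whatever INPUT logic is needed to parse the page.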

It seems to me that the URL access method could be greatly improved by providing a mechanism to support the use of cookies, possibly by storing them in macro variables. For example, one might extend the filename statement similar to the following:

filename myFile url 'http://myURL' cookieVar=myCookie;

The above statement would use the contents of macro variable myCookie (if non-blank) to generate a Cookie: record in the HTTP request header. Then the contents of this same macro variable would be updated based on the value of any Set-Cookie: record in the response header (or set to blank if no Set-Cookie: record is returned). The user could of course alter the contents of the macro variable, if desired, between individual filename statements, and could also specify the use of a different macro variable for different filename statements.

My knowledge of cookies is pretty limited at this time, so there might be some reason that the above handling would be inadequate. But FWIW, I do know that something along these lines would work for the scenarios given in Garth's paper and for the web site that I am presently working with.
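As a point of comparison, curl already offers this kind of cookie round trip, although it keeps the cookies in a file rather than in a macro variable. A minimal sketch, assuming curl is on the PATH and using cookies.txt as a hypothetical file name:

```sas
/* Sketch only: cookies.txt is a hypothetical file name.              */
/* -b reads any cookies stored by an earlier request and sends them;  */
/* -c writes the cookies known at the end of this request back out.   */
filename myFile pipe 'curl -s -b cookies.txt -c cookies.txt "http://myURL"';

data _null_;
  infile myFile lrecl=32767 truncover;
  input;
  put _infile_;     /* echo the response to the SAS log               */
run;

filename myFile clear;
```

Pointing successive requests at the same cookie file plays the role that the single macro variable would play in the proposal above.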

Another extremely useful enhancement would be to support the POST method, probably through additional filename statement options. If specified, this option would use the POST method rather than the GET method to request URL content. The user would also be allowed to specify a macro variable (or even a file?) containing data for the content portion of the POST request. For example, the user might specify:

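/* Content from Garth's paper. %nrstr keeps SAS from treating &j_password and &Logon as macro variable references. */
%let myPostContent=%nrstr(j_username=helf&j_password=notmypw&Logon=Log+On);
filename myFile url 'http://myPostURL' method=POST cookieVar=myCookie contentVar=myPostContent;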

With these two enhancements, SAS could handle *ALL* of the web pages from which I have ever attempted to extract data content. I know there are more sophisticated techniques that some web pages use in an attempt to defeat bots, but so far I have never had the need to try to get around such extreme measures, and I don't believe most other SAS users would need to either. It seems to me that the above enhancements would far exceed the 80/20 rule regarding URL data extraction needs for most SAS users, whereas the existing filename URL capabilities are probably adequate much less than half the time.
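Until something like the proposed POST option exists, the same request can probably be approximated with curl in a pipe. This is only a sketch under a few assumptions: curl is on the PATH, cookies.txt is a hypothetical cookie file, and the URL and form fields are the placeholder values used above.

```sas
/* Sketch only. -d sends its argument as the body of a POST request.  */
/* The command is in single quotes so that SAS does not try to        */
/* resolve &j_password or &Logon as macro variables.                  */
filename myFile pipe 'curl -s -b cookies.txt -c cookies.txt -d "j_username=helf&j_password=notmypw&Logon=Log+On" "http://myPostURL"';

data _null_;
  infile myFile lrecl=32767 truncover;
  input;
  put _infile_;     /* echo the server response to the SAS log        */
run;

filename myFile clear;
```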

So, what do you think about these suggestions?

Regards, s/KAM

