LISTSERV at the University of Georgia
Date:   Tue, 31 Mar 2009 13:39:07 -0700
Reply-To:   Savian <savian.net@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Savian <savian.net@GMAIL.COM>
Organization:   http://groups.google.com
Subject:   Re: Reading Web logs with SAS
Comments:   To: sas-l@uga.edu
Content-Type:   text/plain; charset=ISO-8859-1

On Mar 31, 12:03 pm, ohri2...@GMAIL.COM (Ajay ohri) wrote:
> it depends on the volume of web logs to be parsed. if volume is high,
> yes SAS can be tweaked to read web logs in a certain way. especially
> since many of them use similar formats ( wordpress , blogger,type pad)
> are three.
>
> those formats can be checked using the css,.php and theme editor of
> the html. tweaking your code for parsing is most painful as you need
> to tweak the point at which post begins or ends .
>
> while perl is considered standard, try using a browser macro using a
> language developed at www.iopus.com it records macro while browsing
> same as excel macro records vba.
>
> once you have compiled your main browser file in the .iim format you
> can use SAS (or a normal excel VBA macro) to open Imacro application
> , run the .iim file ,download in the standard location ,
>
> and you can use google desktop for searching the huge volume of text
> files downloaded. uses the same algorithm of google ;)
>
> www.decisionstats.com
>
> Rodney Dangerfield - "I haven't spoken to my wife in years. I didn't
> want to interrupt her."
>
> On Tue, Mar 31, 2009 at 10:07 AM, Savian <savian....@gmail.com> wrote:
> > On Mar 30, 4:35 pm, yamira...@YAHOO.COM (Richard Whitehead) wrote:
> >> someone on another forum posted that web logs can be read directly with
> >> proc import. is this true? btw, i am in a brief sas-less period, so i
> >> can't actually check for myself. :-) anyway, regardless, of the answer to
> >> the above, is there an easy way, i.e. not having to write code in a data
> >> step, to read web logs with sas?
> >>
> >> thanks in advance,
> >>
> >> richard whitehead
> >
> > I am unsure if I hit the wrong button or what with my posting on this
> > issue. Anyway, somewhat of a reprise (I hope it doesn't appear twice):
> >
> > In a short answer, no, proc import won't get you anywhere close to
> > what you want. Reading web logs is bad enough, analyzing them is a
> > nightmare. But let's skip the gory details for now.
> >
> > 1. Find a program on the web that does this for you already: don't
> > reinvent the wheel.
> >
> > ....... really, see # 1....
> >
> > 1. Ok, if you decide to use SAS (and not their web analytic product),
> > skip the SAS functions and use regular expressions. I wish I knew more
> > about regex when I dealt with trillions of bytes of these records from
> > dozens of companies.
> > 2. Keep in mind that 90% of the records are useless info. Throw them
> > away immediately. If you have very high volume, use Perl for
> > pre-processing.
> > 3. Web logs can vary in columns used, layout within a single file, and
> > layout from web server to web server. I have even seen embedded EOF
> > markers in a log.
> > 4. Analyzing them is plain hard and is fraught with error. There are
> > so many things on the web that cause inaccuracy that take anything you
> > see with a large grain of salt. Actually, make that a salt block...
> > 5. Know thy enemy and narrow scope. Decide if you need to read from
> > multiple web server architectures or just one. Is the layout fixed for
> > what you have to do or can it vary?
> > 6. As I tell people, weblogs are the second hardest datasource I have
> > ever dealt with (CDRs being #1).
> >
> > Alan
> > Savian
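Point 1 of the quoted reply recommends regular expressions over SAS string functions. A minimal sketch of that idea (in Python rather than SAS) follows. It assumes the Apache Combined Log Format; the thread never names a log format, so the field layout is an assumption. Lines that do not match return None instead of being guessed at, which matters given point 3 about layouts that vary even within a single file.

```python
import re
from typing import Optional

# Assumption: Apache "combined" format (the thread does not name a format):
# host ident authuser [date] "request" status bytes "referer" "user-agent"
# The trailing referer/agent pair is optional, so plain Common Log Format
# lines also match.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if the layout is unexpected."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None  # unexpected layout: skip it rather than guess
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])  # "-" = no body
    return rec
```

Counting the rejected (None) lines per file is a cheap way to notice when a server's layout has changed underneath you.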

Web logs are not the same as web blogs.

Web logs from a busy web server can run to terabytes per day, so simple methods of reading them are not practical. About 90% of the volume can be tossed, so what is required is a low-level language, like Perl, that can strip out that 90% before any real processing begins.
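Savian recommends Perl for this discard-early step. The same streaming idea is sketched below in Python: cheap string checks drop a line before any full parse, and lines are handled one at a time, so memory use does not grow with file size. Which records count as "useless" depends entirely on the analysis; the static-file extensions and bot markers here are hypothetical rules, and the field positions assume the Apache Combined Log Format.

```python
from typing import Iterable, Iterator

# Hypothetical noise rules: what counts as "useless" depends on the analysis.
STATIC_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")

def keep_line(line: str) -> bool:
    """Cheap string checks made before any full parse of the line.

    Assumes Apache Combined Log Format: the request is the first quoted
    field and the user agent is the last one.
    """
    try:
        request = line.split('"', 2)[1]      # e.g. 'GET /index.html HTTP/1.1'
        path = request.split(" ")[1]
    except IndexError:
        return False                         # malformed line: drop it
    path = path.split("?", 1)[0].lower()     # ignore any query string
    if path.endswith(STATIC_EXTENSIONS):
        return False                         # images, stylesheets, scripts
    # Combined format has six double quotes; fewer means no agent field.
    agent = line.rsplit('"', 2)[-2].lower() if line.count('"') >= 6 else ""
    return not any(marker in agent for marker in BOT_MARKERS)

def prefilter(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield only the lines worth parsing, one line at a time."""
    return (line for line in lines if keep_line(line))
```

Because `prefilter` is a generator, it can wrap an open file object directly (for example `with open("access.log") as f: for line in prefilter(f): ...`, where the file name is hypothetical) without reading the whole file into memory.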

There are also interdependencies between records, so simple search tools do not provide any significant analysis. The records need to be parsed, joined (as best you can, since there is no common key linking one record to another), and then analyzed. For large volumes, this needs to be done in parallel.
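Because no common key links the records, joining them into sessions is a heuristic. A minimal sketch follows, assuming a visitor is identified by host plus user agent and that more than 30 minutes of inactivity ends a session. Both are common conventions, not rules from this thread. It operates on dicts with `host`, `agent`, and `time` fields, with `time` in Apache timestamp format.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumptions (not from the post): a visitor is identified by (host, agent),
# and more than 30 minutes without a request starts a new session.
SESSION_GAP = timedelta(minutes=30)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 10/Oct/2000:13:55:36 -0700

Visitor = Tuple[str, str]

def sessionize(records: List[dict]) -> Dict[Visitor, List[List[dict]]]:
    """Group parsed records into per-visitor sessions split on inactivity."""
    # Parse each timestamp once and sort by time only (never compare the dicts).
    timed = sorted(
        ((datetime.strptime(r["time"], TIME_FORMAT), r) for r in records),
        key=lambda pair: pair[0],
    )
    sessions: Dict[Visitor, List[List[dict]]] = {}
    last_seen: Dict[Visitor, datetime] = {}
    for ts, rec in timed:
        key = (rec["host"], rec.get("agent") or "")
        if key not in last_seen or ts - last_seen[key] > SESSION_GAP:
            sessions.setdefault(key, []).append([])  # open a new session
        sessions[key][-1].append(rec)
        last_seen[key] = ts
    return sessions
```

Users behind a shared proxy collapse into a single visitor under this key, which is one of the inaccuracies point 4 of the quoted reply warns about.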

Now the hard question: how do you analyze sessions in parallel when a single user session crosses multiple logs? That's where some of the tricks come into play, and it's why I recommended finding an existing parser that handles those issues.
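One common way to keep such sessions intact under parallel processing is offered here as an assumption, not as the tricks Savian alludes to. The idea is to hash-partition records by visitor key across all log files, so that every record for a given visitor lands in the same partition no matter which server or file it came from. Each partition can then be sessionized by a separate worker. The visitor key (host plus user agent) and the `partition_by_visitor` name are hypothetical.

```python
import zlib
from typing import Iterable, List

# Assumption (not from the post): a visitor is keyed by host + user agent.
def visitor_key(rec: dict) -> str:
    return rec["host"] + "|" + (rec.get("agent") or "")

def partition_by_visitor(logs: Iterable[Iterable[dict]], n_parts: int) -> List[List[dict]]:
    """Route every record, from every log, to a partition chosen by a stable
    hash of its visitor key. All records for one visitor end up in the same
    partition, so partitions can be sessionized independently without
    splitting sessions that span several logs."""
    parts: List[List[dict]] = [[] for _ in range(n_parts)]
    for log in logs:
        for rec in log:
            # crc32 gives the same value in every process; hash() on str is
            # salted per interpreter run, so it could route inconsistently.
            idx = zlib.crc32(visitor_key(rec).encode("utf-8")) % n_parts
            parts[idx].append(rec)
    return parts
```

Records within a partition still arrive in log order rather than time order, so each worker should sort its partition by timestamp before splitting it into sessions.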

Alan Savian

