Date: Tue, 31 Mar 2009 23:33:27 +0530
Reply-To: Ajay ohri <ohri2007@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ajay ohri <ohri2007@GMAIL.COM>
Subject: Re: Reading Web logs with SAS
In-Reply-To: <da295e9f-a752-441a-ab39-86039e475374@p6g2000pre.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
It depends on the volume of web logs to be parsed. If the volume is
high, yes, SAS can be tweaked to read web logs in a certain way,
especially since many of them use similar formats (WordPress, Blogger
and TypePad are three).
Those formats can be checked using the CSS, the .php files and the
theme editor behind the HTML. Tweaking your parsing code is the most
painful part, as you need to adjust the point at which a post begins
or ends.
While Perl is considered standard, try using a browser macro written
in the language developed at www.iopus.com. It records a macro while
you browse, the same way Excel's macro recorder produces VBA.
Once you have saved your main browser macro in the .iim format, you
can use SAS (or a normal Excel VBA macro) to open the iMacros
application, run the .iim file and download to the standard location.
You can then use Google Desktop to search the huge volume of text
files downloaded. It uses the same algorithm as Google ;)
www.decisionstats.com
Rodney Dangerfield - "I haven't spoken to my wife in years. I didn't
want to interrupt her."
On Tue, Mar 31, 2009 at 10:07 AM, Savian <savian.net@gmail.com> wrote:
> On Mar 30, 4:35 pm, yamira...@YAHOO.COM (Richard Whitehead) wrote:
>> someone on another forum posted that web logs can be read directly with
>> proc import. is this true? btw, i am in a brief sas-less period, so i
>> can't actually check for myself. :-) anyway, regardless, of the answer to
>> the above, is there an easy way, i.e. not having to write code in a data
>> step, to read web logs with sas?
>>
>> thanks in advance,
>>
>> richard whitehead
>
> I am unsure if I hit the wrong button or what with my posting on this
> issue. Anyway, somewhat of a reprise (I hope it doesn't appear twice):
>
> In a short answer, no, proc import won't get you anywhere close to
> what you want. Reading web logs is bad enough, analyzing them is a
> nightmare. But let's skip the gory details for now.
>
> 1. Find a program on the web that does this for you already: don't
> reinvent the wheel.
>
> ....... really, see # 1....
>
> 1. Ok, if you decide to use SAS (and not their web analytic product),
> skip the SAS functions and use regular expressions. I wish I knew more
> about regex when I dealt with trillions of bytes of these records from
> dozens of companies.
> 2. Keep in mind that 90% of the records are useless info. Throw them
> away immediately. If you have very high volume, use Perl for pre-
> processing.
> 3. Web logs can vary in columns used, layout within a single file, and
> layout from web server to web server. I have even seen embedded EOF
> markers in a log.
> 4. Analyzing them is plain hard and is fraught with error. There are
> so many things on the web that cause inaccuracy that you should take
> anything you see with a large grain of salt. Actually, make that a
> salt block...
> 5. Know thy enemy and narrow scope. Decide if you need to read from
> multiple web server architectures or just one. Is the layout fixed for
> what you have to do, or can it vary?
> 6. As I tell people, weblogs are the second hardest datasource I have
> ever dealt with (CDRs being #1).
>
>
> Alan
> Savian
>
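A minimal sketch of the quoted advice to use regular expressions and
to throw away useless records early. It is written in Python rather
than SAS or Perl, purely for illustration. It assumes the NCSA Common
Log Format (optionally with the "combined" referrer and user-agent
fields), and it assumes that "useless" means requests for static
assets; neither assumption comes from the thread, and the names
LOG_LINE, STATIC_ASSET, parse_line, keep and filter_log are
hypothetical.

```python
import re

# Assumed layout: NCSA Common Log Format, optionally followed by the
# "combined" referrer and user-agent fields. Real logs may differ.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)

# Hypothetical definition of "useless": requests for static assets.
STATIC_ASSET = re.compile(
    r'\.(?:gif|jpe?g|png|css|js|ico)(?:\?.*)?$', re.IGNORECASE
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def keep(record):
    """Drop unparsable lines and static-asset requests."""
    return record is not None and not STATIC_ASSET.search(record["path"])


def filter_log(lines):
    """Yield parsed records lazily so large files are never held in memory."""
    for line in lines:
        record = parse_line(line)
        if keep(record):
            yield record


SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - frank [10/Oct/2000:13:55:37 -0700] '
    '"GET /images/logo.gif HTTP/1.0" 200 4523',
    'this line is garbage and should be skipped',
]

if __name__ == "__main__":
    for rec in filter_log(SAMPLE):
        print(rec["host"], rec["time"], rec["path"], rec["status"])
```

Lines that do not match the pattern are dropped silently here. Given
the warning about layouts varying within a single file, counting or
logging the rejects is probably worth adding before trusting any
totals.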