How to Read Raw Web Server Log Files
by Zac Hewlett
First things first: if your web hosting provider doesn't let you download the raw log file, find one that does. You need the raw log file to study and improve your web site's performance.
What's A Log File?
Every time someone visits a page on your site, a record is written to the log file, which is stored on your server. The log file holds some interesting and useful information about your visitors.
Log file formats vary, but here I discuss the elements they have in common.
Here are the contents of a single line of the log file from this site:
165.21.154.9 - - [03/Jul/2003:06:39:23 -0400] "GET / HTTP/1.1" 200 15549 "http://www.working-at-home-business.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; http://www.working-at-home-business.com)"
Let's go through it field by field.
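If you would rather let a program split a line like this into its fields, a regular expression does the job. The sketch below assumes the Apache/NCSA "combined" log format shown above; if your host uses a different or customized format, the pattern will need adjusting. The names LOG_PATTERN and parse_line are only illustrative.

```python
import re

# Assumption: Apache/NCSA "combined" log format. Other servers or custom
# formats may order or omit fields differently.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) '            # visitor IP address
    r'(?P<ident>\S+) '         # identd name, usually "-"
    r'(?P<user>\S+) '          # authenticated user name, usually "-"
    r'\[(?P<time>[^\]]+)\] '   # date/time and GMT offset
    r'"(?P<request>[^"]*)" '   # request line, e.g. GET / HTTP/1.1
    r'(?P<status>\d{3}) '      # return code
    r'(?P<size>\S+) '          # bytes sent, or "-" if none
    r'"(?P<referrer>[^"]*)" '  # referring page
    r'"(?P<agent>[^"]*)"'      # browser/platform
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

sample = ('165.21.154.9 - - [03/Jul/2003:06:39:23 -0400] "GET / HTTP/1.1" 200 15549 '
          '"http://www.working-at-home-business.com" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; '
          'http://www.working-at-home-business.com)"')
fields = parse_line(sample)
print(fields["ip"], fields["status"], fields["size"])
```

Lines that don't fit the pattern (a truncated write, say) come back as None, so you can skip or count them instead of crashing.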
User IP Address
165.21.154.9
This is the IP address of the visitor to our site. It can tell you roughly where the visitor is connecting from. If you do a reverse DNS lookup on this IP address at DNS Stuff, the result is bbcache-9.singnet.com.sg, which belongs to "Singapore Telecommunications Pte Ltd".
You really can't go further than that to identify a particular person. Otherwise, the Internet would be too dangerous. ;)
Yes, that visitor was me.
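You can also do the reverse DNS lookup yourself instead of using a web page. Here is a minimal Python sketch using the standard socket module; reverse_dns is a hypothetical helper name, and the answer depends on your resolver and on whether the address owner publishes a reverse (PTR) record at all.

```python
import socket

def reverse_dns(ip):
    """Return the host name registered for an IP address, or None.

    Uses the operating system's resolver, so results depend on your DNS
    setup. Many addresses have no reverse record at all.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
```

Calling reverse_dns on a visitor's address from your log asks the same question DNS Stuff answers. For an old log entry, don't expect today's answer to match what it was at the time of the visit, because address assignments change.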
Date/Time
03/Jul/2003:06:39:23
The exact time of the visit. Combined with the IP
address, it enables you to follow a particular visitor sequentially
from page to page on your site. More on this later.
GMT offset
-0400
This is the offset of the server's local time from Greenwich Mean Time (GMT), in hours and minutes. So in our example, -0400 means the server's clock is 4 hours behind GMT.
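When you compare visits recorded with different offsets, it helps to convert every timestamp to GMT (UTC). A minimal Python sketch, assuming the timestamp looks exactly like the one in our example; parse_log_time is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_log_time(stamp):
    """Parse '03/Jul/2003:06:39:23 -0400' into a timezone-aware datetime."""
    # %b expects English month abbreviations (true in the default C locale);
    # %z reads the -0400 offset.
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

local = parse_log_time("03/Jul/2003:06:39:23 -0400")
utc = local.astimezone(timezone.utc)
print(utc)  # 2003-07-03 10:39:23+00:00
```

Because the server is 4 hours behind GMT, 06:39:23 local time is 10:39:23 GMT.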
Action
"GET / HTTP/1.1"
This is the request method, usually GET or POST. Apart from forms submitted to CGI programs, it will typically be GET: that is, get a web page or an image that goes on that page.
This line records a command from my own browser to GET a web page from the root directory (notice the slash "/" after GET) using version 1.1 of the HTTP protocol. This is the index page of our web site.
Another example.
"GET /web-promotion/index.shtml HTTP/1.1"
It records a request for this URL:
http://www.webhosting.com/web-promotion/index.shtml
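Since the log records only the path, you have to supply your own site's address to rebuild the full URL. A small Python sketch; split_request is a hypothetical helper, and the host name is simply the one from the example above.

```python
from urllib.parse import urljoin

def split_request(request):
    """Split 'GET /path HTTP/1.1' into (method, path, protocol), or return None."""
    parts = request.split()
    if len(parts) != 3:
        return None  # malformed or truncated request lines do show up in logs
    method, path, protocol = parts
    return method, path, protocol

method, path, protocol = split_request("GET /web-promotion/index.shtml HTTP/1.1")
# The log holds only the path; the scheme and host come from your own site.
full_url = urljoin("http://www.webhosting.com", path)
print(method, full_url)  # GET http://www.webhosting.com/web-promotion/index.shtml
```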
Return Code
200
The next item tells whether the request was successful. Our example has a return code of 200, which means "OK": the page was delivered successfully. You've probably seen the dreaded 404 "Not Found" error when the web page you were looking for wasn't at that URL, so these return codes aren't entirely new to you.
Other common return codes include:
400 - Bad Request
401 - Authorization Required
403 - Forbidden
500 - Internal Server Error
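To see at a glance how many requests succeeded or failed, you can tally the return-code column. A minimal Python sketch; summarize_status and reason are hypothetical helpers, and the input list stands in for codes pulled from your own log.

```python
from collections import Counter
from http import HTTPStatus

def reason(code):
    """Standard reason phrase for a status code, or 'Unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:  # not a code Python's HTTPStatus knows about
        return "Unknown"

def summarize_status(codes):
    """Map each return code to (reason phrase, number of requests)."""
    counts = Counter(int(code) for code in codes)
    return {code: (reason(code), n) for code, n in sorted(counts.items())}

# Hypothetical return-code column taken from a few log lines.
print(summarize_status(["200", "200", "404", "304", "500"]))
```

Python's http.HTTPStatus uses the standard reason phrases, so 401 comes out as "Unauthorized" rather than "Authorization Required" as listed above.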
Size
15549
This is the size of the file sent, in this case
15549 bytes.
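Adding up the size column gives a rough idea of how much bandwidth your site uses. A minimal Python sketch, assuming the server writes "-" when no body was sent, as Apache's common and combined formats do; total_bytes is a hypothetical helper.

```python
def total_bytes(size_fields):
    """Sum the size column, treating '-' (no body sent) as zero bytes."""
    return sum(int(size) for size in size_fields if size != "-")

# Hypothetical size column from three log lines.
print(total_bytes(["15549", "-", "2048"]))  # 17597
```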
Referrer
"http://www.working-at-home-business.com"
This tells us which web page the visitor came from. In our example, that is http://www.working-at-home-business.com, which is also run by us.
You will find another extremely important piece of information here: the keywords your visitors used to find you. For example:
"http://search.yahoo.com/bin/search?p=web%20hosting%20singapore"
This tells you someone found this page at Yahoo, using the keywords "web hosting singapore".
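Pulling the search phrase out of a referrer URL takes only a few lines. This Python sketch assumes the keywords sit in a "p" or "q" query parameter; other search engines use other parameter names, so treat KEYWORD_PARAMS and search_keywords as hypothetical starting points.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: search terms are in a "p" or "q" query parameter.
# Other search engines use other parameter names.
KEYWORD_PARAMS = ("p", "q")

def search_keywords(referrer):
    """Return the decoded search phrase from a referrer URL, or None."""
    query = parse_qs(urlsplit(referrer).query)  # decodes %20 to a space
    for name in KEYWORD_PARAMS:
        if name in query:
            return query[name][0]
    return None

print(search_keywords("http://search.yahoo.com/bin/search?p=web%20hosting%20singapore"))
```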
By studying referrer information, you will know exactly how many visitors each search engine brings you, what they were looking for when they found your site, and which link partners are most valuable. Then you will know how to spend your advertising dollars wisely.
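Counting referrals per site is the first step toward those comparisons. A minimal Python sketch; referrals_by_host is a hypothetical helper, and the sample list stands in for the referrer column of your own log.

```python
from collections import Counter
from urllib.parse import urlsplit

def referrals_by_host(referrers):
    """Count visits per referring host name."""
    hosts = Counter()
    for ref in referrers:
        if ref in ("", "-"):
            hosts["(no referrer)"] += 1  # e.g. a typed-in URL or a bookmark
        else:
            hosts[urlsplit(ref).hostname or "(unparsable)"] += 1
    return hosts

# Hypothetical referrer column.
sample = [
    "http://search.yahoo.com/bin/search?p=web%20hosting%20singapore",
    "http://search.yahoo.com/bin/search?p=web%20hosting",
    "http://www.working-at-home-business.com",
    "-",
]
for host, visits in referrals_by_host(sample).most_common():
    print(host, visits)
```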
Browser/Platform
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; http://www.working-at-home-business.com)"
The final field in the log file tells you which web browser and operating system the visitor is using.
"Mozilla" is a code name that indicates the browser is Netscape-compatible. In this case, the visitor was using Internet Explorer 6.0 on Windows NT 5.0 (better known as Windows 2000).
Why is my URL at the end again? Just a little fun: I customized my browser a bit. You won't see it anywhere else unless I visit your web site. ;)
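For user-agent strings shaped like the one above, two small regular expressions will pick out the browser and platform. This is a deliberately rough heuristic for old Internet Explorer strings only; real user-agent strings vary wildly, and browser_and_platform is a hypothetical helper.

```python
import re

def browser_and_platform(agent):
    """Rough sniffing for old IE-style user-agent strings (heuristic only)."""
    browser = re.search(r"MSIE [\d.]+", agent)
    platform = re.search(r"Windows NT [\d.]+", agent)
    return (browser.group(0) if browser else None,
            platform.group(0) if platform else None)

agent = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; "
         "http://www.working-at-home-business.com)")
print(browser_and_platform(agent))  # ('MSIE 6.0', 'Windows NT 5.0')
```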
Tracing A Visitor
Here comes the more interesting part. Let's take a closer look at the log file and see how a visitor passes through your site. I have abbreviated the log entries to keep things simple.
03/Jul/2003:06:39:23 GET /
03/Jul/2003:06:39:52 GET /web-hosting/index.shtml
03/Jul/2003:06:40:36 GET /newsletter/index.shtml
03/Jul/2003:06:41:04 POST /cgi-bin/followup/auto_followup.pl
03/Jul/2003:06:41:05 GET /newsletter/subscribed.shtml
03/Jul/2003:06:41:27 GET /support/index.shtml
03/Jul/2003:06:41:38 GET /support/log.shtml
First, the visitor went to the homepage of Singapore Web Hosting, then to the web hosting section to find out more about our hosting packages. Next, the visitor looked at the newsletter page and filled in the subscription form. Our CGI script processed the form (the POST request) and redirected the visitor to the "thank you for subscribing" page. The visitor then went on to read some articles in the support section.
I've skipped all the requests for images.
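Reconstructing paths like this by hand gets tedious on a busy site. The Python sketch below groups page requests by IP address, sorts each visitor's requests by time, and drops image requests, much as I did above. It assumes the combined log format and treats one IP address as one visitor, which is only an approximation (several people can share an address, such as behind a proxy cache). LINE, IMAGE_SUFFIXES and visitor_paths are hypothetical names.

```python
import re
from collections import defaultdict
from datetime import datetime

# Minimal pattern for the fields we need: IP, timestamp, method and path.
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico")

def visitor_paths(lines):
    """Group page requests by IP address, in time order, skipping images."""
    paths = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # malformed line or unexpected format
        ip, stamp, method, path = match.groups()
        if path.split("?")[0].lower().endswith(IMAGE_SUFFIXES):
            continue
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        paths[ip].append((when, method, path))
    # Sort each visitor's requests chronologically.
    return {ip: sorted(hits) for ip, hits in paths.items()}
```

Printing each visitor's list produces a trail like the one above, one request per line.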
Why should you analyze a visitor's path? Because only then do you begin to discover how a visitor uses your site: which door she comes in by and where she comes from, what interests her most, and where she leaves. Lots of small, careful observations add up to an accurate picture of what a visitor actually does on your site.
That information is priceless if your goal is to optimize the experience and lead your visitors to the most important parts of your site.