What are log files?
When someone visits your website they are making requests to your web server. The details of these requests are recorded in your web server log files. Many of the world's websites run on the Apache web server, and that's what I'm going to be talking about in this article. If your website is using another type of web server (for example Microsoft IIS), the principles are the same but the commands and tools you need to analyse the files will differ.
Types of log files
By default the Apache (HTTPD) web server creates two log files, 'access_log' and 'error_log'. If you are not sure where these files are on your server then you will need to look at your server config. You should find an ErrorLog directive and a CustomLog directive. The ErrorLog line will look like this:
ErrorLog /var/www/yoursite.com/statistics/logs/error_log
These tell the Apache web server where to write the log files.
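If your config is spread over several files, a short script can pull out the logging directives for you. The following is a minimal sketch, not an official tool: the function name find_log_paths is made up for this example, and the CustomLog line in the sample text is an assumed example rather than something taken from a real server.

```python
import re

# Matches ErrorLog / CustomLog / TransferLog directives at the start of a line.
# Lines beginning with '#' are comments and are not matched.
LOG_DIRECTIVE = re.compile(
    r'^[ \t]*(ErrorLog|CustomLog|TransferLog)[ \t]+("[^"]+"|\S+)',
    re.IGNORECASE | re.MULTILINE,
)

def find_log_paths(config_text):
    """Return (directive, path) pairs found in Apache config text."""
    return [
        (m.group(1), m.group(2).strip('"'))
        for m in LOG_DIRECTIVE.finditer(config_text)
    ]

# Hypothetical config text; the CustomLog line is an assumption for illustration.
sample = """
# ErrorLog /this/line/is/commented/out
ErrorLog  /var/www/yoursite.com/statistics/logs/error_log
CustomLog /var/www/yoursite.com/statistics/logs/access_log combined
"""

for directive, path in find_log_paths(sample):
    print(directive, path)
```

To use it on a real server you would read the text of your own config file and pass it to find_log_paths instead of the sample string.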
What gets logged?
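Unless you have specifically reconfigured your web server it will probably be writing log files in the Common Log Format (CLF) or the closely related Combined Log Format, which adds the referrer and browser fields to the end of each record. The log file will contain records like the one below, which is in the combined format: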
220.127.116.11 - - [10/May/2009:17:44:17 +0100] "GET /store/product-images/PC1.jpg HTTP/1.1" 200 16989 "http://www.yoursite.com/store/Page/158/PC1.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:18.104.22.168) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)"
This may look complicated and might put you off going any further, but it really is very simple. Let's break it down one element at a time.
220.127.116.11 - This is the IP address from which the request to your web server was made.
- (first hyphen) - You will almost always see just a hyphen in this field. It is meant to hold the identity of the client making the request, as reported by the client's identd service, but for security reasons this is seldom enabled and would be unreliable if present, as it is easily spoofed.
- (second hyphen) - You will usually see a hyphen in this field too. If your website protects some pages or directories with HTTP authentication, you will see the user id of the person making the request here once they have been authenticated.
[10/May/2009:17:44:17 +0100] - The date and time of the request, followed by the server's offset from UTC.
"GET /store/product-images/PC1.jpg HTTP/1.1" - The request made by the web browser or search engine: the method, the resource requested and the protocol.
200 - The HTTP status code the server gave in response to the request. This is explained in more detail below.
16989 - The number of bytes sent back to the client that made the request.
"http://www.yoursite.com/store/Page/158/PC1.html" - The referring site or page.
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:18.104.22.168) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)" - The browser / search engine bot / platform string.
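The field-by-field breakdown above can be sketched in code. This is a minimal example, assuming the log is in the combined format described in this article; the names LOG_PATTERN and parse_line are my own, the user agent in the sample line is shortened for readability, and the pattern does not try to handle escaped quotes inside a field.

```python
import re

# One named group per field, in the order the fields appear in the record.
# The referrer and agent fields are optional so plain CLF lines also match.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log record, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Sample record based on the one in this article (user agent shortened).
line = (
    '220.127.116.11 - - [10/May/2009:17:44:17 +0100] '
    '"GET /store/product-images/PC1.jpg HTTP/1.1" 200 16989 '
    '"http://www.yoursite.com/store/Page/158/PC1.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB) Firefox/3.0.8"'
)

record = parse_line(line)
for name, value in record.items():
    print(f"{name}: {value}")
```

All values come back as strings, so convert the status and byte count with int() if you want to do arithmetic on them (the byte count can be a hyphen when no body was sent).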
What to look out for in log files
When I look at log files I typically start by looking for requests which have resulted in a 404 response. This means that someone has requested a page (or a resource such as a graphic) which doesn't exist. If you find this happening then look at the referring site or page. If the referrer is your own site then you need to get this broken link fixed. If the request is coming from another site you might need to contact the site's owner and ask them to update the link. In the real world getting a site owner to update a link can be quite difficult, so if the error is happening a lot consider setting up a 301 "Moved Permanently" redirect.
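As a rough illustration of this check, the sketch below counts 404 responses by requested path and referrer, so the most frequently hit broken links stand out. It assumes the combined log format; the function name broken_links and the sample records (including the 192.0.2.x addresses and the "ExampleBrowser" agent) are invented for the example.

```python
import re
from collections import Counter

# Captures the request line, the status code and (if present) the referrer.
PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+(?: "(?P<referrer>[^"]*)")?'
)

def broken_links(lines):
    """Count 404 responses, keyed by (requested path, referrer)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if not match or match.group("status") != "404":
            continue
        parts = match.group("request").split()
        # A well-formed request is "METHOD path PROTOCOL"; fall back to the raw text.
        path = parts[1] if len(parts) > 1 else match.group("request")
        counts[(path, match.group("referrer") or "-")] += 1
    return counts

# Hypothetical log records for illustration only.
sample = [
    '192.0.2.1 - - [10/May/2009:17:44:17 +0100] "GET /old-page.html HTTP/1.1" 404 512 "http://www.yoursite.com/index.html" "ExampleBrowser"',
    '192.0.2.2 - - [10/May/2009:17:45:02 +0100] "GET /old-page.html HTTP/1.1" 404 512 "http://www.yoursite.com/index.html" "ExampleBrowser"',
    '192.0.2.3 - - [10/May/2009:17:46:30 +0100] "GET /store/ HTTP/1.1" 200 2048 "-" "ExampleBrowser"',
]

for (path, referrer), hits in broken_links(sample).most_common():
    print(hits, path, "linked from", referrer)
```

On a real server you would pass an open file object for your access_log in place of the sample list, since the function only needs something it can loop over line by line.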
Once you've addressed any 404 errors, you should then review any requests which have resulted in a 500 response. Status 500 indicates a runtime error on your web server, typically when generating a dynamic page (CGI, PHP, JSP, ASP, etc.). Look at the time this error happened and then look around that time in the error_log file for more information.
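To make that cross-check easier, here is a small sketch that lists each 500 response with its timestamp parsed into a proper date object. The function name server_errors and the sample records are hypothetical, and the timestamp parsing assumes English month abbreviations, as in the example record earlier in this article.

```python
import re
from datetime import datetime

# Captures the timestamp, request line and status code of each record.
PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)

# Layout of the access_log timestamp, e.g. 10/May/2009:17:44:17 +0100
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def server_errors(lines):
    """Return (datetime, request) pairs for every record with status 500."""
    errors = []
    for line in lines:
        match = PATTERN.match(line)
        if match and match.group("status") == "500":
            when = datetime.strptime(match.group("time"), TIME_FORMAT)
            errors.append((when, match.group("request")))
    return errors

# Hypothetical log records for illustration only.
sample = [
    '192.0.2.1 - - [10/May/2009:17:44:17 +0100] "GET /store/checkout.php HTTP/1.1" 500 310 "-" "ExampleBrowser"',
    '192.0.2.2 - - [10/May/2009:17:44:20 +0100] "GET /store/ HTTP/1.1" 200 2048 "-" "ExampleBrowser"',
]

for when, request in server_errors(sample):
    print(when.isoformat(), request)
```

With the time of each failure in hand, you can search the error_log for entries from the same minute to find the underlying cause.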
In a future article I will look in more detail at other status codes and some of the tools you can use to explore your log files.
If you have any tips for great ways to use log files, or any questions, please leave a comment below.