What are Apache web server log files?
When someone visits your website, they are making requests to your web server. The details of these requests are recorded in your web server log files. A large share of the world’s websites run on the Apache web server, and that’s what I’m going to be talking about in this article. If your website is using another type of web server (for example Microsoft IIS), the principles are the same but the commands and tools you need to use to analyse the files will differ.
Types of Apache log files
By default, the Apache (HTTPD) web server creates two log files, ‘access_log’ and ‘error_log’.
Apache error log file location on Linux
If you are not sure where these files are on your server then you will need to look at your server config. You should find two lines there, one beginning with the ErrorLog directive and one beginning with the CustomLog directive.
These tell the Apache web server where to write the log files.
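If the config is long, a few lines of Python can pick these directives out for you. This is only a minimal sketch: the function name and the sample config fragment (including its paths) are made up for illustration, and where your real config file lives depends on your Linux distribution.

```python
import re

# Matches lines that start with ErrorLog or CustomLog.
# Commented-out lines (starting with '#') do not match.
DIRECTIVE_RE = re.compile(
    r'^[ \t]*(ErrorLog|CustomLog)[ \t]+(.+?)[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)

def find_log_directives(config_text):
    """Return a list of (directive, arguments) pairs found in the config text."""
    return [(m.group(1), m.group(2)) for m in DIRECTIVE_RE.finditer(config_text)]

# Hypothetical config fragment, for illustration only.
sample_config = """\
# ErrorLog logs/old_error_log
ErrorLog logs/error_log
CustomLog logs/access_log combined
"""

found = find_log_directives(sample_config)
for directive, args in found:
    print(directive, "->", args)
```

To run this against a real server, read your config file into a string with open(...).read() and pass it to find_log_directives in place of sample_config.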
Apache log file format, what gets logged?
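Unless you have specifically reconfigured your web server, it will probably be writing log files in either the Common Log Format (CLF) or the Combined Log Format, which is CLF with the referrer and the user agent added at the end. A record in the Combined Log Format looks like the one below: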
18.104.22.168 - - [10/May/2009:17:44:17 +0100] "GET /store/product-images/PC1.jpg HTTP/1.1" 200 16989 "http://www.yoursite.com/store/Page/158/PC1.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)"
This may look complicated and might put you off going any further but it really is very simple. Let’s break this down one element at a time.
18.104.22.168 This is the IP address from which the request to your web server was made.
- You will almost always just see a hyphen in this field. It is meant to contain the identity of the person making the request, as reported by the identd service on their machine, but this lookup is seldom enabled and would be unreliable if present, as it is easily spoofed.
- You will also generally see a hyphen in this field. If your website protects some pages or directories by using HTTP authentication, you will see the user id of the person making the request in this field once they have been authenticated.
[10/May/2009:17:44:17 +0100] The date and time of the request, followed by the server’s offset from UTC. (It’s a good idea to check that the time on your server is accurate. With daylight saving and servers being hosted in other countries, it can be an hour or two out.)
"GET /store/product-images/PC1.jpg HTTP/1.1" is the request made by the web browser or search engine: the method, the path requested and the protocol version.
200 the HTTP status code the server gave in response to the request. This is explained in more detail below.
16989 the number of bytes sent back to the person who made the request.
"http://www.yoursite.com/store/Page/158/PC1.html" is the referring site or page.
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8 (.NET CLR 3.5.30729)" the browser / search engine bot / platform string.
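The field-by-field breakdown above maps quite directly onto a regular expression. Below is a minimal sketch using only the Python standard library. The pattern and the field names are my own choices rather than anything Apache defines, and it assumes the log uses the layout shown above with plain hyphens and straight double quotes, as real log files do. It does not handle escaped quotes inside a field. The user agent in the sample line is shortened for readability.

```python
import re
from datetime import datetime

# One named group per field, in the order they appear in the log record.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    # Apache writes '-' instead of 0 when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # %z keeps the +0100 style offset, so the result is timezone-aware.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

sample = (
    '18.104.22.168 - - [10/May/2009:17:44:17 +0100] '
    '"GET /store/product-images/PC1.jpg HTTP/1.1" 200 16989 '
    '"http://www.yoursite.com/store/Page/158/PC1.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB) Firefox/3.0.8"'
)

record = parse_line(sample)
for name, value in record.items():
    print(f"{name:8} {value}")
```

Returning None for lines that do not match, rather than raising an error, makes it easy to skip the odd malformed record when looping over a whole file.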
What to look out for in your Apache log files.
When I look at log files I typically start by looking for requests that have resulted in a 404 response. This means that someone has requested a page (or a resource such as a graphic) that doesn’t exist. If you find this happening, look at the referring site or page. If the referrer is your own site then you need to get this broken link fixed. If the request is coming from another site you might need to contact the owner of the site and ask them to update the link. In the real world, getting a site owner to update a link can be quite difficult, so if the error is happening a lot, consider setting up a 301 Moved Permanently redirect.
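As a rough illustration of that triage, the sketch below counts 404s and labels each one by whether the referrer is your own site, another site, or missing. The function names, the sample log lines and the host name passed in are all hypothetical, and the regular expression assumes the Combined Log Format described earlier.

```python
import re
from collections import Counter
from urllib.parse import urlparse

LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?:\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def classify_referer(referer, own_host):
    """Label a referrer as 'internal', 'external' or 'none'."""
    if referer in ("", "-"):
        return "none"
    host = urlparse(referer).hostname or ""
    return "internal" if host == own_host else "external"

def count_404s(lines, own_host):
    """Count 404 responses per (kind, requested path, referrer)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m.group("status") != "404":
            continue
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else m.group("request")
        kind = classify_referer(m.group("referer"), own_host)
        counts[(kind, path, m.group("referer"))] += 1
    return counts

# Hypothetical log lines, for illustration only.
sample_lines = [
    '192.0.2.10 - - [10/May/2009:17:45:01 +0100] "GET /store/old-page.html HTTP/1.1" 404 209 "http://www.yoursite.com/store/index.html" "Mozilla/5.0"',
    '192.0.2.11 - - [10/May/2009:17:46:30 +0100] "GET /store/old-page.html HTTP/1.1" 404 209 "http://www.yoursite.com/store/index.html" "Mozilla/5.0"',
    '203.0.113.5 - - [10/May/2009:17:47:12 +0100] "GET /missing.jpg HTTP/1.1" 404 209 "http://www.example.org/links.html" "Mozilla/5.0"',
    '192.0.2.10 - - [10/May/2009:17:48:00 +0100] "GET /store/index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

counts = count_404s(sample_lines, "www.yoursite.com")
for (kind, path, referer), n in counts.most_common():
    print(n, kind, path, "<-", referer)
```

Sorting by count puts the most frequent broken links first, which are the ones most worth fixing or redirecting.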
Once you’ve addressed any 404 errors, you should then review any requests that have resulted in a 500 response. Status 500 indicates a runtime error on your web server, typically when generating a dynamic (CGI, PHP, JSP, ASP etc) web page. Note the time at which the error happened and then look around that time in the error_log file for more information.
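Matching a 500 in the access log to entries in the error log can be sketched as below. Treat this as an assumption-heavy example: the error log timestamp format differs between Apache versions and can be reconfigured, so the two formats tried here may not match your server, and the sample lines and function names are invented. It also assumes both logs record the server’s local time, which is why the offset is dropped from the access log time before comparing.

```python
import re
from datetime import datetime, timedelta

ACCESS_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)
ERROR_TIME_RE = re.compile(r'^\[(?P<time>[^\]]+)\]')
# Assumed error log timestamp layouts, with and without microseconds.
ERROR_TIME_FORMATS = ("%a %b %d %H:%M:%S.%f %Y", "%a %b %d %H:%M:%S %Y")

def times_of_500s(access_lines):
    """Return (local time, request) for each 500 response in the access log."""
    hits = []
    for line in access_lines:
        m = ACCESS_RE.match(line)
        if m and m.group("status") == "500":
            t = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((t.replace(tzinfo=None), m.group("request")))
    return hits

def parse_error_time(line):
    """Return the timestamp at the start of an error log line, or None."""
    m = ERROR_TIME_RE.match(line)
    if m is None:
        return None
    for fmt in ERROR_TIME_FORMATS:
        try:
            return datetime.strptime(m.group("time"), fmt)
        except ValueError:
            continue
    return None

def errors_near(error_lines, when, window=timedelta(seconds=60)):
    """Return the error log lines stamped within `window` of `when`."""
    near = []
    for line in error_lines:
        t = parse_error_time(line)
        if t is not None and abs(t - when) <= window:
            near.append(line)
    return near

# Hypothetical log lines, for illustration only.
access_sample = [
    '192.0.2.10 - - [10/May/2009:17:50:02 +0100] "GET /store/checkout.php HTTP/1.1" 500 612 "-" "Mozilla/5.0"',
]
error_sample = [
    '[Sun May 10 12:00:00 2009] [error] [client 192.0.2.11] File does not exist: /var/www/html/favicon.ico',
    '[Sun May 10 17:50:02 2009] [error] [client 192.0.2.10] PHP Fatal error: hypothetical message',
]

hits = times_of_500s(access_sample)
for when, request in hits:
    print(when, request)
    for line in errors_near(error_sample, when):
        print("   ", line)
```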
Apache httpd log analyzer
As you have seen above, the format of the Apache log files is quite simple. You don’t need specialist or expensive software to process these files. If you know some basic Python coding, you can use the https://pypi.org/project/apachelogs/ module to parse these files. From there you can, for example, extract all the 404s and put the list into Excel for detailed analysis.
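To give a flavour of the 404-to-Excel idea, here is a minimal sketch that writes the 404s to CSV, which Excel opens directly. Note that it deliberately uses a plain regular expression from the standard library rather than the apachelogs module, so that it runs without installing anything. The function name, column names and sample lines are my own.

```python
import csv
import io
import re

LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?:\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
COLUMNS = ("time", "host", "request", "referer", "agent")

def write_404s_csv(lines, out_file):
    """Write one CSV row per 404 response and return how many were written."""
    writer = csv.writer(out_file)
    writer.writerow(COLUMNS)
    written = 0
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "404":
            writer.writerow([m.group(name) for name in COLUMNS])
            written += 1
    return written

# Hypothetical log lines, for illustration only.
sample_lines = [
    '192.0.2.10 - - [10/May/2009:17:48:00 +0100] "GET /store/index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/May/2009:17:47:12 +0100] "GET /missing.jpg HTTP/1.1" 404 209 "http://www.example.org/links.html" "Mozilla/5.0"',
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
written = write_404s_csv(sample_lines, buffer)
print(written, "rows written")
print(buffer.getvalue())
```

To produce a real file, pass open("404s.csv", "w", newline="") instead of the in-memory buffer, and an open log file instead of sample_lines.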
In a future article, I will look in more detail at other status codes and some of the tools you can use to explore your log files.
If you have any tips for great ways to use log files, or any questions, please leave a comment below.
Alternatives to Google Analytics – Web Analytics and Privacy
These days many people are concerned about using free services such as Google Analytics. This is because such services allow the third parties behind them to track who is visiting your website, to follow a person across multiple sites and therefore to profile them. While Google Analytics is a great service, I do find that I can get everything I need from log analysis. If you want your website to have a minimal footprint, then removing Google Analytics, Facebook tracking pixels and Twitter buttons and using log analysis instead is a good idea.