Server Log Analytics
When a computer (the client) tries to access a resource on a network, it sends an HTTP request. This request is processed by a machine called a server, which decides whether or not to grant it. Each request leaves a footprint called a log, sometimes referred to as a server log.
A server log is standardized data. Let's see what it contains (a sample entry and a small parsing sketch follow this list):
- Domain name: the machine the request is made to, including its port. As a whole, it lets you know which part of your information system is requested the most.
- IP address: the identifier of a machine on a network. For web browsing, the IP address is assigned by the internet user's Internet Service Provider (ISP); it can change over time and is tied to the user's ISP subscription. In other words, one IP address does not equal one particular individual: a household typically shares a single IP address among all family members. Each IP address maps to a physical location, which can sometimes be very close to the real location of the user. This is why IP addresses are often used to estimate the location of the internet user: geolocation coordinates, street, city, region, country, continent. Have a look at the MaxMind database to get an idea.
- Connection time: the time when the request was made. By analyzing these times you can determine when most of the connections to your website/app happen.
- File requested, with its path: the file on your server (a CSS file, an HTML document, a PDF file), the path being its location on the server. It helps you identify which content of your website is the most popular.
- Status of the request: how the server handled the request. A 200 status means the client is authorized to access the given piece of content; a 404 means the file cannot be found. It already helps you spot whether the website has issues or not.
- User-agent: the client technology. For an internet user this is their browser. It lets you know which technology people use and how technology-savvy they are.
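To make this concrete, here is a minimal Python sketch that splits a single entry in Apache's "combined" log format into the fields above. The sample values (IP, path, user-agent) are made up for illustration, and the regular expression assumes the default combined format; the domain name and port only appear if your server's log format is configured to include them.

    import re

    # One made-up entry in Apache's "combined" log format.
    sample = ('203.0.113.42 - - [12/Mar/2024:14:05:31 +0000] '
              '"GET /products/index.html HTTP/1.1" 200 5123 '
              '"https://example.org/" "Mozilla/5.0 (X11; Linux x86_64)"')

    # Regex assuming the default combined format: IP, identity, user, time,
    # request line, status, size, referer, user-agent.
    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
    )

    match = pattern.match(sample)
    if match:
        print(match.groupdict())

Running this prints a dictionary with one key per field, which is already enough to start counting requests, statuses or user-agents.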
Note that these logs are available on all web servers. If you want to see what logs look like, you can simply run a local web server; with Apache on Linux you will usually find them at /var/log/apache2/access.log.
Note also that if you cannot access those logs, it means your rights are not high enough and that someone else holds that access.
And note also that logs include all the requests made by internet users plus those made by robots, so don't be surprised if your files contain a lot of entries (see the filtering sketch below).
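As a rough illustration, here is a small Python sketch that separates robot traffic from human traffic based on keywords in the user-agent string. The file path and the keyword list are assumptions for the example; real bot detection usually relies on much more complete signatures.

    # Count human vs. robot requests by looking for common bot keywords
    # in the user-agent field (the last quoted field of each log line).
    BOT_KEYWORDS = ('bot', 'crawler', 'spider', 'slurp')  # assumed, not exhaustive

    humans = robots = 0
    with open('/var/log/apache2/access.log') as log:   # assumed path
        for line in log:
            # The user-agent is the last double-quoted field on the line.
            user_agent = line.rsplit('"', 2)[-2].lower() if line.count('"') >= 2 else ''
            if any(keyword in user_agent for keyword in BOT_KEYWORDS):
                robots += 1
            else:
                humans += 1

    print(f'human requests: {humans}, robot requests: {robots}')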
Log files provide many advantages, among them:
- Data are automatically collected by the server, so you don't have to worry about whether they are being collected or not.
- Data are standardized, which means you can compare them with data collected on other servers: the collection method is the same, the pattern is the same.
- Data about robots are collected, so you can use them for Search Engine Optimization (SEO) to analyze how search engines crawl your website.
- All the connection data are collected.
- Log data collection does not influence the loading time of your pages.
- It is very easy to spot the status codes of your documents (the 4xx and 5xx errors) and whether each download completed (see the counting sketch below).
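For instance, a few lines of Python are enough to count status codes and surface 4xx/5xx errors. This is only a sketch assuming the common Apache formats, where the status code is the 3-digit number right after the quoted request line; the log path is again an assumption.

    import re
    from collections import Counter

    # Status code is the 3-digit number right after the quoted request line
    # in the common/combined Apache formats.
    status_re = re.compile(r'" (\d{3}) ')

    counts = Counter()
    with open('/var/log/apache2/access.log') as log:   # assumed path
        for line in log:
            match = status_re.search(line)
            if match:
                counts[match.group(1)] += 1

    # Flag error statuses (4xx and 5xx) in the output.
    for status, count in sorted(counts.items()):
        marker = ' <-- check this' if status.startswith(('4', '5')) else ''
        print(f'{status}: {count}{marker}')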
Among the drawbacks:
- When a file is in the browser cache of a client, it is not recorded in the logs.
- Logs are raw data: you need a tool to process them and do the maths for you. It is up to you whether you process them on the command line, in LibreOffice Calc, or with one of the log analytics solutions out there (a tiny processing sketch follows this list).
- It is rather technical data, and one needs to learn how to process it to get the best out of it. Luckily you are attending this course, so this no longer concerns you :).
- Some data are not collected. Logs are really just recordings of downloads, so when a behaviour on your website/app does not lead to a download, it is not recorded. That is typically the case when a user clicks on a drop-down menu or on an add-to-cart button. That is why you need extra data collection tools.
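To give an idea of what "doing the maths yourself" means, here is a minimal Python sketch that counts requests per hour from the timestamps, something dedicated log analytics tools do for you at a much larger scale. The log path and the timestamp format (Apache's default) are assumptions.

    import re
    from collections import Counter

    # Timestamps look like [12/Mar/2024:14:05:31 +0000] in Apache logs;
    # we keep the day and the hour to count requests per hour.
    time_re = re.compile(r'\[(\d{2}/\w{3}/\d{4}):(\d{2}):')

    per_hour = Counter()
    with open('/var/log/apache2/access.log') as log:   # assumed path
        for line in log:
            match = time_re.search(line)
            if match:
                per_hour[f'{match.group(1)} {match.group(2)}h'] += 1

    for hour, count in sorted(per_hour.items()):
        print(f'{hour}: {count} requests')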
As you can read above, log data are very interesting, but a file full of logs won't help you much on its own: you need a way to process those data and get some figures out of them.
There are many Free Software tools out there that you can install to process those logs. In this course we will mainly use Matomo Analytics to illustrate this.
You may also need other data than those listed above; that is why you will need extra tools to collect the right data, and you will probably look into page tagging solutions.