Analytics

Wednesday, Feb 24, 2021
#unix #web #privacy

First of all, fuck Google Analytics.

For that matter, fuck all proprietary, JavaScript-based, bloated, exploitative, malware bullshit logging systems. If you’re forcing your users to run code so you can spy on them, you should feel bad. For every remotely ethical “use case” of Google Analytics, you can do the same thing by just reading your web server’s logs instead of sending crap to some adware megacorp.

Chances are your server already logs the basic client info that browsers send without being asked. These logs usually contain the client IP, the requested path, the response status (so 404s too), the number of bytes sent, and often the user agent string. That’s enough to more or less uniquely track a session by its IP + device combo, get a rough IP-based location, and see which pages a visitor looked at or couldn’t find. Arguably that’s still too much information, since computer security isn’t exactly common knowledge and operates like a fucked-up dark art. If your website stores private information that could get someone harmed if it leaked, JUST DON’T LOG IT. Given enough time, all stored information gets leaked.
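To get a feel for how much those plain logs already give away, here is a rough sketch using nothing but awk, sort, and uniq. It assumes OpenBSD httpd’s combined style, whose lines start with the server name; on Apache or nginx, where the client IP is the first field, use pre[1] instead of pre[2]. The function names sessions and not_found are made up for this example, and keying on IP + user agent is only an approximation of a visitor (NAT and shared machines break it).

```shell
#!/bin/sh
# Rough visitor and 404 summaries from a raw access log, no goaccess needed.
# Assumed line format (OpenBSD httpd "log style combined"):
#   example.com 192.0.2.1 - - [24/Feb/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
# Splitting on double quotes gives:
#   $1 = server name, client IP, ident, user, date
#   $2 = request line   $3 = status and bytes
#   $4 = referer        $6 = user agent

# Requests per IP + user agent pair (a crude "session"), busiest first.
sessions() {
    awk -F'"' '{
        split($1, pre, " ")        # pre[2] is the client IP
        count[pre[2] " " $6]++     # key on IP plus user agent
    }
    END { for (k in count) print count[k], k }' "$@" | sort -rn
}

# Paths that returned 404, most requested first.
not_found() {
    awk -F'"' '{
        split($3, res, " ")        # res[1] is the status code
        split($2, req, " ")        # req[2] is the requested path
        if (res[1] == 404) print req[2]
    }' "$@" | sort | uniq -c | sort -rn
}

# Usage:
#   sessions /var/www/logs/access.log | head
#   not_found /var/www/logs/access.log | head
```

Both functions take log files as arguments (or read stdin), so they also work at the end of a zcat pipeline.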

If you want a proper tool for analyzing your logs rather than just grep, zcat, and PCRE, I recommend goaccess. It has a curses interface similar to htop’s, but it can also generate a static HTML report. I use OpenBSD’s httpd on my server. Its main log styles are combined and common (it also has connection and forwarded). Despite the name, common isn’t the common one these days: combined is more popular, but common is the default on OpenBSD’s httpd. The format is almost the same as Apache’s, but the time has a slight variation. I switched httpd to combined by adding “log style combined” to my server blocks.

Then I created .goaccessrc in my home directory with the following. If you’re just using nginx on Debian or something, you might not need any configuration at all, and the format is simple enough that you can make it work even with really obscure web servers.

time-format %T
date-format %d/%b/%Y
# httpd starts each line with the server name (%v) and client IP (%h); %^ skips a field
log-format %v %h %^ %^ [%d:%t %^] "%r" %s %b

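Finally, this command decompresses all the rotated logs with zcat, feeds them in after the current log, filters out lines mentioning syslog (such as the “logfile turned over” notes newsyslog adds when rotating), and pipes everything into goaccess.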

zcat /var/www/logs/access.log.*.gz | cat /var/www/logs/access.log - | grep -v syslog | goaccess --no-global-config
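If you want the static webpage instead of the curses interface, the same pipeline can be wrapped in a small function and handed to goaccess’s -o option. This is a sketch under a few assumptions: all_logs is a hypothetical helper name, the layout is OpenBSD’s default (access.log plus access.log.N.gz in one directory), gzip -dc stands in for zcat, and instead of the broad grep -v syslog it only drops lines matching newsyslog’s “logfile turned over” message, so a request whose URL happens to contain “syslog” survives.

```shell
#!/bin/sh
# Emit the current access log followed by every rotated, gzipped one, minus
# the "logfile turned over" lines newsyslog(8) writes on rotation.
# Assumption: the default OpenBSD layout, i.e. access.log plus access.log.N.gz
# in a single directory (pass another directory as the first argument).
all_logs() {
    dir=${1:-/var/www/logs}
    {
        cat "$dir/access.log"
        for f in "$dir"/access.log.*.gz; do
            [ -e "$f" ] || continue    # no rotated logs yet: the glob did not match
            gzip -dc "$f"              # same as zcat
        done
    } | grep -v 'newsyslog.*logfile turned over'
}

# Usage (the HTML path is only an example):
#   all_logs | goaccess --no-global-config
#   all_logs | goaccess --no-global-config -o /var/www/htdocs/stats.html
```

The empty-glob check matters on a fresh server: without it, the loop would try to decompress a file literally named access.log.*.gz and print an error.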