Analytics

Wednesday, Feb 24, 2021
#unix #web #privacy

You don’t need Google Analytics. It is a tool that exists solely to trick people into running spyware. This is not informed consent. For every remotely ethical “use case” of Google Analytics, you can get the same result by looking at your web server’s logs instead of sending crap to some adware mega-corp.

Chances are your server already stores logs with basic client information that browsers send you without being asked. These usually contain IP addresses, requested files, transfer sizes, 404s, and a user agent string. That means you can track individual sessions by IP + device combination, get a rough IP-based location, and see which pages a visitor looked at or couldn’t find.

Arguably, that’s still far too much information. Computer security isn’t exactly public knowledge, and you’re fooling yourself if you think most people know what information their browser willingly reveals. This is especially true when companies like Apple, Brave, Mozilla, and just about every VPN company lie about privacy. Many people (understandably) think incognito/private mode means a website cannot tell they were viewing it. If for some reason your website stores private information that could lead to someone being harmed if leaked, JUST DON’T LOG it: all stored information will be leaked given enough time.
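To make the IP + device tracking concrete, here is a minimal sketch that counts distinct (IP, user agent) pairs in a combined-format log, a rough stand-in for “unique visitors”. It assumes the Apache/nginx combined layout, where the client IP is the first field; OpenBSD’s httpd prefixes each line with the server name, so pass 2 as the second argument there. The function name count_visitors is made up for illustration.

```shell
#!/bin/sh
# Rough visitor count: distinct (IP, user agent) pairs in a combined-format
# access log. Assumes the client IP is space-separated field 1 (Apache/nginx);
# pass 2 as the second argument for OpenBSD httpd's server-name prefix.
count_visitors() {
	ip_field=${2:-1}
	# Splitting on double quotes: $1 holds the IP and date, $2 the request
	# line, $3 status and size, $4 the referrer, $6 the user agent.
	awk -F'"' -v ipf="$ip_field" '
		NF < 6 { next }
		{
			split($1, pre, " ")
			key = pre[ipf] SUBSEP $6
			if (!(key in seen)) { seen[key] = 1; n++ }
		}
		END { print n + 0 }
	' "$1"
}

# Usage: count_visitors /var/log/nginx/access.log
#        count_visitors /var/www/logs/access.log 2
```

Even this crude count shows how little it takes to single out a visitor from data you are already collecting.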

If you’re looking for a good tool to analyze your logs rather than just grep, zcat, and PCRE, I recommend goaccess. It has a curses interface similar to htop, but it can also generate a static web page or image. I use OpenBSD’s httpd on my server. It has two logging modes: combined and common. Despite its name, common is not the more popular format these days (combined is), but common is the default in OpenBSD’s httpd. The syntax is almost the same as Apache’s, but the time has a slight variation. I reconfigured httpd to use combined by adding “log style combined” to my server blocks.
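For comparison, this is roughly what the plain-tools route looks like: a hypothetical top_404s function that decompresses rotated logs, picks out 404 responses, and ranks the missing paths. It uses awk in place of grep and PCRE, and gzip -dc, which does the same job as zcat. It assumes combined or common format lines; in both, the status code directly follows the quoted request line, and OpenBSD’s server-name prefix does not affect the parsing.

```shell
#!/bin/sh
# Rank missing pages (HTTP 404) across plain and .gz access logs.
top_404s() {
	# Decompress .gz arguments, pass plain files through unchanged, then
	# split each line on double quotes: $2 is the request line and $3
	# begins with the status code.
	for f in "$@"; do
		case $f in
			*.gz) gzip -dc "$f" ;;
			*) cat "$f" ;;
		esac
	done | awk -F'"' '
		{ split($3, s, " ") }
		s[1] == "404" { split($2, r, " "); print r[2] }
	' | sort | uniq -c | sort -rn
}

# Usage: top_404s /var/www/logs/access.log /var/www/logs/access.log.*.gz
```

This works fine for a single question like “what’s broken?”, but every extra question means another pipeline, which is where goaccess earns its place.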

Then I created ~/.goaccessrc with the following contents. If you’re just using nginx on Debian or similar, you might not need any configuration at all, but the format is simple enough that you can get it working even with really obscure web servers.

time-format %T
date-format %d/%b/%Y
log-format %v %h %^ %^ [%d:%t %^] "%r" %s %b "%R" "%u"

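Finally, you can run this command to decompress all your rotated logs with zcat, combine them with the current log, filter out newsyslog’s “logfile turned over” lines, and pipe the result into goaccess.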

zcat /var/www/logs/access.log.*.gz | cat /var/www/logs/access.log - | grep -v syslog | goaccess --no-global-config
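Since goaccess can also write a static report, here is a sketch of the same pipeline wrapped in a hypothetical all_logs function, with two small changes: it skips the rotated-log glob when no .gz files exist yet (zcat would otherwise complain about a literal access.log.*.gz), and the usage line writes an HTML report with goaccess’s -o option instead of opening the curses interface. The default directory matches the paths above; the function name is an assumption.

```shell
#!/bin/sh
# Print every access log in a directory as one stream: rotated .gz files,
# then the live log. Lines containing "syslog" (newsyslog's rotation
# notices) are dropped, as in the pipeline above.
all_logs() {
	dir=${1:-/var/www/logs}
	{
		for f in "$dir"/access.log.*.gz; do
			# An unmatched glob stays literal in sh, so check the file exists.
			if [ -e "$f" ]; then
				gzip -dc "$f"
			fi
		done
		cat "$dir/access.log"
	} | grep -v syslog
}

# Usage: all_logs | goaccess --no-global-config -o report.html
```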