TL;DR: It occurred to me that I was wasting significant amounts of my spare time (time I could instead dedicate to more interesting, meaningful activities) on maintaining infrastructure that is unnecessary for a personal setup. I ended up scaling down quite a bit.
Last weekend, I had been staring intensely at my screen for at least 20 minutes when my partner asked me why the look on my face was so grim.
She asked me that from the other side of the table in the comfy little coffeeshop near my home where we had decided to spend the first half of the day. She was working through some texts for an upcoming university exam. I was going through a list of interesting articles that had piled up in my browser throughout the week.
Well, at least that was the original plan, what I had been looking forward to doing. In practice, I was debugging an issue with my personal VPN setup, because some of my servers were experiencing intermittent packet loss. But only via IPv6, and only at seemingly random intervals.
While I did eventually figure out the issue, a problem with my upstream provider that was out of my hands, it was a frustrating experience. Instead of reading about cool projects or interesting topics, I was spending my time looking at traceroutes, netflows and the output of tcpdump.
I frustratedly gulped down another cup of coffee when it occurred to me that this had become a recurring theme over the past couple of months. More often than not, I was spending large chunks of my spare time fixing my computers.
Things such as fixing internal DNS resolution (a zonefile hadn’t been deployed properly, causing a split-brain-esque situation), or repairing TLS setups (a cronjob had failed, and the monitoring system, for whatever reason, had decided not to alert me about that fact).
Pondering this frustration eventually led me to the question of why I was doing this, what the point behind all of this work was. Ultimately, it made me ask myself: how much operational overhead have I created for myself? Have I overcomplicated things?
Despite the answer to that last question being a resounding “yes” in almost all cases, I took some time to look at the documentation I had created for my systems - which in itself should be a pretty good indicator of how much I had overengineered things - in order to determine how many of the machines I ran were truly necessary, as in “if I don’t run this, I will encounter issues”.
To be clear: I am specifically not talking about the lab I mentioned in an earlier post (for which I still want to complete the follow-up post I have had drafted for months now ..). That lab can grow as wild and chaotic as it wants. I am talking about the machines behind my basic services, such as this blog, my mailserver and so on, as well as things I was hosting for friends and family.
And as expected, my setup was grossly overengineered. I was running a lot of machines and services that technically provided a meaningful addition to my network, but were practically irrelevant for the kinds of systems they were supporting.
The reason this had happened was simple. It was a case of ‘trying to be professional where professionalism isn’t necessarily needed’. I don’t need 99.999% uptime, so there’s no real need to build things with proper redundancy in mind.
I don’t need any form of high availability. I don’t need four physically separated backup locations (not even counting my flat), all of them oversized. I don’t strictly need a backup MX, as nice to have as it may be. I don’t need to run my own recursive DNS resolver, and as much as I tell myself that the ability to log queries is important .. if I have never touched the data to implement all the cool ideas I had, it can’t be that important after all.
If I am being honest with myself, I don’t even really need a monitoring system. Save for my mailserver and a handful of applications used by people other than me, it would probably take me days to notice that a system is offline without being told by said monitoring. And all of the work I do for others is free of charge, so I’ll allow myself not to feel compelled to provide 99.999% uptime.
The lesson I learned boils down to a simple truth: I created significant headaches for myself in the process of ensuring that no headaches could ever be created for me.
I don’t want to think about how many issues, or even outages, I created by, for example, loadbalancing a website (consisting of nothing but a bit of HTML and some pictures) that I was hosting for a friend. Looking at the hosters’ uptime graphs, I would have experienced less downtime if I had simply run it on a single instance of nginx.
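For a sense of scale: a static site like that needs remarkably little configuration. A minimal nginx server block along these lines would have sufficed (the domain and paths here are placeholders, not the actual site):

```nginx
server {
    listen 80;
    listen [::]:80;
    # Placeholder domain; substitute the real one.
    server_name example.org;

    # Static files only: some HTML and a few pictures.
    root /var/www/example.org;
    index index.html;
}
```

No upstreams, no health checks, no failover logic - nothing that can break at 9 a.m. in a coffeeshop.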
I was spending way too much thought on unnecessarily “perfecting” a setup, and way too much time trying to achieve that “perfection”. The pressure I felt was most likely not good for my mental health either. So I decided to take a step that was surprisingly difficult: significantly downsizing my servers.
I came up with a list of what I truly need (as in “I would otherwise have to rely on external services to fulfill my needs in that area”):
- A webserver for my blog, and for dumping some memes and the like to quickly share them with friends. I currently use a dedicated machine for that, with a virtual hard disk in excess of 100 gigabytes .. while the files that machine serves amount to less than a gigabyte. Talk about overkill.
- A webserver for small web applications for friends and family
- A mailserver, with at least an MTA and an MDA, but preferably with some spam-filtering capabilities as well
- Some kind of version control system; in my lab I’m fond of Gitea, but I’m relatively sure that Gitolite would work well enough
- Backups; this I’m not entirely sure about. I’m probably going to keep one backup server, use Borgbase as a secondary location, and keep a hard disk connected to the machine running permanently at home as a ‘last ditch effort’.
What I could technically live without, but want to keep regardless, because I have grown fond of the comfort it provides to me:
- Two authoritative DNS servers - ever since I quit using registrars and their horrible panels, opting to manage my zonefiles on my own, I haven’t wanted to go back.
- An instance of ArchiveBox - I used pinboard.in in the past, and while it’s an awesome service, I really fell in love with the ability to seamlessly and fully archive sites for later use, including media files.
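For anyone wondering what “managing my zonefiles on my own” amounts to, it’s a handful of plain-text files of roughly this shape - this sketch uses the reserved documentation names and addresses (example.org, 192.0.2.0/24, 2001:db8::/32) rather than any real records:

```
$TTL 3600
@       IN SOA  ns1.example.org. hostmaster.example.org. (
                2024060101 ; serial
                7200       ; refresh
                3600       ; retry
                1209600    ; expire
                3600 )     ; negative-caching TTL
        IN NS   ns1.example.org.
        IN NS   ns2.example.org.
ns1     IN A    192.0.2.1
ns2     IN A    192.0.2.2
www     IN A    192.0.2.10
www     IN AAAA 2001:db8::10
```

A file like this, version-controlled and deployed to two authoritative servers, replaces the registrar panel entirely.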
What’s kind of cool to use or have, but either not that necessary in the end, or probably done a lot better by other people and available from providers I’m comfortable trusting:
- Recursive DNS resolver; as much as I enjoyed the learning experience of setting up my own DoT resolver, I never ended up going where I was planning to go, for example collecting DNS logs to play around with the gathered data.
After learning all of that, it was time to .. well:
I deleted virtual machines, decommissioned services and consolidated applications that had previously lived on separate machines onto one server. (I did rent one new virtual machine though: this blog is now hosted with openbsd.amsterdam. Because why not.)
I have spent, and to some degree continue to spend, a not insignificant amount of time and energy on automating things, with the help of hundreds of lines of Ansible. 1578 lines, according to a quick and dirty automated count. So for the services and applications I decided to keep, the move was as simple as running a playbook and rsync’ing the backups into place.
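A quick and dirty count of that sort can be a one-liner; `ANSIBLE_DIR` here is a placeholder for wherever the playbooks and roles actually live, and the count naively includes comments and blank lines:

```shell
# Count all lines across the YAML files of an Ansible tree.
# ANSIBLE_DIR is a stand-in path; override it for your own repo.
ANSIBLE_DIR="${ANSIBLE_DIR:-$HOME/ansible}"
find "$ANSIBLE_DIR" \( -name '*.yml' -o -name '*.yaml' \) -type f -print0 \
  | xargs -0 cat | wc -l
```

Crude, but good enough to get a feel for how much automation has accumulated.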
At the time of writing I’m down a whopping 19 virtual machines, nearly 2/3 of what I had before. Which also means that the ratio of “support” systems to “service” systems was roughly 2:1 before.
On the one hand, you could argue that this means I was doing a good job. Physically separated backups, redundant DNS servers (both recursive and authoritative), dedicated monitoring, a fallback VPN. Stuff one would recommend businesses have in place with regard to Business Continuity Management. But on the other hand .. man, that was one hell of an overengineered setup.
I haven’t yet fully completed the transition; there are some things that require more attention, and some where I need to physically visit the datacenter where my machines are hosted. But despite the remaining tasks, I already feel better about the setup. My maintenance workload has decreased tremendously.
I’m looking forward to seeing how much the work and effort I have to put into fixing issues and generally administering systems will change, or rather (hopefully) decrease, in the long run. Whether things will stay as quiet as they are now. And whether I will be able to transform the reclaimed time into productive activities.
Not necessarily with regard to professional accomplishments or educational successes; more with regard to generally spending my time on things I like doing, rather than things I feel a responsibility to do because I want to be a good netizen who ensures his systems aren’t harming others.
Another upside, which I only realized after all of the downsizing was done: I’ll be saving some money going forward. Even though a virtual server can be cheaper than a cup of coffee, running many of them still adds up.
It’s a rough estimate, but it comes down to around 25 Euros every month that I won’t be spending on hosting going forward. That’s nearly a third of all of my hosting expenses, and more than half if I exclude the 50 Euros I pay for server housing every month. Not bad!