Automate Thyself

For quite some time, my own ops haven't had much dev in them. But I'm changing that.

An industrial robot arm presents a carnation to a woman, who seems pleased by this.

When I say "my own ops", I'm talking about this blog and its related infrastructure. I run it on an instance of Ghost that I host myself. It's probably worth discussing why I self-host in the first place. The short version is: control, cost, and because I can. Ghost will gladly provide managed hosting for you, but it's surprisingly expensive, and I would be paying for a lot of things that I just don't want. I have no interest in taking payments or running ads, and I'm only barely interested in making this thing into a newsletter (and that only recently, as it seems some people actually like that dynamic). I also liked having some cloud computing around that I could use for other things. In fact, for a couple of years I hosted game servers on the same machine.

So, with inexpensive hosting from OVH, I get total control, more options, and actually lower cost, and it's feasible because I have the basic skills for it. That's what I did and continue to do. I'm certainly trading some time for money, which may seem like a dubious prospect, but it makes sense for me. At a business level, you usually want to do the things that are part of your core competency and outsource everything else. I'm not a business, and the logic doesn't map directly onto individuals, but similar ideas apply. The biggest difference is that I can also just decide that something is worth doing for the experience, and then do it.

History

I started writing this blog in 2016. I don't know if that sounds like a little or a lot of time, so here's some context. This was around the same time Kubernetes hit version 1.0. It's a little older than the iPhone 7. And it's back when I still had a boy's name. So, it's been a while.

At the time, I was just interested in having a tech blog, and I wasn't even sure I would keep doing it. So, I spun something up, started writing, and let that be that. It ran on an OVH managed VM, hosted out of a datacenter near Montreal. Why Montreal? Mostly because at the time that was their only North American DC (as mentioned, it's been a while). I would occasionally need to shell into the host to do some work. At various times I ran other services on the same host, and I needed to set up and manage those. Somewhere along the line my renewal scripts for Let's Encrypt certs broke. So, I would again need to shell in to renew them myself. I did all of this live on what is essentially a production server, admittedly a low-stakes one. 😳 Every time I did, I would think to myself that I really needed to find a better solution for this. Then I would promptly put it out of my mind for about 80 days until the next time I got a warning email about pending expirations. Until this last time.

Oops

The last time I went through this SSL renewal process, I was in the mood to do something about it. So, I did what I had become accustomed to doing. I mucked around with the live production server. My first thought was that newer certbot clients set up renewal tasks for you, so I should just update certbot. That went poorly. The new version couldn't read my old configs. And the old version was so old it wasn't in the apt repo anymore, so I couldn't roll back easily either. Which meant I had no way to renew my SSL certs. And that's a problem. I grabbed an export of my site content to be sure I still had it, and then revisited the better solutions.

I say I had been putting the better solutions out of my mind. But that's not exactly true. I had taken this on as an early covid project, back when I thought I might be able to do covid projects. It turns out I wasn't. But at least that left me with a notion of where to start. So, I threw that all away and started over with a new Ansible project. I run Windows on my personal computer. And WSL has gotten a lot better in the last two and a half years. I think that helped a lot. Or maybe I just wasn't super depressed. Whatever it was, the project went much better this time.

Ansible

The basic idea was that I wanted some IaC (Infrastructure as Code) mechanisms in place to make setting up and maintaining this project a more repeatable task. On the pets-to-cattle spectrum, the old host was a baby. My goal was to get somewhere to the other side of pets. I'm still doing this on a budget, because the blog will never make any money for itself, so I didn't consider a fully managed cloud. That also means cloud provisioning tools like Terraform weren't the right fit. This is a persistent server on a persistent virtual machine, but one that is re-creatable and has more automatic maintenance. Once I had this working in a free VM, it was time to provision a new paid one and decommission the one from 2016. I'm a little sad to say my blog is no longer Canadian. US hosting was a bit less expensive, and honestly, I expect it to be less prone to getting flagged as potential fraud by my bank.

Getting Started

The first thing I needed to do was to build up some Ansible playbooks. When we call it Infrastructure as Code, that is well named. It's a development process, and one that we need to iterate through. To make my iterations fast(er) and inexpensive, I stood up a new local VM in Hyper-V. Why Hyper-V? Because it's there. I run Windows. I quickly pretend to be a cloud provider by clicking through the menus to get a fresh Ubuntu install, and then I have Ansible take over from there.

The first step is preparing the environment itself: install Nginx, MySQL, Node.js, certbot, the Ghost CLI tool, and some system tools. Easy enough. The next step is to set up and secure MySQL. That turns out not to be simple. At least not with Ansible. At least not idempotently with Ansible. And idempotence is a very valuable characteristic in an Ansible playbook, because it makes the playbook safely re-runnable. Re-running playbooks is a big part of how I intend to make my system maintenance more automatic. The problem is that on the first run you have to assume there is no password, but on subsequent runs there definitely will be, so the assumption breaks. The solution is to have the playbook add the password to the default MySQL client config of the account Ansible runs as. That way, connecting without specifying a password does the right thing both before and after the password is set.
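To make the idempotence idea concrete, here is a minimal Python sketch of the "check first, change only if needed" pattern. It is not my playbook. It assumes the "default config" is a MySQL client option file such as ~/.my.cnf, and the function name, user, and password are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def ensure_client_config(path: Path, user: str, password: str) -> bool:
    """Write a [client] credentials file; return True only if it changed."""
    # Hypothetical minimal content; real passwords may need quoting.
    desired = f"[client]\nuser={user}\npassword={password}\n"
    if path.exists() and path.read_text() == desired:
        return False  # already in the desired state, so do nothing
    # Create with owner-only permissions because the file holds a password.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(desired)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / ".my.cnf"  # assumed file name, in a scratch directory
        print(ensure_client_config(cfg, "root", "example-password"))  # True
        print(ensure_client_config(cfg, "root", "example-password"))  # False
```

The second call reports no change, which is what an Ansible task does when it finds the system already in the desired state. That "changed or not" answer is what makes re-running safe.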

Installing Ghost

Ghost is pretty easygoing. It needs Node.js and MySQL to be installed. And that's about it. The CLI will configure MySQL and Nginx for you if you let it, as well as your systemd services and the Ghost instance itself. This is very convenient if you're me in 2016, and very inconvenient if you're me now, trying to make this process idempotently re-executable through Ansible. This step involved a lot of trial and error and poking around in Ghost's source code to figure out what the CLI is actually doing. But I was able to work out the valuable things the CLI would do if I were going to let it be in charge, which I'm not. Ansible is in charge. Then I recreated those things with Ansible, with some improvements. An easy example is that I configured Nginx to serve most of my static assets. The CLI's configuration just passes everything through to the Ghost service. Node.js has some strengths, but serving static files is not one of them. Especially not compared to Nginx.
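As a rough illustration of the static-assets idea, here is a Python sketch that renders an Nginx server block, much like an Ansible template would. This is not my actual config. The domain and install directory are hypothetical, TLS is left out, and 2368 is used only because it is Ghost's default port.

```python
from string import Template

# "$$" is an escaped "$", so Nginx variables like $host survive rendering.
NGINX_SERVER_TEMPLATE = Template("""\
server {
    listen 80;
    server_name $domain;

    proxy_set_header Host $$host;
    proxy_set_header X-Forwarded-For $$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $$scheme;

    # Serve images from disk; fall back to Ghost if the file is missing.
    location ^~ /content/images/ {
        root $ghost_dir;
        expires 30d;
        try_files $$uri @ghost;
    }

    # Everything else goes to the Ghost process.
    location / {
        proxy_pass http://127.0.0.1:$ghost_port;
    }

    location @ghost {
        proxy_pass http://127.0.0.1:$ghost_port;
    }
}
""")


def render_server_block(domain: str, ghost_dir: str, ghost_port: int = 2368) -> str:
    """Fill in the template; all argument values are supplied by the caller."""
    return NGINX_SERVER_TEMPLATE.substitute(
        domain=domain, ghost_dir=ghost_dir, ghost_port=ghost_port
    )


if __name__ == "__main__":
    # Hypothetical domain and install path, for illustration only.
    print(render_server_block("blog.example.com", "/var/www/ghost"))
```

The fallback to the Ghost process is there because some image URLs may not exist on disk yet, for example resized variants. Without it, those requests would just 404.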

Ghost needs a system user to run the Ghost API process. It also needs config files and content directories with appropriate ownership and permissions, and a MySQL user. The CLI would create all of these with pretty loose permissions, which I set to be much more restrictive. With that all done, what I had was a fresh install of Ghost and none of my content.

Certbot

Once Nginx was installed and running, I could create a bootstrap server config and get a new SSL cert. This of course requires pointing DNS at the new host, which is awkward, because it means my domain just won't resolve to anything good for a little while. In this case, it was something like 5 minutes. I did it right before moving on to restore the old content. This was also more manual than I would have liked. Specifically, I just ran certbot at the command line and let it set up renewal tasks for me. I'll see about coming back to this in the future.

Backup and Restore

This is where things get a little bit manual. 😅 Part of that is on Ghost. Part of that is on me. My SSL certificates were expiring, you see. I needed to get this in place. And Ghost's CLI just is not at all amenable to scripted use. Importing the content fails unless the server is online. But it comes online in a tremendously insecure state, where whoever gets to it first can just create an admin account for themselves. This is a chicken-and-egg problem that really doesn't need to exist, but it does. Creating these accounts should be doable offline. It should be doable through the CLI. It should be doable via an invite mechanism or with some secret token. But it's not. Maybe I'll look into adding those features? Part of me wonders if they would be rejected, though, because the Ghost business model is to provide managed hosting.

Anyway, I modified some Nginx configs and did some manual CLI things and got an admin password set before the service was exposed to the internet. Huzzah. The last step was to import my content. This I also did manually. There's a feature to do this in the Ghost admin portal, and that just worked. 😮‍💨 To be honest, I wasn't sure it would (I did test it, but I wasn't sure before that). My old install was 3 major versions out of date. So, kudos to the Ghost.io team on that point.

To Be Continued

So I saved all my content and avoided looking like I can't manage a simple blog during my job search. If you happen to have been looking at the site on Sunday afternoon, it's possible you saw an expired cert warning. It's also possible you saw some default Nginx pages for a few minutes. But in all likelihood, no one would have ever known about that if I hadn't mentioned it just now.

I'm in a much better spot than I was at the beginning of this story, but I still have some things left to do.

Backups

I need to set up proper backups of the MySQL database. Unlike the Ghost export/import feature, that would be highly amenable to scripting. I also need to set up backups of the site's assets (mostly images), which are not stored in the db.
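For the database half, a scripted backup could start from something like the Python sketch below. It only builds a mysqldump command line with a timestamped output file and does not run it. The database name, directories, and credentials file are hypothetical, and this is a starting point rather than what I ended up deploying.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_backup_command(database: str, backup_dir: Path,
                         defaults_file: Path, now: datetime) -> list[str]:
    """Return a mysqldump argument list that writes a timestamped dump file."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{database}-{stamp}.sql"
    return [
        "mysqldump",
        # Credentials come from an option file; this must be the first option.
        f"--defaults-extra-file={defaults_file}",
        # Take a consistent snapshot of InnoDB tables without locking them.
        "--single-transaction",
        f"--result-file={target}",
        database,
    ]


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    command = build_backup_command(
        database="ghost",
        backup_dir=Path("/var/backups/ghost"),
        defaults_file=Path("/root/.my.cnf"),
        now=datetime.now(timezone.utc),
    )
    print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) from a cron job or systemd timer. The assets directory would still need its own copy step.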

Updates

I need to set up a solution to perform regular system updates. It's much easier to do now, but still not automatic. I also need to do regular updates of Ghost itself. Again, that probably means fighting with the Ghost CLI, which is geared toward being easy to set up but not easy to manage.

More Stuff

I'd like to have a personal wiki. It's something I've done a couple times in the past and then abandoned because it's too much work to maintain. But that maintenance is exactly the problem I'm solving, so the value of it is clearly positive again. It's possible I could also do other things with the server. I've hosted game servers on this system before. I'm not sure if I would do that again or give it its own host. But I can imagine other things living here.

Monitoring

I need to get set up to collect metrics from Ghost, Nginx, MySQL, and the OS itself. And then I need to send them somewhere. Probably Grafana. Same with logs.

Cleanup

I'd like to put this all on GitHub. But I need to organize it better and make sure I'm not leaking any secrets in my version history.

Edit: here it is, if you like.

Edit (2024 edition): I did most of the stuff on this todo list since I wrote it. But it was hard to extend. So, I recently refactored my playbooks to support adding "more stuff". Blog post TBWritten.


Cover photo by Pavel Danilyuk