I've been working for a while to make it possible to use Terraform to manage Wikimedia Cloud VPS, and I'm finally happy to announce that for the most part, it's now possible*. Terraform is a popular open-source Infrastructure as Code tool that lets you manage your infrastructure configuration (such as Cloud VPS instances) with a special coding language/framework. You can then manage and review that code with familiar tools, such as Git.
If you want to just get started with Terraform on your own project, read the docs. Otherwise keep reading to learn about the technical challenges of making it all possible.
The core Cloud VPS platform is powered by OpenStack, an open-source cloud computing platform. OpenStack consists of various separate services, some of which are used on our deployment. These services already expose HTTP APIs, and for example the web-based dashboard (Horizon) uses them internally. However, until now these APIs had always been firewalled off from the public internet, and only some specific accounts were allowed to log in from the internal Cloud VPS network without the 2-factor authentication code that we require from all users by default.
Since Cloud VPS uses Wikimedia developer accounts, the passwords used to log in to the dashboard can also be used to log in to other critical tools. For this reason, we don't want to encourage our users to store these passwords as plain text on their computers. Thankfully, OpenStack's Identity service, Keystone, contains a solution that works for this use case: Application Credentials. These are essentially API keys that are tied to a specific user and a specific project. As a part of this project, we've enabled the use of Application Credentials in our configuration and written some documentation on how to use them properly.
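To make the idea concrete, here is a minimal sketch of how a client could exchange an Application Credential for a token using Keystone's v3 API. The endpoint URL is a placeholder assumption, not the real Cloud VPS address, and the function names are made up for illustration. Note that the request carries no password and no explicit project scope, because the credential is already bound to one user and one project.

```python
import json
import urllib.request

# Assumption: placeholder endpoint, not the real Cloud VPS Keystone URL.
KEYSTONE_URL = "https://keystone.example.org:5000/v3"


def build_auth_request(credential_id: str, secret: str) -> dict:
    """Build the Keystone v3 request body for application credential auth."""
    return {
        "auth": {
            "identity": {
                "methods": ["application_credential"],
                "application_credential": {
                    "id": credential_id,
                    "secret": secret,
                },
            }
        }
    }


def fetch_token(credential_id: str, secret: str, base_url: str = KEYSTONE_URL) -> str:
    """POST the credential to Keystone and return the issued token."""
    body = json.dumps(build_auth_request(credential_id, secret)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/auth/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Keystone returns the token in a response header, not in the body.
        return response.headers["X-Subject-Token"]
```

In practice you would rarely write this by hand, since OpenStack client tooling can read the credential ID and secret from its own configuration, but it shows why a leaked Application Credential is less damaging than a leaked developer account password.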
The second major change needed on our setup was to open up the firewall rules that previously restricted API access to Wikimedia networks. It's now possible to reach the APIs from anywhere on the internet. As a part of this, we've also updated our load balancer configuration to make it easier to limit or block misbehaving clients.
Not everything on Cloud VPS uses an upstream OpenStack project. Some components, most notably the current web proxy service and the Puppet integration (internally called the Puppet ENC API), are powered by custom code that's mostly been written using Python and the Flask framework. Historically they didn't have any proper access control, and instead we had simply configured our firewalls to block access to the APIs from everything except the Cloud VPS control plane servers.
Since this model doesn't let external users use the APIs directly, we had to come up with a new model. I ended up updating both of the affected services to use the Keystone API. After those changes, we've made the web proxy API publicly available like the vanilla OpenStack services, but the Puppet API is still private until it's fixed to work properly with external consumers.
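As a rough illustration of what a Keystone-backed access check can look like, the sketch below inspects the JSON body that Keystone returns when a service validates a caller's token. This is not the actual code from the web proxy or Puppet ENC services. The function name and the role names ("member", "reader") are assumptions chosen for the example.

```python
def token_allows(token_body: dict, project_id: str, allowed_roles: set) -> bool:
    """Return True if a validated Keystone token grants access to a project.

    token_body is the parsed JSON that Keystone returns when a token is
    validated. It carries the project the token is scoped to and the roles
    the user holds on that project.
    """
    token = token_body.get("token", {})
    # Reject tokens that are unscoped or scoped to a different project.
    if token.get("project", {}).get("id") != project_id:
        return False
    granted = {role.get("name") for role in token.get("roles", [])}
    # Access is allowed if the user holds at least one of the allowed roles.
    return bool(granted & allowed_roles)


# Hypothetical validated token body, trimmed to the fields used above.
example_token = {
    "token": {
        "user": {"name": "example-user"},
        "project": {"id": "example-project"},
        "roles": [{"name": "member"}],
    }
}
print(token_allows(example_token, "example-project", {"member"}))
```

A service built this way no longer depends on the firewall to decide who may call it, which is what makes exposing it to the internet feasible.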
Just having the web proxy API accessible on the internet doesn't mean that you can directly use it with Terraform. Instead, you need something called a "Terraform provider". Providers are programs that interact with Terraform and the external service (the Cloud VPS web proxy API in this case). There's an existing provider for OpenStack, which works great for anything that uses the vanilla OpenStack APIs, but I ended up writing a custom provider to work with our custom features. Since Terraform and providers are written in Go, I also ended up writing a Go library to work with the web proxy API. Support for the Puppet ENC API is planned once it's been updated to support external clients.
Since the official Terraform module registry (where Terraform downloads the modules your code uses) is heavily built around GitHub, a proprietary platform, I ended up deploying a self-hosted registry on terraform.wmcloud.org to host the new provider. The registry is based on the rekisteri project by Hugo Martins and has been lightly customized to work for this use case.
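For context on what a self-hosted registry has to serve: Terraform first fetches a discovery document from `/.well-known/terraform.json` on the registry host, reads the `providers.v1` key to find the base path of the provider API, and then asks for the versions of a given provider. The sketch below only builds those URLs. The discovery document contents and the `example/cloudvps` namespace and provider name are assumptions for illustration, not values read from the real registry.

```python
from urllib.parse import urljoin


def discovery_url(hostname: str) -> str:
    """URL of the service discovery document Terraform requests first."""
    return f"https://{hostname}/.well-known/terraform.json"


def provider_versions_url(
    hostname: str, discovery_doc: dict, namespace: str, provider_type: str
) -> str:
    """Build the URL that lists the available versions of a provider."""
    # "providers.v1" may be a path relative to the host or an absolute URL;
    # urljoin handles both cases.
    base = urljoin(f"https://{hostname}/", discovery_doc["providers.v1"])
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, f"{namespace}/{provider_type}/versions")


# Assumption: a plausible discovery document, not the real one.
doc = {"providers.v1": "/v1/providers/"}
print(discovery_url("terraform.wmcloud.org"))
print(provider_versions_url("terraform.wmcloud.org", doc, "example", "cloudvps"))
```

Because discovery starts from the hostname alone, a provider source address that begins with the registry's hostname is enough for Terraform to find and download the provider.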
It's now possible to do most things via Terraform that you can do via horizon.wikimedia.org. However, there are still a few major exceptions:
It'd be nice to get those fixed. In addition, I'm planning on working to make the entire system more streamlined with the Puppet setup we use to provision instances. Most WMCS managed projects use a standalone Puppetmaster to manage secrets. There are a few manual steps when provisioning or decommissioning instances to sign and revoke the TLS certificates Puppet uses internally, and I want to eventually make Terraform do that for you.
If this sounds interesting: get involved! The entire stack is licensed under free licenses and welcomes new contributors, and in my experience it's a great way to experiment with technology that might otherwise be hard to play with.
This post was originally published on the Wikimedia Technical Blog.