hydrocode

Sunday, April 5, 2015

Releasing Erlang Software

Before I knew better, I worked on a home-grown release system for an Erlang project I had inherited at Cabulous in May 2012. At that time, there were basically two ways to make changes in the production environment: either the commonly used rolling upgrade pattern or, occasionally when the changes were small, editing the source code on the production system, recompiling, moving the beam file into place, and reloading the module from the console.

The rolling upgrade pattern, in which each node is in turn removed from the load balancer, upgraded, restarted, and returned to the load balancer, was problematic for our system. Each driver device maintains an open TCP connection to a node, so every driver on a node undergoing a rolling upgrade must be disconnected, at which point it reconnects to another node. Unfortunately, the authentication process, which relies heavily on the database, is quite inefficient, requiring 13 separate round-trip database requests for a single driver app login. The net result is that unless disconnections are made gradually, a simple rolling upgrade will overload the database.
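To make "disconnections are made gradually" concrete, here is a minimal Python sketch of draining a node in batches sized to a database budget. Every name in it is hypothetical; it assumes only the 13 round trips per login mentioned above, plus the simplifying assumption that each re-login completes within one second. The budget itself is whatever the database can actually sustain, which isn't stated here.

```python
"""Hypothetical sketch: spread driver disconnections over time so that the
resulting re-logins stay within a database request budget."""
from typing import Iterator, List, Sequence, Tuple

# From the post: one driver app login costs 13 database round trips.
DB_ROUND_TRIPS_PER_LOGIN = 13


def drain_schedule(
    driver_ids: Sequence[str],
    max_db_requests_per_sec: int,
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (second_offset, batch) pairs for disconnecting drivers.

    Batches are sized so that re-logins stay at or below
    max_db_requests_per_sec, assuming each reconnect finishes within one
    second. At least one driver per second is always scheduled, even if
    the budget is smaller than a single login.
    """
    logins_per_sec = max(1, max_db_requests_per_sec // DB_ROUND_TRIPS_PER_LOGIN)
    for second, start in enumerate(range(0, len(driver_ids), logins_per_sec)):
        yield second, list(driver_ids[start:start + logins_per_sec])
```

For example, with a hypothetical budget of 260 requests per second, 260 // 13 = 20 drivers can be disconnected each second, so draining 100 drivers takes five one-second batches instead of a single burst of 1,300 database requests.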

The alternative method allowed clients to remain connected while the server was upgraded and did not require removing the node from the load balancer, but it was very manually intensive and therefore error-prone. I was aware that, in principle at least, OTP projects could be hot upgraded using "releases", but up to that time the codebase had not been properly designed for such techniques, and I was unable to get it to work as desired in a reasonable amount of time.

As Cabulous became Flywheel, the rest of the development team standardized on using git commit SHAs to refer to which code was deployed in a particular environment. In an effort to comply with this standard, I decided to automate the ad hoc "hot" upgrade process, on the assumption that I could at least make some improvements and hopefully migrate to a more OTP-compliant technique in the future. I had been solving a lot of ad hoc tasks using Fabric, so it was natural to extend that system organically.

Unfortunately, the current fabfile is very specific to the environment in which it was created. It doesn't update modules in dependency order, nor does it call the code_change callbacks, etc. However, even with these limitations, for most software updates it's much faster and causes fewer service disruptions than a rolling upgrade.
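Updating modules in dependency order is essentially a topological sort. Here is a minimal Python sketch (3.9+, using the standard library's graphlib), assuming a hypothetical map from each module to the modules it calls; the module names are illustrative. In a real OTP release this information would be expressed in the release's .appup instructions rather than computed by a deploy script.

```python
"""Hypothetical sketch: order module reloads so that each changed module is
loaded after the modules it depends on."""
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def reload_order(changed: Set[str], deps: Dict[str, Set[str]]) -> List[str]:
    """Return the changed modules in dependency order.

    deps maps each module to the set of modules it calls into. Unchanged
    modules still constrain the ordering but are filtered out of the result.
    Raises graphlib.CycleError if the dependencies are cyclic.
    """
    sorter = TopologicalSorter(deps)
    for module in changed:
        sorter.add(module)  # include changed modules that deps doesn't mention
    # static_order() yields every node after all of its predecessors.
    return [m for m in sorter.static_order() if m in changed]
```

Modules that depend on each other cyclically can't be ordered this way; static_order() raises graphlib.CycleError, which is a reasonable signal that those modules need to be upgraded together.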

Still, my ultimate goal is to enable hot updates for this system using actual OTP releases. At first blush, the main problem seems to be that OTP applications and releases have version numbers of the form "$major.$minor.$revision" rather than just a commit SHA. Rebar does appear to have some support for automatically generating an application version from the current git commit SHA, but the release needs to support that sort of automatic naming as well. I've looked at relx, but I'm not aware of it supporting automatic release naming and relups based on git SHAs, so I may have to write that myself.
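One way to keep an orderable "$major.$minor.$revision" while still recording the commit SHA is to start from the output of "git describe --tags --long", which has the form <tag>-<commits since tag>-g<abbreviated SHA>. The Python sketch below parses it. The resulting version format is purely hypothetical (it is not what rebar or relx produce), and it assumes tags look like "v1.4.2".

```python
"""Hypothetical naming scheme: derive a "$major.$minor.$revision"-style version
that also records the commit SHA, from `git describe --tags --long` output."""
import re
from typing import NamedTuple

# Assumed tag convention: optional "v" followed by three numeric components.
_TAG_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


class BuildVersion(NamedTuple):
    major: int
    minor: int
    revision: int
    commits_since_tag: int  # orderable, unlike a bare SHA
    sha: str

    def __str__(self) -> str:
        # Hypothetical format, e.g. "1.4.2+7.3f9c2ab"
        return (f"{self.major}.{self.minor}.{self.revision}"
                f"+{self.commits_since_tag}.{self.sha}")


def parse_describe(output: str) -> BuildVersion:
    """Parse e.g. 'v1.4.2-7-g3f9c2ab' into its components.

    Splits from the right, so hyphens inside the tag don't confuse the
    count/SHA fields. Raises ValueError on unexpected input.
    """
    tag, count, gsha = output.strip().rsplit("-", 2)
    match = _TAG_RE.match(tag)
    if match is None or not gsha.startswith("g"):
        raise ValueError(f"unexpected git describe output: {output!r}")
    major, minor, revision = (int(g) for g in match.groups())
    return BuildVersion(major, minor, revision, int(count), gsha[1:])
```

The commit count is what gives successive builds from the same tag an ordering; the SHA alone only identifies which commit a build came from.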

Saturday, January 31, 2009

Exploring Erlang

When I first began running hydrodynamic simulations in graduate school to model astrophysical gas flows, I was curious why we were using that antique programming language Fortran, rather than something slick and modern like C. I had learned Pascal as an undergrad, and knew that there were many programming languages, each with its own strengths and weaknesses. I was frustrated with how crufty the Fortran code felt, but was assured by my professors that Fortran was really the best language for the job. Of course, in retrospect, the fact that the code was unmaintainable and difficult to extend had much less to do with the language and much more to do with its implementors' lack of skill. These programs got the job done, but they were hardly things of beauty to look upon with pride.

More recently, at a startup, I first became aware of the concept of a supervisor program, which starts a worker program and automatically restarts it if it crashes. This wonderful piece of engineering was a godsend, and allowed me to sleep through the night when I was on pager duty. Any time one of the app servers crashed, it was immediately restarted by the supervisor. Alas, the miracles of supervision didn't extend to restarting MySQL master-master replication when it failed (which it often did), so I still occasionally had to get up and manually fix problems that could not be automated away.
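The supervisor idea fits in a few lines. This Python sketch runs a worker function in-process rather than a separate OS program, restarts it whenever it raises, and gives up if it crashes more than max_restarts times within period_sec seconds, loosely mirroring the restart intensity of Erlang supervisors. The function and parameter names are hypothetical.

```python
"""Minimal sketch of the supervisor idea: run a worker, restart it when it
crashes, and give up if it crashes too often within a time window."""
import time
from typing import Callable, List


def supervise(
    worker: Callable[[], None],
    max_restarts: int = 3,
    period_sec: float = 5.0,
) -> int:
    """Run worker until it returns normally; return how many restarts it took.

    Raises RuntimeError if more than max_restarts crashes occur within any
    period_sec window.
    """
    crash_times: List[float] = []
    restarts = 0
    while True:
        try:
            worker()
            return restarts  # normal exit: nothing left to restart
        except Exception as exc:
            now = time.monotonic()
            # Keep only the crashes that fall inside the sliding window.
            crash_times = [t for t in crash_times if now - t < period_sec]
            crash_times.append(now)
            if len(crash_times) > max_restarts:
                raise RuntimeError("restart intensity exceeded") from exc
            restarts += 1
```

Giving up after repeated crashes matters: a worker that fails instantly on every start would otherwise spin forever, and escalating the failure lets something higher up (or a human on pager duty) decide what to do.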

But the idea that software could be made more reliable sank in. And while I loved programming in Python at the time, I began to look around for languages better aligned with my newfound philosophy that software should be able to heal itself when it fails.

It doesn't take much time googling "fault tolerant software" to run across Erlang. For someone like me who was steeped in Algol-based languages, the syntax seemed painful, so I dismissed it and kept looking. Still, the more I researched, the more Erlang kept popping up. And once I dug a little deeper, reading more about the language, it seemed like something definitely worth exploring. When I discovered that supervisors are a first-class Erlang "behaviour", I was hooked.

I am currently working on my first real Erlang program, called subdomain. It allows a user to create email addresses at his or her personal subdomain, which relay email to some real mailbox. I have used this technique for years with my own domains to create an email alias for each entity I encounter that wants my address. For example, if my subdomain is jay.m82.com, and I encounter some web form at http://example.com that wants my email address, I would create the alias example.com@jay.m82.com and enter that in the form. Mail sent to that address will be relayed to wherever I've configured it, a Gmail account in my case.
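A minimal Python sketch of the alias lookup, assuming (hypothetically) that mail is relayed only for aliases the user has explicitly created and rejected otherwise. The class, its methods, and the jay@example.net mailbox are illustrative only, and lowercasing the whole address is a simplification.

```python
"""Hypothetical sketch of subdomain alias relaying: one subdomain per user,
with per-entity aliases that all relay to a single real mailbox."""
from typing import Optional, Set


class Subdomain:
    """One user's subdomain, e.g. 'jay.m82.com', relaying to one mailbox."""

    def __init__(self, name: str, mailbox: str) -> None:
        self.name = name.lower()
        self.mailbox = mailbox
        self.aliases: Set[str] = set()  # local parts created so far

    def create_alias(self, entity: str) -> str:
        """Create an alias named after whoever asked for an address."""
        local = entity.lower()
        self.aliases.add(local)
        return f"{local}@{self.name}"

    def relay_target(self, recipient: str) -> Optional[str]:
        """Return the mailbox to relay to, or None to reject the message."""
        # rpartition splits on the last "@", so the local part may contain dots.
        local, sep, domain = recipient.lower().rpartition("@")
        if sep and domain == self.name and local in self.aliases:
            return self.mailbox
        return None
```

Rejecting aliases that were never created is one design choice; accepting any local part at the subdomain would let aliases be invented on the spot, at the cost of no longer being able to refuse mail to an address that was never handed out.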

While this application may not be the poster child for Erlang's concurrency and fault tolerance capabilities, it is an itch I've been wanting to scratch for a while, and it's a step toward a more robust programming future that I'm greatly looking forward to.