Sunday, April 5, 2015

Releasing Erlang Software

Before I knew better, I worked on a homegrown release system for an Erlang project I had inherited in May 2012 at Cabulous. At that time, there were basically two ways to make changes in the production environment: the commonly used rolling upgrade pattern or, occasionally, if the changes were small, editing the source code on the production system, recompiling, moving the beam file into place, and reloading the module from the console.

The rolling upgrade pattern, in which each node is in turn removed from the load balancer, upgraded, restarted, and returned to the load balancer, was problematic for our system. Each driver device maintains an open TCP connection to a node, and so each driver on a node undergoing a rolling upgrade must be disconnected, at which point it reconnects to another node. Unfortunately, the authentication process, which is heavily reliant on the database, is quite inefficient, requiring 13 separate round trip DB requests for a driver app login. The net result is that unless disconnections are made gradually, a simple rolling upgrade will overload the database.
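To make the arithmetic concrete, here is a rough Python sketch of how disconnections might be paced. Only the figure of 13 DB requests per login comes from our system; the function names, the per-second batching, and the idea of a fixed DB request budget are assumptions for illustration, not how our deployment actually worked.

```python
from typing import List

DB_REQUESTS_PER_LOGIN = 13  # round trips per driver app login (from the text)


def max_disconnects_per_second(db_budget_rps: float) -> int:
    """Drivers that can be disconnected per second so that their re-logins
    stay within db_budget_rps database requests per second (hypothetical)."""
    return int(db_budget_rps // DB_REQUESTS_PER_LOGIN)


def drain_schedule(num_drivers: int, db_budget_rps: float) -> List[int]:
    """Batch sizes, one per second, for draining num_drivers from a node."""
    per_second = max_disconnects_per_second(db_budget_rps)
    if per_second < 1:
        raise ValueError("budget too small for even one login per second")
    full_batches, remainder = divmod(num_drivers, per_second)
    return [per_second] * full_batches + ([remainder] if remainder else [])
```

With a made-up budget of 390 requests per second, at most 30 drivers can be dropped per second, so a node holding 100 drivers would need four seconds to drain.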


The alternate method had the benefits of allowing clients to remain connected while the server was upgraded and of not requiring the node to be removed from the load balancer, but it was also manually intensive and therefore error-prone. I was aware that, in principle at least, OTP projects could be hot upgraded using "releases", but up to that time the codebase had not been properly designed for such techniques, and I was unable to get it to work as desired in a reasonable amount of time.


As Cabulous became Flywheel, the rest of the development team standardized on using git commit SHAs to refer to which code was deployed in a particular environment. In an effort to comply with this standard, I decided to automate the ad hoc "hot" upgrade process under the assumption that I could at least make some improvements, and hopefully migrate to a more OTP-compliant technique in the future. I had been solving a lot of ad hoc tasks using fabric, so it was natural to just extend that system organically.


Unfortunately, the current fabfile is very specific to the environment in which it was created. It doesn't update modules in dependency order, nor does it call "code_change", etc. However, even given these limitations, for most software updates it's much faster and causes fewer service disruptions than a rolling upgrade.
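As an illustration of the dependency-order limitation, here is a minimal Python sketch of what ordering the reloads could look like. It is not taken from the fabfile: the module names and the dependency map are hypothetical, and the sketch only builds the Erlang shell expressions rather than sending them to a node. It uses graphlib from the Python standard library (3.9+) and the Erlang functions code:purge/1 and code:load_file/1.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def load_order(deps: Dict[str, Iterable[str]]) -> List[str]:
    """Return modules so that each one comes after the modules it depends on.

    deps maps a module name to the modules it calls into (hypothetical input).
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def reload_expression(module: str) -> str:
    """Erlang shell expression that purges old code and loads the new beam.

    Note that code:purge/1 kills processes still running the old code, and
    that no code_change callback is invoked by this expression.
    """
    return f"code:purge({module}), code:load_file({module})."


# Hypothetical example: handler calls session, which calls util.
deps = {"handler": {"session"}, "session": {"util"}, "util": set()}
expressions = [reload_expression(m) for m in load_order(deps)]
```

Here util would be reloaded first and handler last, so a module never calls into a dependency that has not yet been updated.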


Still, my ultimate goal is to enable hot updates for this system using actual OTP releases. At first blush, the main problem seems to be that OTP applications and releases have version numbers of the form "$major.$minor.$revision" (instead of just a commit SHA). It does appear that rebar has some support for automatically generating an application version based on the current git commit SHA, but the release itself also needs to support that sort of automatic naming. I've looked at relx, but am not aware that it supports automatic release naming and relups based on git SHAs, so I may have to write that myself.
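One way the two naming schemes might be reconciled is to append a short SHA to a conventional version number. The Python sketch below is only an assumption about what such a scheme could look like; the "+sha" suffix and both function names are hypothetical, and this is not something rebar or relx is known to produce. The only external command used is "git rev-parse HEAD".

```python
import re
import subprocess


def release_version(base: str, sha: str) -> str:
    """Combine a "$major.$minor.$revision" base with a short commit SHA."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", base):
        raise ValueError(f"not a major.minor.revision version: {base!r}")
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a git SHA: {sha!r}")
    return f"{base}+{sha[:7]}"


def current_sha(repo_dir: str = ".") -> str:
    """Ask git for the commit SHA currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

A fabfile task could then call release_version("1.4.2", current_sha()) to label a release, keeping the SHA visible for the rest of the team while still leaving a conventional version number in front.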
