In part one of this series, we discussed the radical transformation of the Motus internal systems as we moved from a legacy monolithic application to a service-based platform. This post details how our staging and production systems evolved to support the new platform.
As we started moving towards a service architecture, our initial production deployment strategy was the same as with our legacy application: we had dedicated servers for different applications, we pushed environment-specific configuration to the servers with Puppet, and we did deployments by hand using instructions stored on our wiki. This worked for about two services, and then the management got completely out of hand. We knew that we needed a better way.
We needed a standard model for deploying applications to production, and as we discussed in the previous post, Docker was looking more and more like the ideal solution. But Docker in late 2014 was still in its infancy, and the tools to manage it at scale were few and far between. After a survey of the landscape, we chose Mesosphere’s Mesos/Marathon/Chronos stack for running our applications in production. At the time, it was the most mature, production-tested set of tools that supported Docker. Using Mesos also allowed us to greatly reduce the number of distinct server classes. Once all our applications were deployed as Docker containers, we no longer needed PHP servers, Java servers, and batch servers. We just had Mesos/Docker servers, and Mesos took care of placing applications wherever there were resources to run them.
To simplify our production system further, we decided to run all of our batch jobs inside Docker containers as well. Most of these jobs were legacy PHP scripts that ran against our monolithic codebase. We modified our legacy PHP web container so that it could also run batch scripts, and that made it possible to run 100% of our production workload in Docker on top of Mesos.
When implementing a service-based platform, service discovery becomes a crucial part of the application infrastructure. In our development environment, we had launched using Hipache, a Redis-backed HTTP proxy that routes requests by virtual host. This had a number of benefits, including performance and ease of updating, but it had no native integration with Marathon. We wrote a Marathon-Hipache bridge that synchronized the tasks running in Marathon with Hipache, and that worked well for a while. We had URLs like http://locationservice.motus.com that were backed by individual instances of the location service running in Docker containers somewhere on our Mesos cluster. We ran our production environment with Hipache for about 18 months. However, due to a lack of updates to Hipache and our need for non-HTTP-based services, we went in search of a replacement.
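To make the idea of a Marathon-Hipache bridge concrete, here is a minimal sketch in Python. It is not the bridge we actually ran. It assumes task records shaped like the ones Marathon returns from `/v2/tasks` (`appId`, `host`, `ports`) and Hipache's Redis layout, where each `frontend:<hostname>` list holds an identifier followed by backend URLs. The function names, the default domain, and the choice of the first port as the HTTP port are all illustrative assumptions.

```python
from collections import defaultdict


def desired_frontends(tasks, domain="motus.com"):
    """Compute the Hipache frontend lists implied by a set of Marathon tasks.

    `tasks` is a list of dicts with "appId", "host" and "ports" keys.
    Assumption: the first port of each task serves HTTP.
    """
    backends = defaultdict(list)
    for task in tasks:
        ports = task.get("ports") or []
        if not ports:
            continue  # nothing to route to
        # "/locationservice" -> "locationservice"; nested ids are flattened
        app = task["appId"].strip("/").replace("/", "-")
        backends[app].append(f"http://{task['host']}:{ports[0]}")
    # Hipache expects: [identifier, backend_url, backend_url, ...]
    return {
        f"frontend:{app}.{domain}": [app] + sorted(urls)
        for app, urls in backends.items()
    }


def sync(redis_client, frontends):
    """Replace each frontend list in Redis with the desired one.

    `redis_client` is assumed to be a redis-py style client; the pipeline
    groups the delete and the push so the list is swapped in one step.
    """
    for key, values in frontends.items():
        pipe = redis_client.pipeline()
        pipe.delete(key)
        pipe.rpush(key, *values)
        pipe.execute()
```

A bridge along these lines would poll Marathon (or react to its events), call `desired_frontends` on the current task list, and pass the result to `sync`. Keeping the mapping a pure function makes it easy to test without a running cluster.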
Luckily, about the time we were looking for a replacement, the marathon-lb project came into being. It uses Marathon’s native event API to keep track of what services are running where, and it supports Marathon’s concept of service ports to run non-HTTP services. We implemented that in the summer of 2016, and it has simplified a lot of things. And, of course, we run marathon-lb in a Docker container.
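As a rough illustration of how an application opts in to marathon-lb, the sketch below builds a Marathon app definition in Python. The `HAPROXY_GROUP` and `HAPROXY_0_VHOST` labels and the `servicePort` field are the mechanisms marathon-lb reads. The helper name, image names, resource sizes, and port numbers are hypothetical, and the container layout follows the Marathon app format of that period rather than our actual definitions.

```python
import json


def marathon_lb_app(app_id, image, container_port, service_port,
                    vhost=None, group="external"):
    """Build a Marathon app definition that marathon-lb will pick up.

    HTTP services set `vhost`; plain TCP services omit it and are reached
    on `service_port` of the load balancer instead.
    """
    labels = {"HAPROXY_GROUP": group}
    if vhost:
        labels["HAPROXY_0_VHOST"] = vhost  # virtual host for port index 0
    return {
        "id": app_id,
        "instances": 2,   # illustrative sizing only
        "cpus": 0.25,
        "mem": 256,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "portMappings": [{
                    "containerPort": container_port,
                    "hostPort": 0,  # let Mesos choose a host port
                    "servicePort": service_port,
                    "protocol": "tcp",
                }],
            },
        },
        "labels": labels,
    }


if __name__ == "__main__":
    app = marathon_lb_app(
        "/locationservice", "example/locationservice:latest",
        container_port=8080, service_port=10001,
        vhost="locationservice.motus.com",
    )
    print(json.dumps(app, indent=2))
```

The resulting JSON is what would be submitted to Marathon. The service port is what lets a non-HTTP service sit behind the same load balancer, which virtual-host routing alone could not do.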
Every production environment uses a whole host of technologies for various things, and the Motus stack is no exception:
- GlusterFS for networked file storage
- Redis for sessions and job queues
- Our own home-grown Job Service for queuing and parallel execution
- PostgreSQL for our production database and data warehouse
- Puppet for configuration management
- Ansible for application deployments and bulk sysadmin
We have come a long way since we started the transition in early 2014, and it keeps getting better! Future posts in this category will detail things that we have learned about secrets management, troubleshooting in a service architecture, and perhaps even remote work using telepresence robots!