This post originally appeared on the Standard Treasury Engineering Blog. It has been re-published here with permission.

Today I'd like to talk about software testing - a subject dear to my heart. In particular, I'd like to talk about testing a modern microservice architecture, using Standard Treasury's software stack as an example.

Let's begin with my assertion that this is both a hard and an unsolved problem. If you disagree with that, I'd love to talk to you about what you know that I don't. That's a serious offer, by the way.

Okay.

I'm going to begin at the beginning and work my way up.

Unit & Functional Tests

In the beginning, the engineer wrote source code and unit tests. The unit tests were self-contained, had no side effects, and were fast. The engineer said, "let these tests run", and saw that the passing tests were good, and divided the passing tests from the failing tests.

And the engineer said, let this be reviewed by my peers. And verily, the peers did review, and the source that was good was accepted, and the source that was poor received some additional revisions. And the source and tests were checked into version control on the second day.

On the third day the engineer wrote more source, and with this source wrote functional tests. And although there was action at a distance, and many moving parts, the engineer was able to compose guarantees around performance and behavior by complementing the functional tests with more unit tests, and it was good.

And however the engineer may have felt about this process, the fact remains that it was easy, and well-understood, and accepted. There was a process, and it was followed, and the development team was on the same page.

As the engineer wrote more applications, they sought to have those applications talk to each other through widely accepted standards. And the engineer thought: how should I test that my applications will work with each other?

Genesis, please exit stage left. You're a cute allegory, but we're about to get real serious up in here.

Integration Tests

Integration tests are hard. There's a lot to think about, and many components to keep in sync.

For instance:

  • How do you ensure all of the versions of your applications and underlying services are in sync?
  • How do you configure your networking? Is it the same in your test environment? Staging? Production?
  • How do you debug when something doesn't work as expected?
  • Are all of your logs in the same place?
  • Are you making sure your local environment isn't polluting your configuration?
  • Lacking static analysis tools, are you sure the integration test coverage is adequate?

These are all manageable issues when there's only one application and a few underlying services. As additional applications and services are introduced, however, things can quickly get unwieldy.

For example, we have a microservice architecture built on top of Postgres and Storm (with Kafka as the message queue). Putting security aside for a moment, a call to our API to make a transfer involves calls to our customer information service, our ledger service, as well as records being written to the database and messages potentially being enqueued and dequeued via Storm.

And, of course, let's not forget that these applications are all under active development and need to be kept in sync with each other as their interfaces evolve.

All that having been said, and in spite of the difficulties, integration tests are not optional. Our applications and services need to talk to each other, and while tests are not a fully effective vaccine against bugs, without them we're just flying blind. At the very least, our deployment days would be likely to contain some very nasty surprises.

The Naive Approach

A naive approach to integration testing might involve running all the different services locally in different terminal windows, with a test suite that runs against them. There are obvious advantages to such an approach - it's very easy to set up, fits nicely into one's existing development flow, and makes testing different branches trivial.

On the other hand, there are a lot of downsides. Configuration can be messy, and can vary significantly if your development team isn't all using the same operating system. If one isn't paying attention, one can end up with local version mismatches by accident. If you're the sort of place that's into the twelve-factor app, you might accidentally pollute your tests by not paying close attention to your environment variables. Debugging problems can be a headache as your log output is spread across many windows and locations. Finally, this approach doesn't port well to modern hosted CI environments without some major hackery.

A Better Way

I'd like to propose a more modern, devops-driven approach: use Docker! In particular, I'd recommend using the recently unveiled Compose (née fig) to standardize your integration testing.

Quoth the website:

Compose is a tool for defining and running complex applications with Docker. With Compose, you define a multi-container application in a single file, then spin your application up in a single command which does everything that needs to be done to get it running.

Let's start by taking a look at what a Compose configuration file looks like - specifically, a subset of ours.

At the lowest level, we have Docker containers running Postgres 9.4.1 and ZooKeeper 3.4.6:

db:
  image: postgres:9.4.1
  name: postgres
  ports:
    - "5433:5432"

zookeeper:
  image: jplock/zookeeper:3.4.6
  name: zookeeper
  ports:
    - "2182:2181"

There's not too much that needs explanation here, though the ports might be a little confusing - for instance, what's the significance of "5433:5432" for Postgres?

The answer is that we want to be able to access Postgres for debugging purposes, and so we need to map the container's port (5432, the default Postgres port) to a known host port (5433, in this case) for convenience. I've chosen to use 5433 as a host port instead of the default 5432 because I often also have a Postgres service running on my dev box, and I don't want these ports colliding.

It's worth keeping in mind that our Docker host in this case is running inside a Vagrant VM, so we also need the following lines in our Vagrantfile to forward the relevant ports from the VM to OS X.

# Zookeeper
config.vm.network "forwarded_port", guest: 2182, host: 2182

# Postgres
config.vm.network "forwarded_port", guest: 5433, host: 5433
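With those mappings in place, both services are reachable from the Mac itself - handy for a quick sanity check from a local shell (the "ledger" database name here is just an example; use whatever your application creates):

# Connect to the containerized Postgres via the forwarded host port
psql -h localhost -p 5433 -U postgres ledger

# Ask ZooKeeper whether it's healthy; a happy server answers "imok"
echo ruok | nc localhost 2182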

Now let's look at something slightly more complicated - getting containers to talk to each other. Kafka is a good example here, because it registers itself with ZooKeeper and uses ZK as a central manager for the various brokers.

Sidebar: in our case, we only have one broker instance, and our configuration reflects that. However, Compose is capable of scaling up services to arbitrary sizes, and so if we wanted to we could be using Compose to run our tests with multiple Kafka brokers all running on the same Docker host, with coordination handled by a ZooKeeper also running on that same host. Cool, right?
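Scaling is a one-liner - something like the following should do the trick, though note that a fixed host port mapping like the "9092:9092" below would have to go first, since multiple brokers can't all bind the same host port:

docker-compose scale kafka=3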

kafka:
  image: wurstmeister/kafka:0.8.2.0
  name: kafka
  hostname: kafka
  links: 
   - zookeeper:zk
  environment:
    KAFKA_ADVERTISED_HOST_NAME: "kafka"
    KAFKA_ADVERTISED_PORT: 9092
  ports:
   - "9092:9092"

This section of our configuration has a few new things. First, you'll notice the hostname field - in most cases, you won't actually need to worry about this, but we ran into a particular networking problem with Kafka, ZooKeeper, and our CI server that required us to be explicit about the Kafka service hostname. As the key name suggests, this sets the hostname inside the service's container to the provided name (Docker also adds a matching entry to the container's own /etc/hosts file).

The links section is where we tell Compose to network two containers together. In our case, we're linking to our ZooKeeper service (which we defined above) and giving it the alias "zk". Compose will now take care of (a) exposing ports for the two services to communicate with each other and (b) adding an entry in the Kafka container's /etc/hosts file for the zookeeper service, under the alias "zk".
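If you'd like to see this in action, you can peek at the hosts file from a throwaway container (service name as defined in our configuration):

# Run a one-off Kafka container and inspect its /etc/hosts for the "zk" entry
docker-compose run kafka cat /etc/hosts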

At the risk of sounding too excited, this is a huge deal. A cursory look at the Docker networking documentation should suffice to give an idea of how complicated container networking can be, and Compose makes it unbelievably simple to get things to "just work".

The rest of this configuration is fairly self-explanatory - we've seen port configuration previously, and the 'environment' section does what its name suggests: it sets environment variables inside the container. In this case we're setting some basic configuration fields that Kafka expects and will use to communicate with ZooKeeper.

Last, but certainly not least, here's what the configuration for a Standard Treasury application looks like:

ledger:
  image: docker-registry-us-west-2.corp.standardtreasury.com/ledger:latest
  name: ledger
  environment:
   - DATABASE_JDBC_URL=jdbc:postgresql://db:5432/ledger?user=postgres
   - DATABASE_URL=postgres://postgres@db:5432/ledger
  links:
   - db:db
  ports:
   - "5008:5008"

The beauty of this is that there's really nothing special here that we haven't seen before. At the moment, our Ledger doesn't rely on Kafka or ZooKeeper, but it does have a direct link to our Postgres service, as well as some environment variables to make the actual database connection easy for our Ledger application. We store our application Docker images on a private registry within our VPC, which explains the pretty lengthy image name. And that's basically it!

The above is only a subset of the Standard Treasury architecture, but it gives a good idea of what our full configuration looks like. Compose gives us an easy way to layer and network services together into a larger infrastructure that can be easily deployed, particularly in a development or testing environment.

On both our development machines and in our CI environment (we use CircleCI), we use Compose to first bring up all of our services, and then we run a test suite against the exposed services that Docker's running for us in the background.
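Stripped of our Make targets, that flow looks roughly like this (the test script name is a stand-in for whatever runs your suite):

# Bring up every service in the background
docker-compose up -d

# Give the services a moment to accept connections (in practice, poll rather than sleep)
sleep 15

# Run the integration suite against the exposed ports, then tear everything down
./run-integration-tests.sh
docker-compose stop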

Pros and Cons

First amongst the advantages is the main value-add of Docker at large: it works on multiple platforms without any additional configuration. No more worrying about the underlying OS or having environment variables pollute your tests - if you can run Docker, you can run your applications with a preconfigured environment.

Convenience is fairly high on the list here; we've got a few Make targets that are a couple lines long, but by and large you can do almost anything you want to with Compose in a single instruction. For instance, I can boot the entire Standard Treasury architecture by simply being in the right directory and typing docker-compose up.

Running Compose allows you to easily nab all container output in Foreman-style colored logs, and to attach to the logs of individual containers (or selections of containers) as needed. You can make sure underlying service dependencies, such as your database or message queue, are pinned to specific versions. Lastly - and perhaps most importantly - you should be able to get Compose working in your CI environment without having to jump through any further hoops.
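For instance, using the service names from the configuration above:

# Tail the combined, color-coded output of every container
docker-compose logs

# Or just the containers you're interested in
docker-compose logs kafka zookeeper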

Of course, there are some downsides.

The initial configuration can take time to get right, and while the Docker community has done some incredible work over the last two years in making container networking much easier, debugging networking problems can still be a real bear. For instance, we encountered a bug that only occurred in our CI environment, but had to do with the fact that the Kafka container didn't include itself in its own /etc/hosts file.

If your development team is using OS X, they'll need a separate Docker host running on their machines. On that front, boot2docker running directly on OS X can be a little rough around the edges. We've been using a Vagrant VM with boot2docker running inside it, which has definitely helped, but comes with its own trade-offs - additional teardown/rebuild time if the VM has problems, and difficulties accessing and debugging Docker volumes.

Lastly, testing dev branches for multiple services simultaneously requires either (a) editing your docker-compose.yml file, or (b) overwriting Docker images with specifically tagged builds for your dev branch. Neither of these is particularly ideal, and - speaking personally - I'm not sure I could even tell you what a better workflow would look like.
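For what it's worth, option (b) looks something like this (the image name matches our earlier configuration; the build context is whatever your project uses):

# Build the dev branch locally, clobbering the 'latest' tag Compose looks for
docker build -t docker-registry-us-west-2.corp.standardtreasury.com/ledger:latest .

# Recreate just the ledger service from the new image
docker-compose up -d ledger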

Conclusion

There are other frustrations I'll admit to - pulling Docker images from our private repository can take a while, and in general I want all of this to work faster. But there's no question that using Compose has allowed us to engineer a solid integration test suite that's portable across platforms, easy to deploy and run, and gives us a great starting place for debugging problems.

These are still early days for Compose, and I'm looking forward to seeing what the Docker team and community come up with as more companies use tools like these to address the sorts of technical issues we've faced.

If you have thoughts on what we could be doing better, or have tried solving this from another angle, we'd love to hear from you (did I mention we're hiring?) - just tweet at us at @standardapi.

Until next time, nerds.

Discuss this post on Hacker News or on Twitter (@venantius)

Thanks to Chris Dean (@ctdean) for reading drafts of this post.