
How Asana is creating sane build systems for developers

Providing a world-class developer experience is one of the most important goals for the next generation of Asana’s framework, and the build system is a critical part of that mission. When a build system works, it lets developers enter flow and focus on what they are building. When it is unreliable or slow, it becomes one of the largest obstacles to reaching flow. Here are some principles we’ve learned that should apply regardless of your choice of build system (Gulp, Bazel, Make, Maven, etc.).

Don’t Trust The Network But Really Don’t Trust Semantic Versioning

Any build system that relies on the network will suffer from unreliability. The network works most of the time, but external systems have downtime, external artifacts get corrupted, and developers sometimes have no network access at all. In the past, NPM itself has suffered downtime and served malformed tarballs. One way to avoid the network is to vendor all dependencies into the version control system (VCS). The downside of this approach is a larger repository and longer clone times, which matter especially for continuous integration services because they often clone the repository from scratch. An alternative is to host the vendored dependencies on an external service you control, such as S3 or a mirror of the package manager’s registry.
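
To make the vendoring option concrete, here is a minimal sketch of a fetch step that reads dependencies only from a local vendor directory (for example, a copy synced from S3) and refuses any tarball whose SHA-256 digest does not match the one recorded when it was vendored. The directory layout, the manifest format, and the load_vendored helper are hypothetical and chosen for illustration; real package managers record integrity hashes in their own formats.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest: package name -> (tarball file name, expected SHA-256 hex digest).
# It would be written once, at the time the dependency is vendored.
Manifest = dict[str, tuple[str, str]]


class IntegrityError(Exception):
    """Raised when a vendored artifact is missing or does not match its recorded digest."""


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_vendored(vendor_dir: Path, manifest: Manifest) -> dict[str, Path]:
    """Resolve every dependency from the local vendor directory, never the network.

    Fails loudly instead of silently using a corrupted or missing tarball.
    """
    resolved: dict[str, Path] = {}
    for name, (filename, expected) in manifest.items():
        path = vendor_dir / filename
        if not path.is_file():
            raise IntegrityError(f"{name}: {filename} is not vendored")
        actual = sha256_of(path)
        if actual != expected:
            raise IntegrityError(f"{name}: expected {expected}, got {actual}")
        resolved[name] = path
    return resolved
```

Failing the build on a mismatch is deliberate: a malformed tarball that slips through tends to surface much later as a confusing error far from its cause.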

However, even more important than avoiding the network is understanding semantic versioning. Semantic versioning is a convention for numbering releases so that dependents do not break on updates, and package managers build on it by letting each package specify a range of acceptable versions for each of its dependencies. This scheme is incredibly helpful for solving the diamond dependency problem: when two packages depend on different ranges of the same package, the package manager can install a single version of it as long as the ranges overlap. The downside is that semantic versioning only works if publishing developers follow the convention correctly. Most of the time they do, but there are incidents where a supposedly compatible release breaks the build. To avoid this, use your package manager’s locking feature if it has one. For Node, run npm shrinkwrap --dev to make sure all dependencies, including development dependencies, are locked.
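
The following sketch illustrates the diamond case and why locking helps. It resolves a shared dependency by picking the highest published version that satisfies every caret range requested for it, then pins the result in a lock mapping so later builds do not re-resolve. It is a simplification under stated assumptions: only plain MAJOR.MINOR.PATCH versions and ^ ranges with a nonzero major version are handled (npm treats ^0.x ranges more strictly), and the function names and lock format are hypothetical, not npm’s implementation.

```python
from typing import Optional

# A version is (major, minor, patch); tuples compare element by element.
Version = tuple[int, int, int]


def parse(v: str) -> Version:
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def satisfies_caret(version: Version, caret_range: str) -> bool:
    """'^X.Y.Z' with X >= 1 means >= X.Y.Z and < (X+1).0.0."""
    base = parse(caret_range.lstrip("^"))
    if base[0] == 0:
        raise ValueError("^0.x ranges are stricter in npm and not handled here")
    return base <= version < (base[0] + 1, 0, 0)


def resolve(available: list[str], ranges: list[str]) -> Optional[str]:
    """Highest available version satisfying every range, or None if the ranges don't overlap."""
    candidates = [v for v in available if all(satisfies_caret(parse(v), r) for r in ranges)]
    return max(candidates, key=parse) if candidates else None


def locked_version(lock: dict[str, str], name: str,
                   available: list[str], ranges: list[str]) -> str:
    """Reuse the pinned version if present; resolve and pin it only on first use."""
    if name not in lock:
        chosen = resolve(available, ranges)
        if chosen is None:
            raise ValueError(f"no version of {name} satisfies {ranges}")
        lock[name] = chosen
    return lock[name]


if __name__ == "__main__":
    # Diamond: package A needs util ^1.2.0 and package B needs util ^1.4.0.
    published = ["1.1.0", "1.3.5", "1.4.2", "2.0.0"]
    lock: dict[str, str] = {}
    print(locked_version(lock, "util", published, ["^1.2.0", "^1.4.0"]))  # 1.4.2
    # A new 1.9.0 is published; without the lock, resolve() would now pick it.
    published.append("1.9.0")
    print(locked_version(lock, "util", published, ["^1.2.0", "^1.4.0"]))  # still 1.4.2
```

If 1.9.0 accidentally ships a breaking change, an unlocked build picks it up silently, while the locked build keeps using 1.4.2 until someone deliberately updates the lock.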

Never “Glob” Over Build Artifacts

A common workflow in build systems is to glob over source files, run a build step that writes into a directory, and then glob over the files in that directory as input to the next build step. This workflow is another root cause of build unreliability. For example, imagine a developer deletes a source file, but another file still references it and the build step does not catch the error. On the developer’s machine, the output could continue to work because an artifact built from the deleted file may still be in the build directory, but the build would fail on any machine that performs a clean build. Reliable build steps should follow the model (inputs: Set[File]) => Set[File], taking an explicit set of input files and producing an explicit set of output files, and globs should be allowed only over files in the source tree.
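
Below is a minimal sketch of the (inputs: Set[File]) => Set[File] model: the only glob runs over the source tree, each step receives an explicit set of inputs and returns exactly the files it wrote, and the next step consumes that returned set instead of globbing the output directory. The .src-to-.out transform and the function names are hypothetical stand-ins for a real compiler or bundler.

```python
from pathlib import Path


def collect_sources(src_root: Path) -> frozenset[Path]:
    """Globbing is allowed here because src_root is the version-controlled source tree."""
    return frozenset(src_root.rglob("*.src"))


def compile_step(inputs: frozenset[Path], src_root: Path, out_dir: Path) -> frozenset[Path]:
    """(inputs: Set[File]) => Set[File]: writes one output per input and returns exactly those."""
    outputs = set()
    for src in inputs:
        out = out_dir / src.relative_to(src_root).with_suffix(".out")
        out.parent.mkdir(parents=True, exist_ok=True)
        # Stand-in for a real compiler: upper-case the source text.
        out.write_text(src.read_text().upper())
        outputs.add(out)
    return frozenset(outputs)


def bundle_step(inputs: frozenset[Path]) -> str:
    """Consumes the set returned by compile_step, never a glob of out_dir."""
    return "\n".join(p.read_text() for p in sorted(inputs))
```

If a developer deletes b.src and rebuilds into the same directory, the stale b.out stays on disk but never reaches bundle_step, so an incremental build on a laptop and a clean build in CI see the same inputs.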

Cache Everything And Invalidate Through Checksums

Every build step that runs affects a developer’s ability to enter flow. If the build system is reliable and reproducible, then caching becomes a matter of knowing the dependencies and invalidating the right targets for every source file change. Bazel takes this to an extreme by caching not only build artifacts but also test results, which makes both building and running tests efficient. To decide when to invalidate the cache, checksums are better than timestamps. The disadvantages of checksums are that detecting changes requires storing the previous checksum of each file and that computing checksums can take longer than comparing file modification times. The advantages are increased reliability, a more accurate calculation of the work that needs to be done, and the ability to share the cache across machines. Checksums rely neither on system clocks, which can be changed, nor on the behavior of the version control system or continuous integration service; for example, a fresh checkout typically gives every file a new modification time even though its contents are unchanged. Because invalidation is always a function of the content rather than the global state of the computer, reliability improves further.
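
A minimal sketch of checksum-based invalidation: the cache key for a step is a SHA-256 digest over the step name and each input’s root-relative path and content digest, so the step reruns only when content changes, and a modification-time bump alone does not invalidate it. The in-memory dict stands in for whatever persistent or shared cache a real system would use; cache_key, cached_run, and their parameters are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def cache_key(step: str, root: Path, inputs: list[Path]) -> str:
    """Combine the step name with each input's root-relative path and content digest.

    Relative paths keep the key identical on every machine, so it can index a shared cache.
    Each entry is 'path NUL digest'; digests are fixed-length, so entries cannot run together.
    """
    h = hashlib.sha256(step.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.relative_to(root).as_posix().encode() + b"\0")
        h.update(file_digest(path).encode())
    return h.hexdigest()


def cached_run(cache: dict[str, bytes], step: str, root: Path, inputs: list[Path],
               action: Callable[[list[Path]], bytes]) -> bytes:
    """Run action only on a cache miss; a hit returns the stored output unchanged."""
    key = cache_key(step, root, inputs)
    if key not in cache:
        cache[key] = action(inputs)
    return cache[key]
```

The trade-off described above is visible here: every lookup reads and hashes every input, which costs more than a stat call, but touching a file or checking it out again leaves the key unchanged, and two machines with the same sources compute the same key.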

One System Is Better Than Many

During the initial phases of development, each service had its own repository with its own build system, testing process, and release system. Developers tried to keep the repositories consistent, but the systems often diverged, causing confusion. Common sources of divergence were how modules were located and which compilation and test flags were used. As a result, a change that worked under one system would not always work under another. Beyond that, performance improvements made in one system were not shared with the others. Moving to a single system with aggressive caching keeps the developer experience consistent and fast, regardless of whether the task is building, testing, or deploying.

One Repository Is Better Than Many

This principle took the longest to learn. The original Luna framework had become a monolithic system contained within a monolithic repository. Development had become painful, and the goal was to migrate to a service-oriented architecture. At the time, each service was coupled to its own repository, which caused pain whenever a change spanned multiple services. Developers knew the correct order in which to land such changes, but accidents still happened and set off fire drills to get the repositories back to a working state. By switching to a single repository and using a strict build system with clear dependencies, the new framework gets most of the best of both worlds: the service boundaries of the new architecture without the cost of coordinating changes across many repositories.
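
With every service in one repository and dependencies declared explicitly, the build system can compute which targets a change affects and rebuild or test exactly those, so a cross-service change no longer depends on developers remembering the right order. The sketch below walks a hypothetical reverse-dependency graph; the target names are illustrative, and Bazel answers the same question with its rdeps query function.

```python
from collections import deque

# Hypothetical monorepo: each target lists the targets it depends on directly.
DEPS: dict[str, set[str]] = {
    "//lib/auth": set(),
    "//lib/rpc": set(),
    "//services/api": {"//lib/auth", "//lib/rpc"},
    "//services/search": {"//lib/rpc"},
    "//services/web": {"//services/api"},
}


def reverse_deps(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the graph: for each target, which targets depend on it directly."""
    rdeps: dict[str, set[str]] = {t: set() for t in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps.setdefault(dep, set()).add(target)
    return rdeps


def affected(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Changed targets plus everything that transitively depends on them (breadth-first)."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        for dependent in rdeps.get(queue.popleft(), set()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```

A change to //lib/auth and the matching update to //services/api can land in one commit, and the affected set tells CI to rebuild and test //services/web as well, while //services/search is left alone.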

TL;DR

Investing in the reliability and performance of the build system can provide tremendous benefits to both its maintainers and its consumers. Maintainers no longer need to worry about the local state of each developer’s machine, and consumers no longer need to think about the build system to get their work done.
