Github introduced the pull request practice, and features to support it, to make it easier for people who run open-source projects to accept contributions from outside their group of trusted committers.
Committers are trusted to make changes to the codebase routinely. But a change from a random outsider needs to be assessed to make sure it works, doesn’t take the project in an unwanted direction, and meets the standards for style and quality. The outsider packages their proposed change as a pull request, which a committer can easily review and manage as a unit before merging it into the codebase.
Figure 1: Pull request process
Although designed to make it easier to accept contributions from untrusted people outside a team, many teams now use pull requests for people inside their own team. This practice has become so common that many people consider it a default, “best” practice. Some people assume there is no other way to make sure code is reviewed because they’ve never seen anything else.
However, pull requests sacrifice performance, including both delivery time and quality. This is a sacrifice worth making to manage the risk of accepting changes from unknown people. An outsider may not understand the vision and direction of your project. They may not have the same habits and norms for testing, code quality, and style. However, your own team members should share these norms.
Using pull requests for code changes by your own team members is like having your family members go through an airport security checkpoint to enter your home. It’s a costly solution to a different problem.
Using Continuous Integration rather than pull requests
A software delivery process should optimize for flow and quality. Keep the lead time for changes low, and give fast feedback when a change introduces a problem. This is the idea that underpins Continuous Integration (CI). CI is the practice of continuously merging and testing everyone’s code as they work on it.
Figure 2: Continuous Integration process
“As they work on it” is essential. As a team member, you don’t wait until you have finished a feature or story to integrate your code to the mainline. Instead, you frequently - at least once a day - put your code into a healthy state that passes tests and integrate it into the mainline with everyone else’s current work. (Also see Martin Fowler’s article on branching patterns and Paul Hammant’s trunk-based development site.)
A CI build job automatically tests the project’s mainline every time you push a change. This means you find out immediately if what you’re doing clashes with something another person is working on before either of you has invested too much time. It sucks to think you’ve finished a story or feature, only to discover you’ve got to go back and untangle and redo several days of effort.
Figure 3: Tests run on integrated code on every push
The trouble with pull requests
A pull request introduces a delay to integration. When you complete work that you consider ready to integrate with the rest of the team, you create a pull request and wait for someone to review it. Only after someone else reviews the change do they integrate it with the mainline.
If team members are quick to review and integrate pull requests, this is only slightly slower than CI. Maybe they respond and review your change within 30 minutes every time you push. Your code change is integrated with the mainline and automated tests run against it. So you may discover a clash with someone else’s work after 30-40 minutes or so.
Figure 4: Delays in feedback with pull requests versus CI
In practice, not many teams reliably turn pull requests around in under 30 minutes. While waiting for someone to review your change, you may switch to another task or start working on a new change. When you find out there was a problem, you need to switch gears back to the original change, disrupting your flow of work.
An effective CI build, on the other hand, should finish testing your integrated code within a few minutes after you push it - up to 10 minutes in our scenario. You discover that clash almost immediately, so you can investigate and fix it while it’s fresh in your mind.
You don’t need to interrupt someone else’s work to ask them to review it before you get the feedback from testing fully integrated code. As I’ll explain shortly, you may still have someone review your changes. But you can take advantage of a faster cycle time to commit, integrate, and test your code to make multiple changes before asking them to review.
Even if everyone in the team turns pull requests around quickly, the typical practice is to wait until completing work on a feature or story before integrating a pull request with the mainline. Most teams take longer than a day, on average, to develop a story. So a typical pull request process doesn’t meet the minimum requirement of Continuous Integration to integrate everyone’s work at least daily.
Working in a rhythm of coding, pulling, testing, pushing, and getting feedback from integrated tests several times a day is electrifying. And it isn’t possible with pull requests that introduce a human delay into the rhythm.
Better ways to review code changes
When the topic of CI versus pull requests comes up, someone inevitably defends pull requests as necessary to get feedback from other team members on changes.
It is essential to have a second pair of eyes (if not more) looking at code changes. Humans catch problems that tests don’t, especially problems related to maintainability and sound design. Having people review each others’ code also helps the team converge on norms for coding style, programming idioms, and quality expectations. And in some cases, such as regulated environments, having each change reviewed by a second person is required.
However, the recent popularity of pull requests seems to have resulted in some people assuming there are no other ways to review code changes. Here are a few practices that you can use instead, without interrupting the Continuous Integration feedback cycle. Keep in mind that it’s entirely possible to combine more than one of these as appropriate.
Figure 5: Pairing for immediate, continuous code review
Pair programming: No form of code review is more effective than pairing. Feedback is immediate, so there is a far higher chance you will use it to make improvements. If someone tells you as you write some code that there’s a better way, you can stop, learn, and write it in that better way, right then. If someone tells you a day later, you might take it on board for future reference. But it needs to be a serious problem to get you to stop your current work to go back and redo something you’ve already finished.
Periodic reviews: If a review is not explicitly required for compliance, it may not need to be a gate for each code change. You might have regular, scheduled reviews, for example weekly, where people check through code changes since the last review. This can be especially potent as a group exercise since it creates conversations that help people learn and shape the team’s norms for coding.
Pipeline approvals: If your team uses a Continuous Delivery pipeline to deliver changes to production, you can include a stage that requires someone to authorize the change to progress. This is conceptually similar to a pull request in that it is a gate in the delivery process, but you place the gate after code integration and automated tests. Doing this means that a human only spends time reviewing code that has already been proven technically correct.
Figure 6: Review changes after they are integrated and tested
Pull requests differ from Continuous Integration in having a human review a code change after writing it but before integrating it with the mainline. This creates a delay in getting feedback from automated tests against fully integrated code.
With Continuous Integration, code is either reviewed as it is written (pairing), or after it is integrated and tested. Optimizing the loop for integrating and testing changes means you can run this loop more frequently. A more frequent coding and integration loop encourages developers to make smaller and more frequent commits, which improves quality and flow.
The second edition of Infrastructure as Code is out! Mostly. E-Books are available now (Amazon.com | Amazon.co.uk | Amazon.in | O’Reilly), while the dead-tree version is trundling across rails, roads, and sea lanes towards your local bookshop. I’m told to expect it out in January 2021.
This is super exciting for me, and I hope people find the new edition useful. I talk a bit about the book on the book page. I rewrote pretty much the entire book - 4 years is a long time in this field.
Most infrastructure projects I’ve been involved with have a script, or usually a set of scripts that act like a build tool for software projects. These are often implemented using Makefiles, shell scripts, batch scripts, Rakefiles, or languages like Python and Ruby.
These project orchestration scripts do many jobs, depending on the project. Some of the jobs include:
Assemble and package project code for use. This might include pulling libraries and other dependencies. It could even involve downloading the infrastructure tools and packaging everything as a container image, creating an executable project.
Run static tests and possibly other offline tests (for example, using tools like Localstack) on the code outside the context of an instance of the infrastructure.
Assemble configuration values for a given instance of the infrastructure. These values might come from configuration files, parameter registries, existing infrastructure, or a combination of these.
Execute the infrastructure tool for an instance. This includes running the plan command for tools that support it and creating, changing, and destroying infrastructure.
Orchestrate commands across multiple infrastructure components and projects. For example, if different parts of an environment are built from different Terraform projects, the script might run commands for each project in the correct order, based on the dependencies between them.
Run tests against an instance of the infrastructure.
Many infrastructure project orchestration scripts handle a combination of these jobs. This tends to create messy, complicated code. Any code, including orchestration scripts, should follow good software design principles, including SOLID, DRY, and Separation of Concerns. Orchestration scripts should separate the different jobs and concerns into different parts, rather than having a master script that knows all. The Unix philosophy applies here.
Another issue with many infrastructure project scripts is that they are snowflakes, custom-built for each project. The script code often embeds knowledge of the projects it orchestrates, such as dependencies between projects and the names of configuration parameters each project needs. And team members spend considerable time and energy designing, implementing, and fixing their unique system of scripts.
I don’t believe there is value in building and maintaining unique scripts for an infrastructure project. Most of the differences in infrastructure build projects I’ve seen don’t come from meeting the project’s specific needs, but rather from the specific knowledge and preferences of the people who built the project.
So I’m interested in standardized tools to orchestrate infrastructure projects. I’d like to see opinionated tools that prescribe how to structure directories, manage configuration values, and integrate multiple projects. The challenge is finding a tool with the right opinions, “right” meaning I agree with them!
I’ll save elucidating the opinions I would agree with for another post. For now, here’s a list of tools that I’m aware of. At this point, I haven’t looked at these close enough to compare them with my own opinions about infrastructure project design.
Orchestration tools for Terraform
- Astro, a tool for managing multiple Terraform executions as a single command. Seems to focus on wiring Terraform modules together.
- Rake Terraform, libraries for running Terraform from Rake tasks. A part of the Infrablocks project
- Tau, Terraform Avinor Utility, another tool that orchestrates Terraform modules and configuration.
- Terragrunt, a thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state.
- Terraspace is an opinionated, convention over configuration tool that provides a project layout and handles configuration and integration of multiple projects.
- Terraform Scaffold orchestrates Terraform modules and configuration across multiple environments and components on AWS.
- Terranova - library to help you write golang code that implements terraform commands without the binary. So you can combine project orchestration and your infrastructure definitions, which sounds like an invitation to write code that spectacularly fails to separate concerns. But the possibilities are intriguing.
Orchestration tools for CloudFormation
There must be more of these than I know of. I’ve listed a couple that aren’t current but could be interesting.
- Rain, a development workflow tool for working with AWS CloudFormation. Currently in preview, not production-ready.
- Autostacker24, a Ruby utility to manage AWS CloudFormation stacks. I may or may not have been present for this tool’s conception, including suggesting the name. I’m not sure how active development is.
- cfnassist, a cloud formation helper tool. Not very active development.
I’ve delivered the second edition of the book to O’Reilly’s production department, and the wheels are turning to have it available by the end of the year. See the Book page for details on pre-ordering.
Why I wrote the first edition
The benefits of infrastructure as code don’t come from the tools themselves. They come from how you use them. The trick is to leverage the technology to embed quality, reliability, and compliance into the process of making changes.
I wrote the first edition of this book because I didn’t see a cohesive collection of guidance on how to manage infrastructure as code. There was plenty of advice scattered across blog posts, conference talks, and documentation for products and projects. But you needed to sift through everything and piece a strategy together for yourself, and most people didn’t have the time.
The experience of writing the first edition was amazing. It gave me the opportunity to travel and talk with people around the world about their own experiences. These conversations gave me new insights and exposed me to new challenges. I learned that the value of writing a book, speaking at conferences, and consulting with clients is that it fosters conversations. As an industry, we are still gathering, sharing, and evolving our ideas for managing infrastructure as code.
Why a second edition
Things have moved along since the first edition came out in June 2016. That edition was subtitled “managing servers in the cloud,” which reflects that most infrastructure automation until that point focused on configuring servers. Since then, containers and clusters have become a much bigger deal, and the infrastructure action has moved to managing collections of infrastructure resources provisioned from cloud platforms, what I (and many but not all other people) call stacks.
So the new edition talks a lot more about building stacks, the remit of tools like CloudFormation, Terraform, and Pulumi.
I’ve changed quite a bit based on what I’ve learned about the evolving challenges and needs of teams building infrastructure. As I’ve already touched on, I see making it safe and easy to change infrastructure as the key benefit of infrastructure as code. I believe people underestimate the importance of this, thinking that infrastructure is something you build and forget.
But too many teams I meet struggle to meet the needs of their organizations, not able to expand and scale quickly enough, support the pace of software delivery, or provide the reliability and security expected. And when we dig into the details of their challenges, it’s that they are overwhelmed by the need to update, fix, and improve their systems. So I’ve doubled down on this as the core theme of the second edition.
The new edition introduces three core practices for using Infrastructure as Code to make changes safely and easily. Define everything as code is obvious from the name, and creates repeatability and consistency. Continuously integrating, testing, and delivering each change enhances safety. It also makes it possible to move faster and with confidence. Small, independent pieces are easier and safer to change than larger pieces.
These three practices are mutually reinforcing. Code is easy to track, version, deliver across stages of a change management process. It’s easier to continuously test smaller pieces. Continuously testing each piece on its own forces you to keep a loosely coupled design.
These practices and the details of how to do them are familiar from the world of software development. I drew on agile software engineering and delivery practices for the first edition of the book. For the new edition I’ve also drawn on rules and practices for effective design.
In the past few years I’ve seen teams struggle with larger and more complicated infrastructure systems, and seen the benefits of applying lessons learned in software design patterns and principles. So I’ve included several chapters on how to do this.
I’ve also seen that organizing and working with infrastructure code is difficult for many teams, so I’ve addressed various pain points I’ve seen. How to keep codebases well organized, provide development and test instances for infrastructure, and manage collaboration of multiple people, including those responsible for governance.
I don’t believe we’ve matured as an industry in how we manage infrastructure. I’m planning to write a bit more on this blog and elsewhere on what I see as ways we can do better. I’m also hoping to assemble examples of infrastructure code that illustrate how to do this.
Blog posts about “What is DevOps” are a dime a dozen. I find myself repeating my 0.8 cent version of this, and other buzzwords that people knock around these days. So I figured I’d throw my thoughts onto the pile.
DevOps is about integrating the flow of work across development and operations. Tooling, technology, and practices can help you do this - cloud, Infrastructure as Code, and Continuous Delivery come to mind. Culture is essential to make sure that people align themselves and work in ways that do make the flow smooth. Organizations that adopt the tools without the culture fail to get the benefits of DevOps. I recommend Effective Devops by Jennifer Davis and Ryn Daniels. The DORA research is essential reading.
You build it, you run it is the idea that the people who build a thing own it in production. This structure is one way to address the cultural alignment aspect of DevOps, although it’s not the only way. I suggest looking into Team Topologies by Matthew Skelton and Manuel Pais for more.
Infrastructure as Code is an approach to defining and building systems that draws from software development practices. It gives you ways to safely empower application teams to define the infrastructure for their applications and to create consistent implementation and governance across environments. For more on this, I recommend, well, the book I’m rewriting on the topic.
GitOps is (in my simplistic view) using branches in source control as artifacts for a Continuous Delivery pipeline for Infrastructure as Code. In many implementations, it’s also about pull-based changes - something watches the branches and applies changes to the corresponding environment when they change. WeaveWorks has pioneered the concept. Although they tend to focus on using it for Kubernetes clusters, I see people using the idea - or at least, the term - in other contexts.
Observability is about giving developers a better view of what their code is doing in production. It has, I guess inevitably, been co-opted by vendors as a hip new label for their monitoring and log aggregation products. Honeycomb.io is the leading champion for observability. You should read anything and everything Charity Majors says about observability.