The second edition of Infrastructure as Code is out! Mostly. E-Books are available now (Amazon.com | Amazon.co.uk | Amazon.in | O’Reilly), while the dead-tree version is trundling across rails, roads, and sea lanes towards your local bookshop. I’m told to expect it out in January 2021.
This is super exciting for me, and I hope people find the new edition useful. I talk a bit about the book on the book page. I rewrote pretty much the entire book - 4 years is a long time in this field.
Most infrastructure projects I’ve been involved with have a script, or usually a set of scripts, that acts like a build tool for software projects. These are often implemented using Makefiles, shell scripts, batch scripts, Rakefiles, or languages like Python and Ruby.
These project orchestration scripts do many jobs, depending on the project. Some of the jobs include:
- Assemble and package project code for use. This might include pulling libraries and other dependencies. It could even involve downloading the infrastructure tools and packaging everything as a container image, creating an executable project.
- Run static tests and possibly other offline tests (for example, using tools like Localstack) on the code outside the context of an instance of the infrastructure.
- Assemble configuration values for a given instance of the infrastructure. These values might come from configuration files, parameter registries, existing infrastructure, or a combination of these.
- Execute the infrastructure tool for an instance. This includes running the plan command for tools that support it and creating, changing, and destroying infrastructure.
- Orchestrate commands across multiple infrastructure components and projects. For example, if different parts of an environment are built from different Terraform projects, the script might run commands for each project in the correct order, based on the dependencies between them.
- Run tests against an instance of the infrastructure.
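As a rough illustration of the ordering job described in the list above, here is a minimal Python sketch that works out a safe order for running commands across projects. The project names and the `DEPENDENCIES` mapping are hypothetical, and the sketch assumes Python 3.9 or later for the standard library `graphlib` module. It is one possible approach, not how any particular tool does it.

```python
from graphlib import TopologicalSorter

# Hypothetical example: each project maps to the projects it depends on.
DEPENDENCIES = {
    "network": [],
    "cluster": ["network"],
    "database": ["network"],
    "application": ["cluster", "database"],
}


def apply_order(dependencies):
    """Return project names so that each project comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def destroy_order(dependencies):
    """Return the reverse order, so dependents are destroyed before their dependencies."""
    return list(reversed(apply_order(dependencies)))


if __name__ == "__main__":
    print("apply:  ", apply_order(DEPENDENCIES))
    print("destroy:", destroy_order(DEPENDENCIES))
```

Keeping the dependency data in a plain mapping, separate from the code that sorts it, means the ordering logic does not need to know anything about the specific projects.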
Many infrastructure project orchestration scripts handle a combination of these jobs. This tends to create messy, complicated code. Any code, including orchestration scripts, should follow good software design principles, including SOLID, DRY, and Separation of Concerns. Orchestration scripts should separate the different jobs and concerns into different parts, rather than having a master script that knows all. The Unix philosophy applies here.
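One way to picture this separation is a script split into small functions that each do a single job. The sketch below is a hypothetical illustration in Python: the function names, the layering rule (later layers override earlier ones), and the example values are all assumptions. It uses Terraform's `-chdir` and `-var` command-line options only to show the shape of the command being built.

```python
import subprocess


def assemble_config(*layers):
    """Merge configuration layers; values in later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def build_command(action, project_dir, config):
    """Build the command line as a list of arguments, without running anything."""
    command = ["terraform", f"-chdir={project_dir}", action]
    for name in sorted(config):
        command += ["-var", f"{name}={config[name]}"]
    return command


def run(command):
    """Execute a prepared command and raise an error if the tool fails."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical values: defaults overridden by environment-specific settings.
    defaults = {"region": "eu-west-1", "size": "small"}
    production = {"size": "large"}
    config = assemble_config(defaults, production)
    # Print rather than run, so the example has no side effects.
    print(build_command("plan", "network", config))
```

Because assembling values, building the command, and executing it are separate functions, each can be tested or replaced on its own, and only `run` touches the outside world.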
Another issue with many infrastructure project scripts is that they are snowflakes, custom-built for each project. The script code often embeds knowledge of the projects it orchestrates, such as dependencies between projects and the names of configuration parameters each project needs. And team members spend considerable time and energy designing, implementing, and fixing their unique system of scripts.
I don’t believe there is value in building and maintaining unique scripts for an infrastructure project. Most of the differences in infrastructure build projects I’ve seen don’t come from meeting the project’s specific needs, but rather from the specific knowledge and preferences of the people who built the project.
So I’m interested in standardized tools to orchestrate infrastructure projects. I’d like to see opinionated tools that prescribe how to structure directories, manage configuration values, and integrate multiple projects. The challenge is finding a tool with the right opinions, “right” meaning I agree with them!
I’ll save elucidating the opinions I would agree with for another post. For now, here’s a list of tools that I’m aware of. At this point, I haven’t looked at these closely enough to compare them with my own opinions about infrastructure project design.
Orchestration tools for Terraform
- Astro, a tool for managing multiple Terraform executions as a single command. Seems to focus on wiring Terraform modules together.
- Rake Terraform, libraries for running Terraform from Rake tasks. A part of the Infrablocks project.
- Tau, Terraform Avinor Utility, another tool that orchestrates Terraform modules and configuration.
- Terragrunt, a thin wrapper that provides extra tools for keeping your configurations DRY, working with multiple Terraform modules, and managing remote state.
- Terraspace, an opinionated, convention-over-configuration tool that provides a project layout and handles configuration and integration of multiple projects.
- Terraform Scaffold orchestrates Terraform modules and configuration across multiple environments and components on AWS.
- Terranova, a library to help you write Go code that implements Terraform commands without the binary. So you can combine project orchestration and your infrastructure definitions, which sounds like an invitation to write code that spectacularly fails to separate concerns. But the possibilities are intriguing.
Orchestration tools for CloudFormation
There must be more of these than I know of. I’ve listed a couple that aren’t current but could be interesting.
- Rain, a development workflow tool for working with AWS CloudFormation. Currently in preview, not production-ready.
- Autostacker24, a Ruby utility to manage AWS CloudFormation stacks. I may or may not have been present for this tool’s conception, including suggesting the name. I’m not sure how active development is.
- cfnassist, a CloudFormation helper tool. Development is not very active.
I’ve delivered the second edition of the book to O’Reilly’s production department, and the wheels are turning to have it available by the end of the year. See the Book page for details on pre-ordering.
Why I wrote the first edition
The benefits of infrastructure as code don’t come from the tools themselves. They come from how you use them. The trick is to leverage the technology to embed quality, reliability, and compliance into the process of making changes.
I wrote the first edition of this book because I didn’t see a cohesive collection of guidance on how to manage infrastructure as code. There was plenty of advice scattered across blog posts, conference talks, and documentation for products and projects. But you needed to sift through everything and piece a strategy together for yourself, and most people didn’t have the time.
The experience of writing the first edition was amazing. It gave me the opportunity to travel and talk with people around the world about their own experiences. These conversations gave me new insights and exposed me to new challenges. I learned that the value of writing a book, speaking at conferences, and consulting with clients is that it fosters conversations. As an industry, we are still gathering, sharing, and evolving our ideas for managing infrastructure as code.
Why a second edition
Things have moved along since the first edition came out in June 2016. That edition was subtitled “managing servers in the cloud,” which reflects that most infrastructure automation until that point focused on configuring servers. Since then, containers and clusters have become a much bigger deal, and the infrastructure action has moved to managing collections of infrastructure resources provisioned from cloud platforms, what I (and many but not all other people) call stacks.
So the new edition talks a lot more about building stacks, the remit of tools like CloudFormation, Terraform, and Pulumi.
I’ve changed quite a bit based on what I’ve learned about the evolving challenges and needs of teams building infrastructure. As I’ve already touched on, I see making it safe and easy to change infrastructure as the key benefit of infrastructure as code. I believe people underestimate the importance of this, thinking that infrastructure is something you build and forget.
But too many teams I meet struggle to meet the needs of their organizations, not able to expand and scale quickly enough, support the pace of software delivery, or provide the reliability and security expected. And when we dig into the details of their challenges, it’s that they are overwhelmed by the need to update, fix, and improve their systems. So I’ve doubled down on this as the core theme of the second edition.
The new edition introduces three core practices for using Infrastructure as Code to make changes safely and easily. Define everything as code is obvious from the name, and creates repeatability and consistency. Continuously integrating, testing, and delivering each change enhances safety. It also makes it possible to move faster and with confidence. Small, independent pieces are easier and safer to change than larger pieces.
These three practices are mutually reinforcing. Code is easy to track, version, and deliver across stages of a change management process. It’s easier to continuously test smaller pieces. Continuously testing each piece on its own forces you to keep a loosely coupled design.
These practices and the details of how to do them are familiar from the world of software development. I drew on agile software engineering and delivery practices for the first edition of the book. For the new edition I’ve also drawn on rules and practices for effective design.
In the past few years I’ve seen teams struggle with larger and more complicated infrastructure systems, and seen the benefits of applying lessons learned in software design patterns and principles. So I’ve included several chapters on how to do this.
I’ve also seen that organizing and working with infrastructure code is difficult for many teams, so I’ve addressed various pain points I’ve seen: how to keep codebases well organized, provide development and test instances for infrastructure, and manage collaboration of multiple people, including those responsible for governance.
I don’t believe we’ve matured as an industry in how we manage infrastructure. I’m planning to write a bit more on this blog and elsewhere on what I see as ways we can do better. I’m also hoping to assemble examples of infrastructure code that illustrate how to do this.
Blog posts about “What is DevOps” are a dime a dozen. I find myself repeating my 0.8 cent version of this, and other buzzwords that people knock around these days. So I figured I’d throw my thoughts onto the pile.
DevOps is about integrating the flow of work across development and operations. Tooling, technology, and practices can help you do this - cloud, Infrastructure as Code, and Continuous Delivery come to mind. Culture is essential to make sure that people align themselves and work in ways that do make the flow smooth. Organizations that adopt the tools without the culture fail to get the benefits of DevOps. I recommend Effective DevOps by Jennifer Davis and Ryn Daniels. The DORA research is essential reading.
You build it, you run it is the idea that the people who build a thing own it in production. This structure is one way to address the cultural alignment aspect of DevOps, although it’s not the only way. I suggest looking into Team Topologies by Matthew Skelton and Manuel Pais for more.
Infrastructure as Code is an approach to defining and building systems that draws from software development practices. It gives you ways to safely empower application teams to define the infrastructure for their applications and to create consistent implementation and governance across environments. For more on this, I recommend, well, the book I’m rewriting on the topic.
GitOps is (in my simplistic view) using branches in source control as artifacts for a Continuous Delivery pipeline for Infrastructure as Code. In many implementations, it’s also about pull-based changes - something watches the branches and applies changes to the corresponding environment when they change. WeaveWorks has pioneered the concept. Although they tend to focus on using it for Kubernetes clusters, I see people using the idea - or at least, the term - in other contexts.
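The pull-based idea can be sketched as a small reconcile step: something checks the revision at the head of a branch and applies it to the matching environment only when it differs from what was last applied. The Python below is a hypothetical illustration, not how any particular GitOps tool works. `get_branch_head` and `apply_change` are placeholder callables that a real setup would back with source control and an infrastructure tool.

```python
def reconcile(environment, get_branch_head, applied, apply_change):
    """Apply the branch head to an environment if it has changed.

    environment     -- name of the environment (and, by assumption, its branch)
    get_branch_head -- callable returning the current revision for the environment
    applied         -- dict recording the last revision applied per environment
    apply_change    -- callable that applies a revision to the environment

    Returns True if a change was applied, False if nothing needed doing.
    """
    head = get_branch_head(environment)
    if applied.get(environment) == head:
        return False
    apply_change(environment, head)
    applied[environment] = head
    return True


if __name__ == "__main__":
    # Hypothetical stand-ins for source control and the infrastructure tool.
    heads = {"staging": "abc123"}
    applied = {}

    def log_apply(env, rev):
        print(f"applying {rev} to {env}")

    reconcile("staging", heads.get, applied, log_apply)  # applies abc123
    reconcile("staging", heads.get, applied, log_apply)  # no change, does nothing
```

A watcher would call `reconcile` on a schedule or in response to a notification. The function itself only decides whether there is anything to apply.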
Observability is about giving developers a better view of what their code is doing in production. It has, I guess inevitably, been co-opted by vendors as a hip new label for their monitoring and log aggregation products. Honeycomb.io is the leading champion for observability. You should read anything and everything Charity Majors says about observability.
TL;DR: Go read the early release of the second edition of Infrastructure as Code on O’Reilly! This is the first eight chapters of what will probably be eighteen or so.
Four years ago I was close to finishing my book Infrastructure as Code. I felt like I was racing against the industry’s ability to innovate and improve infrastructure technology. At the time, the action was in server configuration - the book’s subtitle was Managing Servers in the Cloud. Docker, Kubernetes, and AWS Lambda were still new, and few people were using them in production.
Since we published the book, I’m often asked whether infrastructure as code is relevant in the cloud-native world of serverless and service mesh. You might not be shocked to hear me say, “yes.” Even if you’re no longer worried about configuring packages and file permissions on virtual servers, you’re still better off using code to build your clusters and environments than building them by hand.
Looking over the first edition of the book takes me back to a different time. Most clients I worked with were building infrastructure on the cloud for the first time, and my ThoughtWorks colleagues and I were introducing them to automation as code. There is a lot of text in the first edition to help you explain to your skittish management why they shouldn’t fear public cloud.
The world now is different. The technical ecosystem is still in flux. But even the most risk-averse organizations - financial institutions, governments, healthcare organizations - are using public cloud to one degree or another. The question isn’t whether to use cloud and infrastructure as code, but how.
My typical client today already has an existing infrastructure codebase. Their challenge is that their system has sprawled into a complex morass of code. Tools have come a long way, but reusing, sharing, and organizing code is still not easy. People write complicated build scripts that are actually more fragile and confusing than the infrastructure code they apply. And automated testing for infrastructure is still a challenge.
Enter the second edition of Infrastructure as Code. What I first thought would be a gentle refresh has turned into an aggressive rewrite. Things are different, as I described above. But I’ve also spent a huge amount of time with many teams and people who are working with infrastructure projects. I’ve learned a lot since I wrote the first edition. I’ve learned about challenges, practices, and ideas for dealing with dynamic infrastructure. And I’ve learned better ways to communicate these.
So check out the early release, and please let me know what you think!