Why Infrastructure as Code

The thumbnail definition that I trot out for Infrastructure as Code is using development practices and tools to manage infrastructure. This sounds like a natural thing to do if you’re defining your infrastructure in definition files used by tools like Chef, Puppet, and Ansible. These files look like source code, and can be checked into Git or another VCS just like source code.
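To make the idea of a definition file a bit more concrete, here is a minimal sketch in Python. It is not how Chef, Puppet, or Ansible are implemented; the `DESIRED` structure and the `plan` and `apply_actions` functions are hypothetical names invented for illustration, and the "server" is just an in-memory dictionary. The point is only that the desired state is plain data that can be versioned, and that applying it twice changes nothing the second time.

```python
# Hypothetical desired-state definition. In practice this would live in a
# definition file checked into version control.
DESIRED = {
    "packages": {"nginx", "openssl"},
    "services": {"nginx": "running"},
}


def plan(current, desired):
    """Return the actions needed to move `current` towards `desired`."""
    actions = []
    # Install any package that is in the definition but not on the server.
    for pkg in sorted(desired["packages"] - current["packages"]):
        actions.append(("install", pkg))
    # Bring each service into the state the definition asks for.
    for name, state in sorted(desired["services"].items()):
        if current["services"].get(name) != state:
            actions.append(("set_service", name, state))
    return actions


def apply_actions(current, actions):
    """Apply actions to a copy of an in-memory server model (no real side effects)."""
    result = {
        "packages": set(current["packages"]),
        "services": dict(current["services"]),
    }
    for action in actions:
        if action[0] == "install":
            result["packages"].add(action[1])
        elif action[0] == "set_service":
            result["services"][action[1]] = action[2]
    return result


if __name__ == "__main__":
    server = {"packages": {"openssl"}, "services": {}}
    actions = plan(server, DESIRED)
    print(actions)
    server = apply_actions(server, actions)
    # A second run finds nothing left to change.
    print(plan(server, DESIRED))
```

Because the definition is data rather than a sequence of manual steps, a change to the infrastructure becomes a change to a file, which can be reviewed, tested, and rolled back like any other commit.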

But what are the actual benefits of treating your infrastructure this way? Configuring infrastructure by editing files in a VCS is a dramatically different way of working than the old-school alternatives: clicking through a GUI-driven configuration tool, or logging into servers and editing configuration files. To make this shift, and to really get the benefits from it, you need to be pretty clear on what you’re trying to get out of it.

The headline benefit of Infrastructure as Code is being able to manage changes to infrastructure easily and responsibly. We’d like to be able to make changes rapidly, with low risk. And we’d like to keep doing this even as the size and complexity of the infrastructure grows, and as more teams use our infrastructure.

The enemy of this goal is manually-driven processes. Manual steps to provision, configure, modify, update, and fix things are the most obvious things to eliminate. But manually-driven process and governance can be at least as big an obstacle to frequent, low-risk changes. This becomes especially difficult to handle as an organization grows.

So what kind of benefits should you see from a well-implemented Infrastructure as Code approach?

  • Your IT infrastructure supports and enables change, rather than being an obstacle or a constraint for its users.
  • Changes to the system are routine, without drama or stress for users or IT staff.
  • IT staff spend their time on valuable things which engage their abilities, not on routine, repetitive tasks.
  • Users are able to responsibly define, provision, and manage the resources they need, without needing IT staff to do it for them.
  • Teams are able to easily and quickly recover from failures, rather than assuming failure can be completely prevented.
  • Improvements are made continuously, rather than done through expensive and risky “big bang” projects.
  • You find solutions to problems by implementing, testing, and measuring them, rather than by discussing them in meetings and documents.

(Photo by Sebastien Wiertz)

To production and beyond

I’ve delivered the text of my book to O’Reilly’s production team, which means we’re on the path to publication! The last “early access” release should be available soon to people who have bought it (which you can still do from the O’Reilly Shop), and then the final release will be out in stores.

The final early access push will be pretty much the final content, only missing copyediting (spelling, grammar, etc.) and professionally designed graphics.

ThoughtWorks sponsors free chapters for download

My employer, ThoughtWorks, is sponsoring a free download of three chapters of my upcoming book, “Infrastructure as Code”. These chapters focus on software engineering, testing, and Continuous Delivery practices for infrastructure. ThoughtWorks has a deep history in all of these areas, so they seemed like an appropriate group of chapters for us to sponsor as a company.

I’ve now completed the full draft of the book. We’re getting technical reviews of the book, and have started getting the diagrams professionally designed. It’s awesome to see my crude attempts at diagrams turned into clean, slick images!

Talk at operability.io conference

Last week I gave a talk at the operability.io conference. This was a great conference: small (about 170 people) and single track. I met a lot of people I know, and a number of people who weren’t the usual DevOpsDays suspects. It had a strong focus on operations, with some excellent talks about organizing and running ops teams, as well as technical topics like logging and security. It probably leaned more towards people-oriented topics than technically oriented ones.

My own talk, “Automating for Agility”, was high level. I wanted to explore the importance of understanding and communicating the outcomes you expect to get from infrastructure automation. In my mind, there are two existential reasons for an organization to consider IT automation. One is to enable fast and continuous change. The other is to empower users of the infrastructure to achieve their goals.

I don’t believe most IT organizations today have either of these goals in mind. There are plenty who pay lip service to self-service for their users, but few who really deliver. In most cases, centralized platform and tool teams make decisions based on what is convenient for themselves, not for their users. They choose tools which help them, as a centralized team, have control over the solution.

Update to early access Infrastructure as Code book

O’Reilly has pushed the latest update to the “early access” (i.e. rough draft) release of my book-in-progress, Infrastructure as Code. If you buy the book now, you get the early access version immediately, and you’ll get the full electronic version of the final release when it is published.

This release has three new chapters, and updates to some of the earlier chapters.

Chapter 8 discusses patterns and practices for making changes to servers. This chapter is closely tied to Chapter 3, server management tools, which I had released previously. The process of writing the new chapter led me to reshape the earlier one, in order to get the right balance of which topics belong in which chapter.

Basically, the chapters in Part I, including chapter 3, are intended to lay out how the different types of tools work. The chapters in Part II, including chapter 8, get more into patterns and practices for using the tools as part of an “infrastructure as code” approach.

Aside from improving the structure of the content, this revision to these two chapters clarified for me the idea of four “models” for server change management: ad-hoc changes, continuous synchronization, immutable servers, and containerization.

The other two new chapters in this release kick off Part III of the book, which gets more into the meat of software development practices that are relevant to infrastructure. Chapter 9 describes software engineering practices, drawing heavily on XP (eXtreme Programming) concepts like CI (Continuous Integration). It also discusses practical topics like effective use of VCS (Version Control Systems), including branching strategies, as well as maintaining code quality and avoiding technical debt.

Chapter 10 is about quality management. This includes obvious topics like TDD (Test Driven Development), but also goes into change management processes, such as CABs (Change Advisory Boards), and structuring work as stories. This is intended to take a higher level view of quality than simply writing automated tests. Automated tests are only one element of a larger strategy for managing changes to infrastructure in a way that helps teams make frequent, rapid changes, while avoiding errors and downtime.

One of the things I’m learning from the process of writing a full-length book is how interrelated the different parts of the book are. Writing new chapters forces me to re-evaluate what I’ve written previously, and re-think how it all fits together.

A key part of the process of improving the book as I write it is feedback from readers. I’m interested in hearing what people who are new to the ideas in this book think. Does the stuff the book says make sense? Is it relevant? Helpful? Confusing? I’m also interested in input from people who’ve already lived infrastructure as code. Do the principles and practices I’ve laid out resonate with you? Do you have different experiences? Are there topics I’ve missed out?