A dynamic infrastructure platform is a fundamental requirement for Infrastructure as Code. I define this as “a system that provides computing resources, particularly servers, storage, and networking, in a way that they can be programmatically allocated and managed.”
In practice, this most often means a public IaaS (Infrastructure as a Service) cloud like Amazon’s AWS, Google’s GCE, or Microsoft’s Azure. But it can also be a private cloud platform using something like OpenStack or VMware vCloud. A dynamic infrastructure platform can also be implemented with an API-driven virtualization system like VMware. These systems normally force your infrastructure management tools to explicitly decide where to allocate resources - which hypervisor instance to start a VM on, which storage pool to allocate a network share from, etc. But this is still compatible with Infrastructure as Code, because it’s all programmable.
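To make the "explicitly decide where to allocate resources" point concrete, here is a minimal sketch of the kind of placement logic your tooling might need on a virtualization system. The `Hypervisor` class, the host names, and the capacity numbers are all made up for illustration; a real script would read this information from the platform's own API.

```python
from dataclasses import dataclass


@dataclass
class Hypervisor:
    # Hypothetical record of one hypervisor's spare capacity.
    name: str
    free_ram_gb: int
    free_cpus: int


def choose_hypervisor(hosts, ram_gb, cpus):
    """Return the host with the most free RAM that can fit the VM, or None."""
    candidates = [
        h for h in hosts
        if h.free_ram_gb >= ram_gb and h.free_cpus >= cpus
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_ram_gb)


# Illustrative inventory; in practice this would come from the platform.
hosts = [
    Hypervisor("hv-01", free_ram_gb=8, free_cpus=4),
    Hypervisor("hv-02", free_ram_gb=32, free_cpus=2),
    Hypervisor("hv-03", free_ram_gb=16, free_cpus=8),
]

target = choose_hypervisor(hosts, ram_gb=12, cpus=4)
print(target.name if target else "no capacity")
```

A public cloud usually makes this decision for you; on a virtualization system it becomes part of your own code, which is more work but still automatable.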
Many organizations, including DevOps paragons like Etsy and Spotify, implement Infrastructure as Code on bare-metal, with no virtualization or cloud at all. Tools such as Cobbler or Foreman can be used to automatically provision physical servers, leveraging ILO (Integrated Lights Out) features of the server hardware.
The key characteristics needed from an infrastructure platform for Infrastructure as Code are:
A dynamic infrastructure platform must be programmable. An API makes it possible for scripts, software, and tools to interact with the platform. Even if you’re using an off-the-shelf tool like Terraform or Ansible to provision infrastructure, you’ll almost certainly need to write some custom scripting or tools here and there. So you should make sure the platform’s API has good support for scripting languages that your team is comfortable with. Keep in mind the difference between “good” support for the language, and just having a tickbox.
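As a rough illustration of the "custom scripting here and there" that an API makes possible, the sketch below cleans up every server carrying a given tag. `FakePlatform` is an in-memory stand-in I've invented so the example runs on its own; it is not any real provider's client library, and its method names are assumptions.

```python
import itertools


class FakePlatform:
    """In-memory stand-in for a platform API client (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.servers = {}

    def create_server(self, name, ram_gb, cpus, tags=None):
        server_id = f"srv-{next(self._ids)}"
        self.servers[server_id] = {
            "name": name,
            "ram_gb": ram_gb,
            "cpus": cpus,
            "tags": dict(tags or {}),
        }
        return server_id

    def destroy_server(self, server_id):
        del self.servers[server_id]

    def list_servers(self):
        # Return a copy so callers can destroy servers while looping.
        return dict(self.servers)


def destroy_by_tag(platform, key, value):
    """Destroy every server carrying the given tag; return the IDs removed."""
    doomed = [
        server_id
        for server_id, server in platform.list_servers().items()
        if server["tags"].get(key) == value
    ]
    for server_id in doomed:
        platform.destroy_server(server_id)
    return doomed


platform = FakePlatform()
platform.create_server("web-1", ram_gb=4, cpus=2, tags={"env": "test"})
platform.create_server("web-2", ram_gb=4, cpus=2, tags={"env": "prod"})
platform.create_server("db-1", ram_gb=16, cpus=4, tags={"env": "test"})

removed = destroy_by_tag(platform, "env", "test")
print(removed)
```

With a real platform the shape of the script would be much the same: list, filter, act. How pleasant that is to write in your team's language is a fair test of whether the API support is "good" or just a tickbox.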
The dynamic infrastructure platform needs to allow resources to be created and destroyed immediately. You would think this is obvious, but it’s not always the case. Some managed hosting providers, and internal IT departments, offer services they call “cloud”, but which require raising tickets to get someone else to make it happen. The hosting platform needs to be able to fulfill provisioning requests within minutes, if not seconds.
Billing and budgeting also need to be structured to support on-demand, incremental charging. If you need to sign a contract, or issue a purchase order, in order to create a new server, then it’s not going to work. If adding a new server requires a commitment of more than an hour, it’s not going to work.
Also, if your “cloud” hosting provider charges you for the hardware you’ll be using, and then charges you for each VM you run, then you’re being taken advantage of. That’s not how cloud works.
Self-service takes the on-demand requirement, and adds a bit more. It’s not enough to be able to get resources like servers quickly; you need to be able to customize and tailor them yourself. You shouldn’t need to get someone else to approve how much RAM and how many CPUs your server will have. You should be able to tweak and adjust these things on existing servers.
Specifying your environment’s details, and changing it, will actually be done in definition files (like a Terraform file), using the platform’s programmable API. So any arrangement where a central group does this for you isn’t going to work.
I like the analogy of Lego bricks. A central IT group that manages your cloud for you is like buying a box of Lego bricks, but having the shop staff decide how to assemble them for you. It stops you from taking ownership of the infrastructure you use. You won’t be able to learn how to shape your infrastructure to your own needs and improve it over time.
Worse is when a central IT team offers you a catalog of pre-defined infrastructure. This is like only being able to buy a Lego set that has already been built for you and glued together. You’ve got no ability to adjust and improve it. You often can’t even request a change, such as a newer version of a JVM. Instead, you have to wait for the central group to build and test a new standard offering.
What you want
Ultimately, your infrastructure platform needs to give you the ability to define your infrastructure in files, and have your tools provision and update that infrastructure. This reduces your reliance on an overworked central team, and ensures you can continuously improve and adapt your infrastructure to support the application you run on it as effectively as possible.
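Here is a minimal sketch of what "define your infrastructure in files, and have your tools provision and update it" boils down to: compare what the definition says with what actually exists, and work out the steps to close the gap. The server names and specs are invented, and the dictionaries stand in for a parsed definition file and a platform query. Real tools like Terraform do far more than this (dependencies, state tracking, and so on).

```python
def plan_changes(desired, actual):
    """Compare desired and actual server specs; return (action, name) steps."""
    steps = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("destroy", name))
    return steps


# What the definition file asks for (hypothetical contents).
desired = {
    "web-1": {"ram_gb": 8, "cpus": 2},
    "web-2": {"ram_gb": 8, "cpus": 2},
}

# What the platform reports as currently running (hypothetical contents).
actual = {
    "web-1": {"ram_gb": 4, "cpus": 2},
    "old-batch": {"ram_gb": 2, "cpus": 1},
}

plan = plan_changes(desired, actual)
for action, name in plan:
    print(action, name)
```

The point is that changing the RAM on `web-1` is an edit to a file followed by a tool run, not a request to another team.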
The thumbnail definition that I trot out for Infrastructure as Code is using development practices and tools to manage infrastructure. This sounds like a natural thing to do, if you’re defining your infrastructure in definition files used by tools like Chef, Puppet, and Ansible. These files look like source code, and can be checked into Git or another VCS just like source code.
But what are the actual benefits of treating your infrastructure this way? Configuring infrastructure by editing files in a VCS is a dramatically different way of working than the old-school alternatives - clicking in a GUI-driven configuration, or logging into servers and editing configuration files. To make this shift, and to really get the benefits from it, you need to be pretty clear on what you’re trying to get out of it.
The headline benefits of Infrastructure as Code are to be able to easily and responsibly manage changes to infrastructure. We’d like to be able to make changes rapidly, with low risk. And we’d like to keep doing this even as the size and complexity of the infrastructure grows, and as more teams are using our infrastructure.
The enemy of this goal is manually-driven processes. Manual steps to provision, configure, modify, update, and fix things are the most obvious things to eliminate. But manually-driven process and governance can be at least as big an obstacle to frequent, low-risk changes. This becomes especially difficult to handle as an organization grows.
So what kind of benefits should you see from a well-implemented Infrastructure as Code approach?
- Your IT infrastructure supports and enables change, rather than being an obstacle or a constraint for its users.
- Changes to the system are routine, without drama or stress for users or IT staff.
- IT staff spend their time on valuable things which engage their abilities, not on routine, repetitive tasks.
- Users are able to responsibly define, provision, and manage the resources they need, without needing IT staff to do it for them.
- Teams are able to easily and quickly recover from failures, rather than assuming failure can be completely prevented.
- Improvements are made continuously, rather than done through expensive and risky “big bang” projects.
- You find solutions to problems by implementing, testing, and measuring them, rather than by discussing them in meetings and documents.
(Photo by Sebastien Wiertz)
I’ve delivered the text of my book to O’Reilly’s production team, which means we’re on the path to publication! The last “early access” release should be available soon to people who have bought it (which you can still do from the O’Reilly Shop), and then the final release will be out in stores.
The final early access push will be pretty much the final content, only missing copyediting (spelling, grammar, etc.) and professionally designed graphics.
My employer, ThoughtWorks, is sponsoring a free download of three chapters of my upcoming book, “Infrastructure as Code”. These chapters focus on software engineering, testing, and Continuous Delivery practices for infrastructure. ThoughtWorks has a deep history in all of these areas, so they seemed like an appropriate group of chapters for us to sponsor as a company.
I’ve now completed the full draft of the book. We’re getting technical reviews of the book, and have started getting the diagrams professionally designed. It’s awesome to see my crude attempts at diagrams turned into clean, slick images!
Last week I gave a talk at the operability.io conference. This was a great conference, small (about 170 people), single track. I met a lot of people I know, and a number of people who weren’t the usual DevOpsDays suspects. It had a strong focus on operations, with some excellent talks about organizing and running ops teams, as well as technical topics like logging and security. It probably leaned more towards people-oriented topics than technically oriented ones.
My own talk, “Automating for Agility”, was high level. I wanted to explore the importance of understanding and communicating the outcomes you expect to get from infrastructure automation. In my mind, there are two existential reasons for an organization to consider IT automation. One is to enable fast and continuous change. The other is to empower users of the infrastructure to achieve their goals.
I don’t believe most IT organizations today have either of these goals in mind. There are plenty who pay lip service to self-service for their users, but few who really deliver. In most cases, centralized platform and tool teams make decisions based on what is convenient for themselves, not for their users. They choose tools which help them, as a centralized team, have control over the solution.