<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://infrastructure-as-code.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://infrastructure-as-code.com/" rel="alternate" type="text/html" /><updated>2026-06-05T10:24:56+01:00</updated><id>https://infrastructure-as-code.com/feed.xml</id><title type="html">Infrastructure as Code</title><subtitle>Website to share content for the O&apos;Reilly book Infrastructure as Code, whose topics include cloud architecture, infrastructure design, infrastructure codebase management, infrastructure delivery lifecycle and workflows, infrastructure automation tools, infrastructure platforms, and infrastructure orchestration. Composable infrastructure is a particularly recommended approach. The site includes guidance, principles, patterns, practices, examples, and techniques.</subtitle><author><name>Kief Morris</name></author><entry><title type="html">Craft Conference 2026 Talk Notes</title><link href="https://infrastructure-as-code.com/conferences/2026-CraftConference.html" rel="alternate" type="text/html" title="Craft Conference 2026 Talk Notes" /><published>2026-06-04T10:20:20+01:00</published><updated>2026-06-04T10:20:20+01:00</updated><id>https://infrastructure-as-code.com/conferences/craftconference-2026</id><content type="html" xml:base="https://infrastructure-as-code.com/conferences/2026-CraftConference.html"><![CDATA[<p><a href="/resources/Talk-Craft-Conference-2026-Kief-Morris.pdf">Slides for my talk on Pulling Continuous Delivery inside the agentic loop</a></p>

<h2 id="description">Description</h2>

<blockquote>
  <p>Remember when one team would build software, then hand it off to another team to deploy it and get it working in production? I seem to recall we came up with better ways to deliver software. We even made up cool buzzwords like “DevOps” and “Continuous Delivery.”</p>

  <p>Many years later, I see people using LLMs to iterate on building an application and treating production readiness as an afterthought. That might be fine for demos and personal projects. But if we’re going to use agents to build real, business-critical software, we need to use agents to make sure the software is fit for purpose. We need to know that the software and its infrastructure perform, scale, recover when things go wrong, and stay secure and compliant.</p>

  <p>I maintain that continuously ensuring software is production-ready as it’s developed is at least as important when using agents as when we hand-code it. I’ll talk about how to pull the path to production inside the agentic development flow. And I’ll share why doing this kind of blew up on me the first time, and how I had to adjust my thinking to make it work.</p>
</blockquote>

<h2 id="some-of-the-influences-for-this-talk">Some of the influences for this talk</h2>

<p><a href="https://martinfowler.com/articles/harness-engineering.html">Harness engineering for coding agent users</a>, Birgitta Böckeler</p>

<p><a href="https://www.youtube.com/watch?v=uLWOLmeHOSE">Harness engineering beyond skills: Using sensors to keep your coding agent in check</a>, (Video) Birgitta Böckeler, Chris Ford</p>

<p><a href="https://www.chrismdp.com/coding-with-ai/">How I Use AI to Code</a>, Chris Parsons</p>

<p><a href="https://www.adamhjk.com/blog/as-we-build-so-we-believe/">As we build, so we believe</a>, Adam Jacob</p>

<p><a href="https://stack72.dev/skills-are-context-and-context-needs-tests/">Skills Are Context, and Context Needs Tests</a>, Paul Stack</p>

<h2 id="credits-for-photos-used-in-the-talk">Credits for photos used in the talk:</h2>

<ul>
  <li><a href="https://unsplash.com/@homaappliances?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Homa Appliances</a> on <a href="https://unsplash.com/photos/blue-industrial-robot-arm-in-factory-sz1CHL7Pky0?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></li>
  <li><a href="https://unsplash.com/@hyundaimotorgroup?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Hyundai Motor Group</a> on <a href="https://unsplash.com/photos/a-factory-filled-with-lots-of-machines-and-boxes-h2rWePLKxvs?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></li>
  <li><a href="https://unsplash.com/@tama66?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Peter Herrmann</a> on <a href="https://unsplash.com/photos/white-and-brown-building-interior-z6DJJZ1-1Cg?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></li>
  <li><a href="https://unsplash.com/@kayticloudkicker?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Kathrine Coonjohn</a> on <a href="https://unsplash.com/photos/a-rainbow-in-the-sky-over-a-lake-and-mountains-e3KsXuD7w1w?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></li>
  <li><a href="https://unsplash.com/@abhinav1bhardwaj?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Abhinav Bhardwaj</a> on <a href="https://unsplash.com/photos/a-close-up-of-a-jet-engine-on-display-nxB-ogVt3rQ?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a> (also the image used for this post)</li>
</ul>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[Slides and links from my talk, _Pulling Continuous Delivery inside the agentic loop_, from Craft Conference 2026.]]></summary></entry><entry><title type="html">PlatformCon 2026 Talk</title><link href="https://infrastructure-as-code.com/conferences/2026-PlatformCon.html" rel="alternate" type="text/html" title="PlatformCon 2026 Talk" /><published>2026-05-14T10:20:20+01:00</published><updated>2026-05-14T10:20:20+01:00</updated><id>https://infrastructure-as-code.com/conferences/platformcon2026</id><content type="html" xml:base="https://infrastructure-as-code.com/conferences/2026-PlatformCon.html"><![CDATA[<p>I’m thinking a lot lately about how we use agents in the path to production, including infrastructure engineering and managing operational quality. I’m speaking on this at <a href="https://platformcon.com/">PlatformCon 2026</a>. You can download the <a href="resources/Talk-Agents-End-to-End-20250601.pdf">slides for my humans on the loop presentation</a> now.</p>

<h2 id="some-of-the-influences-for-this-talk">Some of the influences for this talk</h2>

<p><a href="https://martinfowler.com/articles/harness-engineering.html">Harness engineering for coding agent users</a>, Birgitta Böckeler</p>

<p><a href="https://www.youtube.com/watch?v=uLWOLmeHOSE">Harness engineering beyond skills: Using sensors to keep your coding agent in check</a>, (Video) Birgitta Böckeler, Chris Ford</p>

<p><a href="https://www.chrismdp.com/coding-with-ai/">How I Use AI to Code</a>, Chris Parsons</p>

<p><a href="https://www.adamhjk.com/blog/as-we-build-so-we-believe/">As we build, so we believe</a>, Adam Jacob</p>

<p><a href="https://stack72.dev/skills-are-context-and-context-needs-tests/">Skills Are Context, and Context Needs Tests</a>, Paul Stack</p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[Slides and links from my talk _Humans on the Loop: Taking agents end to end_, which I am delivering at PlatformCon 2026. How does CD work with agentic software engineering?]]></summary></entry><entry><title type="html">Why I have my agents write infrastructure code</title><link href="https://infrastructure-as-code.com/why-i-use-infra-code-with-agents.html" rel="alternate" type="text/html" title="Why I have my agents write infrastructure code" /><published>2026-04-14T11:00:00+01:00</published><updated>2026-04-14T11:00:00+01:00</updated><id>https://infrastructure-as-code.com/agents-and-iac</id><content type="html" xml:base="https://infrastructure-as-code.com/why-i-use-infra-code-with-agents.html"><![CDATA[<p>AI agents build software quickly, but they don’t default to building systems with great operational quality. Using agents to build the infrastructure that hosts software feels especially risky. But we need to incorporate infrastructure into agentic software delivery workflows not only to keep up with the pace, but to handle rapidly increasing pressure on reliability, security, and scalability.</p>

<p>I’m experimenting with how to use agents to build AWS infrastructure to run a sandboxed instance of <a href="https://openclaw.ai/">OpenClaw</a> for myself. It’s a nice dogfooding project that I actively rely on, and it needs constant work to keep up with frequent software releases and changes as I add new capabilities and data.</p>

<p>I could just fire up Claude Code with AWS credentials, give it a prompt or spec for the infrastructure I need to run OpenClaw, and let it loose with the AWS CLI - “vibe infrastructure engineering”. Claude is happy to crack on with it. The agent will make mistakes along the way, but it knows the outcome I want and will iterate on the implementation until it gets there.</p>

<p>The challenge is that my coding agent does a messy job of it. It takes multiple tries to find a command that works and casually blows away data and application configuration. When I ask for a minor change it replaces a solid, efficient implementation with a slow, less-reliable bodge. At one point it started implementing a Slack gateway, something already built into OpenClaw. Working with an AI agent reminds me of working with an old-school system administrator who likes to build everything by hand.</p>

<h2 id="code-for-consistency">Code for Consistency</h2>

<p>Infrastructure code gives me consistency. I define the set of resources I need and how to assemble them. Then I can build, rebuild, and replicate the system, and know it’s essentially the same. Infrastructure code is an anchor for my coding agent. Rather than starting from scratch for every change, the agent starts with the existing code, and I guide it to incrementally improve the quality of the infrastructure. The agent can also use existing code as a reference for new infrastructure with similar requirements, so it will thrash less deciding how to build it.</p>

<p>Having code helped me move from my original OpenClaw build to a new AWS account with additional capabilities. I built it using <a href="https://github.com/systeminit/swamp">Swamp</a> as my infrastructure coding tool. I decided to try Swamp because it’s written from the ground up to be used by AI coding agents. I had Claude build the new instance infrastructure by copying the relevant Swamp models and workflows from the original repository. Starting with a specification or export of the existing AWS resource structure would have needed much more time to work out a new implementation.</p>

<h2 id="code-for-confidence">Code for Confidence</h2>

<p>One of the reasons I’ve always liked using code to build infrastructure is that it gives me confidence when making a change, which helps me to make changes faster and more often.</p>

<p>I can inspect and run checks on the code before deploying it. I can deploy a disposable replica and run active tests to make sure it works the way I want. I can test the operational tolerances of my system by running stress tests, failure scenarios, and attacks on a replica. And if I still end up hosing my production instance, I can quickly rebuild it, presuming I’ve also automated data and configuration persistence.</p>
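<p>As a rough illustration of the first step, running checks on infrastructure code before deploying it, here is a minimal Python sketch. The <code>Resource</code> class, the resource kinds, and the two rules are hypothetical and invented for this example; they are not the format of Swamp or any other real tool.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, tool-neutral description of one planned resource."""
    name: str
    kind: str
    settings: dict = field(default_factory=dict)


def check_resources(resources):
    """Return a list of violations; an empty list means the checks passed."""
    violations = []
    for res in resources:
        # Illustrative rule 1: storage must be encrypted at rest.
        if res.kind == "storage_bucket" and not res.settings.get("encrypted", False):
            violations.append(f"{res.name}: storage bucket is not encrypted")
        # Illustrative rule 2: SSH must not be reachable from anywhere.
        if (
            res.kind == "firewall_rule"
            and res.settings.get("port") == 22
            and res.settings.get("source") == "0.0.0.0/0"
        ):
            violations.append(f"{res.name}: SSH is open to the internet")
    return violations


# Made-up example plan, as an agent might propose it.
PLANNED = [
    Resource("app-data", "storage_bucket", {"encrypted": True}),
    Resource("scratch", "storage_bucket", {}),
    Resource("admin-ssh", "firewall_rule", {"port": 22, "source": "0.0.0.0/0"}),
]

if __name__ == "__main__":
    for problem in check_resources(PLANNED):
        print("FAIL", problem)
```

<p>A check like this could run on every change an agent proposes, so that a failing rule stops the change before anything is deployed.</p>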

<p>The confidence I get from using code to manage my infrastructure makes it less scary to change it. So I can knock out fixes, improvements, updates, and new capabilities whenever I need to, rather than leaving them on a backlog to get around to. For years this has served me well working with fast-paced human software delivery teams. Agentic fire-hoses of software and infrastructure code make confidence in the reliability of the workflow even more important.</p>

<p>Infrastructure code by itself isn’t enough to make agentic infrastructure as reliable as I need it to be. I’ll go into more details of creating an agentic harness for infrastructure in a separate post, building on what my colleague Birgitta Böckeler has been writing about <a href="https://martinfowler.com/articles/harness-engineering.html">harness engineering</a>.</p>

<h2 id="code-for-composability">Code for Composability</h2>

<p>The benefit of using infrastructure code with agents that I’m particularly excited about is being able to reuse infrastructure implementations.</p>

<p>The Infrastructure as Code ecosystem includes many attempts at shareable libraries, but we’ve never achieved anything close to what operating system and programming language packaging systems have. Maybe infrastructure code is too brittle, or organizations’ requirements are too idiosyncratic. But unlike static libraries, agents can adapt code to specific needs and context. As <a href="https://mitchellh.com/writing/building-block-economy">Mitchell Hashimoto</a> and <a href="https://www.adamhjk.com/blog/adaptive-building-blocks/">Adam Jacob</a> have pointed out, agents are particularly good at assembling pre-existing components.</p>

<p>If a coding agent can adapt working, pre-built code to the needs of a specific workload in a specific organization’s context, then we may have the basis for a healthy ecosystem of infrastructure components. Even as models get better at writing infrastructure code that works on the first try, having pre-built components will be faster and more consistent. Off-the-shelf infrastructure components, combined with a robust delivery harness that enforces compliance and operational standards, can empower developers to build and deploy production-grade systems, something I previously described as <a href="/posts/devops-noops.html">the return of NoOps</a>.</p>

<p><small><em>Photo by <a href="https://unsplash.com/@georgeiermann?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Georg Eiermann</a></em></small></p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[I could have Claude Code directly create infrastructure for me. But I prefer to have it write infrastructure code, for consistency, confidence, and composability.]]></summary></entry><entry><title type="html">The Humans On The Loop</title><link href="https://infrastructure-as-code.com/humans-on-the-loop.html" rel="alternate" type="text/html" title="The Humans On The Loop" /><published>2026-03-09T06:20:20+00:00</published><updated>2026-03-09T06:20:20+00:00</updated><id>https://infrastructure-as-code.com/humans-on-the-loop</id><content type="html" xml:base="https://infrastructure-as-code.com/humans-on-the-loop.html"><![CDATA[<p>I spent last week in Bengaluru with the Doppler team, working on volume 34 of the <a href="https://www.thoughtworks.com/radar">Thoughtworks Technology Radar</a>. It was an intense but fascinating process, especially given how quickly things are changing in the industry right now. The new edition will be published in April.</p>

<h2 id="reactions-to-my-previous-article">Reactions To My Previous Article</h2>

<p>I wrote recently that the <a href="/posts/devops-noops.html">next generation of infrastructure tools is for developers</a>, and that agentic software engineering may help make this transition real. I deliberately used the term “NoOps” to echo an early DevOps-era argument: operations work doesn’t disappear, but the role changes.</p>

<p>Back in 2012, “NoOps” triggered a backlash because it sounded like there would be no need for ops folks when infrastructure was automated. But the point was different. We needed to shift from manually handling every release to building and managing the systems that handle releases.</p>

<p>At the time, that shift showed up through Continuous Delivery and related operational practices that later evolved into what we now call platform engineering.</p>

<p>I’m using “NoOps” in that same sense now. Not “ops disappears,” but “ops changes shape.” I expected a reaction.
Ops folks are proud of our craft, and rightly so. Reliable operations is hard, and not something developers can do casually with a magical <a href="https://en.wikipedia.org/wiki/Stochastic_parrot">stochastic parrot</a>.</p>

<p>Some responses have dismissed this as “AI = vibe coding slop,” but that misses the point. I’ve been making this argument since long before LLMs: many ops teams still hand-craft infrastructure, even when they’re using automation tools. They are still a manual step for routine changes.</p>

<p>Agentic workflows create a chance to push self-service further. Not by encouraging YOLO changes, but by having experts build the platforms, guardrails, and workflows developers can safely use.</p>

<p>I think software delivery workflows will look very different within a few years, especially in higher-performing teams. Infrastructure professionals will still be essential. But our value will be in designing the loop, not being the loop.</p>

<h2 id="new-article-humans-and-agents-in-software-engineering-loops">New Article: Humans and Agents in Software Engineering Loops</h2>

<p>That leads to the question I wanted to explore next: where do humans fit in an agentic delivery workflow?</p>

<p>I wrote <a href="https://www.martinfowler.com/articles/exploring-gen-ai/humans-and-agents.html">Humans and Agents in Software Engineering Loops</a> to work through that.</p>

<p>The key issue is how we govern how agents build a system. Do we give no guidance (“vibe coding”)? Do humans manually inspect every line (“humans in the loop”)? Or do we build systems that shape both outcomes and behavior (“humans on the loop”)?</p>

<p>For anyone who lived through the rise of DevOps, IaC, and Continuous Delivery, this should feel familiar. We stopped patching release artifacts by hand and improved the delivery system itself — tests, checks, and automation — so the next change would be safer by default.</p>

<p>Agentic software engineering looks familiar to me. We’ll build and run systems that produce, deliver, and operate software, including infrastructure code.</p>

<p>Some people argue this won’t work because LLMs are unreliable. But human engineers aren’t deterministic either. We’ve always managed that with a mix of deterministic and non-deterministic elements. That doesn’t go away.</p>

<h2 id="tools-worth-watching">Tools Worth Watching</h2>

<p>Infrastructure as Code vendors are starting to adapt for agentic workflows. They’re making their tools easier for agents to use safely and repeatedly in engineering loops.</p>

<p>A few examples worth paying attention to:</p>

<p><em><a href="https://github.com/hashicorp/agent-skills/tree/main/terraform">HashiCorp Agent Skills</a>:</em></p>

<p>Skills for generating Terraform code, refactoring modules, working with stacks, and writing tests (plus Packer support). A natural fit for teams happy in HashiCorp’s ecosystem. The open question is how well this fits teams using <a href="https://infrastructure-as-code.com/posts/interesting-tools.html">alternative packaging and deployment tools</a> with Terraform.</p>

<p><em><a href="https://www.pulumi.com/docs/ai/skills/">Pulumi skills</a>:</em> Skills focused on migration (Terraform/CDK/CloudFormation/ARM) and Pulumi authoring practices. Useful if you’re considering a move to Pulumi and want agent support during both migration and steady-state development.</p>

<p><em><a href="https://github.com/systeminit/swamp">Swamp</a>:</em> An agent-oriented CLI and workflow model built for AI-native infrastructure automation from the start. Strong candidate for teams designing end-to-end agentic workflows rather than bolting agents onto existing processes.</p>

<p><em><a href="https://github.com/platform-engineering-labs/formae">Formae</a>:</em> A newer IaC platform with strong emphasis on drift/convergence and explicit agent integration via MCP + skills. Particularly interesting for teams dealing with messy, mixed estates and out-of-band changes.</p>

<p>For most teams, this won’t be a single-tool decision. Terraform and Pulumi skills are pragmatic options for teams with an existing infrastructure codebase. Swamp and Formae are more opinionated, agent-first operating models.</p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[Agentic software engineering looks familiar to me. We'll manage systems that produce, deliver, and operate software, including infrastructure code. How do we govern how agents build a system? Do we give no guidance ("vibe coding")? Do humans manually inspect every line ("humans in the loop")? Or do we build systems that shape both outcomes and behavior ("humans on the loop")?]]></summary></entry><entry><title type="html">The return of NoOps: Let’s use AI to finish the job DevOps started.</title><link href="https://infrastructure-as-code.com/posts/devops-noops.html" rel="alternate" type="text/html" title="The return of NoOps: Let’s use AI to finish the job DevOps started." /><published>2026-02-09T04:29:01+00:00</published><updated>2026-02-09T04:29:01+00:00</updated><id>https://infrastructure-as-code.com/posts/devops-noops</id><content type="html" xml:base="https://infrastructure-as-code.com/posts/devops-noops.html"><![CDATA[<h2 id="weve-removed-the-devops-wall-but-the-devinfra-wall-remains">We’ve removed the dev/ops wall. But the dev/infra wall remains.</h2>

<p>DevOps has taken us a long way. Fifteen years ago a 6-month cycle for releasing software was considered pretty good. These days it’s considered pretty poor. But infrastructure is still a bottleneck for most teams releasing software. In most engineering organizations beyond a fairly small size, developers rely on another team to manage infrastructure for them.</p>

<p>Tools like Pulumi and CDK aimed to solve this by allowing developers to write infrastructure code in a familiar programming language, so they wouldn’t need to rely on specialists to do it for them. But it turns out that YAML and HCL aren’t the real obstacle. Knowing how to wire low-level infrastructure resources like network routes, security groups, gateways, and SSL certificates together to do something useful takes time and expertise.</p>

<p>Developers who gain this expertise and write the infrastructure code for non-trivial systems turn into infrastructure specialists. Some are happy to take that journey; many would prefer to focus on the software.</p>

<p>Platform engineering, Developer Experience, and a handful of other fairly recent movements aim to empower developers by making software delivery fully self-service. But often, these names are applied to teams that still work by hand-coding infrastructure. Developer workflows for getting code into production still involve waiting for someone else to make changes to networking, databases, or compute clusters. I’ve previously described this antipattern as <a href="https://infrastructure-as-code.com/post/infrastructure-platform-teams.html">infrastructure management teams</a>.</p>

<h2 id="better-tools-for-infrastructure-experts-only-reinforce-the-wall">Better tools for infrastructure experts only reinforce the wall.</h2>

<p>Then there’s an emerging genre of tools that aim to solve the problems with Infrastructure as Code. Their creators believe that the biggest obstacle that needs to be removed is the clunkiness of managing code files in git and coping with the drift between code and deployed infrastructure. I have written about potential successors to Infrastructure as Code, including <a href="https://infrastructure-as-code.com/posts/infra-effectiveness-part1-new-tools.html#infrastructure-as-model">Infrastructure as Model</a> tools like System Initiative.</p>

<p>System Initiative spent the last 6 years building their contender for the future of infrastructure automation. Rather than code, it represented infrastructure resources as a graph that could be dynamically manipulated by whatever extensions or integrations you chose to build.</p>

<p>Nobody wanted it.</p>

<p>One theory of why System Initiative struggled for adoption is that their approach was “too weird.” But I believe the real issue was that there wasn’t a strong value proposition. System Initiative was another tool for infrastructure experts to manage infrastructure for developers. If you need to wait a week for the platform team to set up the new environment, it doesn’t matter that they don’t need to edit YAML files. The wall remains.</p>

<h2 id="the-next-generation-of-infrastructure-tools-is-for-developers">The next generation of infrastructure tools is for developers.</h2>

<p>What we need is to empower developers to build and manage their own infrastructure. A developer should be able to specify what they need in terms that are relevant to the problem they’re solving. “I need this service to accept inbound HTTPS connections from the public Internet.” “I need an SQL database instance to store private user data, configured for security, availability, backups, and regulatory compliance.”</p>
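<p>To make that idea concrete, here is a minimal Python sketch of how a developer-level request might be expanded into lower-level resources. Everything in it is an assumption made for illustration: the <code>ServiceNeed</code> fields, the <code>plan_resources</code> function, and the resource kinds are hypothetical and do not describe any particular tool.</p>

```python
from dataclasses import dataclass


@dataclass
class ServiceNeed:
    """Hypothetical request expressed in the developer's own terms."""
    name: str
    public_https: bool = False
    sql_database: bool = False
    stores_private_data: bool = False


def plan_resources(need):
    """Expand a developer-level need into illustrative low-level resources."""
    resources = []
    if need.public_https:
        # Public HTTPS implies a certificate, a listener, and an inbound rule.
        resources += [
            {"kind": "tls_certificate", "name": f"{need.name}-cert"},
            {"kind": "load_balancer", "name": f"{need.name}-lb", "listener_port": 443},
            {"kind": "firewall_rule", "name": f"{need.name}-https-in",
             "port": 443, "source": "0.0.0.0/0"},
        ]
    if need.sql_database:
        # Platform defaults (assumed here) cover backups, availability, and exposure.
        resources.append({
            "kind": "sql_database",
            "name": f"{need.name}-db",
            "encrypted": need.stores_private_data,
            "backups_enabled": True,
            "multi_zone": True,
            "publicly_accessible": False,
        })
    return resources


if __name__ == "__main__":
    need = ServiceNeed("orders", public_https=True, sql_database=True,
                       stores_private_data=True)
    for resource in plan_resources(need):
        print(resource["kind"], resource["name"])
```

<p>The design point is that the developer states the need, while the defaults for security, availability, and backups come from whoever maintains the expansion logic.</p>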

<p>There was a brief moment back when DevOps was new where “<a href="https://perfcap.blogspot.com/2012/03/ops-devops-and-noops-at-netflix.html">NoOps</a>” was a hot topic. The term has faded, partly because people read it as declaring ops and infra specialists to be obsolete. If developers are going to be whipping up databases and network connectivity, someone needs to provide the frameworks to ensure that what gets deployed complies with policies and governance.</p>

<p>The real point of NoOps was that ops folks should provide tools and systems that remove themselves completely from the developer’s workflow. And that point is more relevant than ever.</p>

<p>The System Initiative team has shut down their original product and is pivoting to something that seems aligned with that point. Their founder, Adam Jacob, <a href="https://youtu.be/yxzghm3Fdj8?t=10718">recently gave a talk where he shared his epiphany about what comes next</a>. His message is that the increase in software development velocity from AI is going to force ops to follow, like it or not.</p>

<p>Adam and his team gave me an early build of their new product, swamp. <a href="https://www.systeminit.com/">Swamp</a> is a command-line tool designed to be understood and run by AI coding agents like Claude Code, so it fits naturally into the developer’s workflow.</p>

<p>I don’t know whether swamp itself will become the next big thing. Tools like <a href="https://www.pulumi.com/product/neo/">Pulumi Neo</a> are converging on a similar model. Especially for organizations using agentic development, bringing infrastructure management into the agent, embedded into the way developers build their software, has the potential to finally remove the dev/infra wall.</p>

<p><small><em>Photo by <a href="https://unsplash.com/@navymedicine?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Navy Medicine</a> on <a href="https://unsplash.com/photos/soldiers-climb-a-wall-using-a-human-ladder-formation-k6ibKit38Kk?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></em></small></p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[DevOps removed the wall between developers and operations, but the wall between developers and infrastructure remains. The next generation of infrastructure automation will use AI to remove that wall by embedding into the software developer's workflow.]]></summary></entry><entry><title type="html">DevOpsDays Istanbul 2025</title><link href="https://infrastructure-as-code.com/conferences/2025-11-DevOpsDaysIstanbul.html" rel="alternate" type="text/html" title="DevOpsDays Istanbul 2025" /><published>2025-10-31T01:20:20+00:00</published><updated>2025-10-31T01:20:20+00:00</updated><id>https://infrastructure-as-code.com/conferences/devopsdays-istanbul-2025</id><content type="html" xml:base="https://infrastructure-as-code.com/conferences/2025-11-DevOpsDaysIstanbul.html"><![CDATA[<p>You can download the slides from my talk, <a href="/resources/Talk-Self-Service-20251101.pdf">Next generation self-service models for infrastructure</a>, which I presented as the opening keynote for <a href="https://devopsdays.istanbul/">DevOpsDays Istanbul</a> on 1 November, 2025.</p>

<p>I’ve written a post, <a href="/posts/interesting-tools.html">Interesting Tools</a>, where I list tools that I think are interesting for implementing the ideas in the talk, and another, <a href="/post/infrastructure-platform-teams.html">Infrastructure Platform Teams</a>, where I discuss team topologies for self-service infrastructure.</p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[Slides from my presentation at DevOpsDays Istanbul on 1 November, 2025]]></summary></entry><entry><title type="html">Where is the value with infrastructure automation?</title><link href="https://infrastructure-as-code.com/posts/infra-effectiveness-part2-value.html" rel="alternate" type="text/html" title="Where is the value with infrastructure automation?" /><published>2025-06-16T05:00:00+01:00</published><updated>2025-06-16T05:00:00+01:00</updated><id>https://infrastructure-as-code.com/posts/infra-effectiveness-part2-value</id><content type="html" xml:base="https://infrastructure-as-code.com/posts/infra-effectiveness-part2-value.html"><![CDATA[<p><em>(Part 2 of The Infrastructure Automation Problem)</em></p>

<p>In my <a href="/posts/infra-effectiveness-part1-new-tools.html">previous article</a>, I observed that many engineering leaders find infrastructure management to be a bottleneck for software delivery, even after adopting cloud and infrastructure automation. Development teams are blocked waiting for environments to be built, updated, modified, extended, and fixed. There never seem to be enough environments, and yet cloud costs spiral. Messy and fragile environments are a bottleneck for developing and releasing changes. Infrastructure and platform teams are overstretched while technical debt piles up.</p>

<p>That article described some of the approaches tool vendors are taking to make it easier to work with infrastructure, moving past the limitations of the traditional Infrastructure as Code model. Using general purpose programming languages, embedding infrastructure code within application code, using dynamic models of infrastructure rather than static code, and using Generative AI all have the potential to make infrastructure more accessible to its users.</p>

<p>But as powerful as some of these approaches are, I believe that, in themselves, they don’t address the root issues that stop organizations from getting as much value as they should from cloud infrastructure. Why not? Because they focus on improving the efficiency of specific infrastructure management tasks rather than improving the end-to-end value streams that rely on infrastructure.</p>

<blockquote>
  <p>The value that infrastructure provides to an organization is not in the specific tasks of provisioning or updating infrastructure. It’s in the outcomes that are achieved by the direct and indirect users of the infrastructure.</p>
</blockquote>

<p>In this article I describe an approach for identifying what value organizations get from digital infrastructure and how to trace that to the specific infrastructure delivery capabilities. I discuss how analyzing value streams that depend on infrastructure capabilities, particularly software delivery value streams, can identify opportunities to remove bottlenecks. Then I touch on how different approaches for delivering infrastructure capabilities can help with those bottlenecks, such as self-service delivery.</p>

<p>This article lays the foundation for some follow-up content I’m planning that discusses how to implement infrastructure delivery capabilities that improve the end-to-end effectiveness of value streams that rely on infrastructure. The good news is, many of the tool vendors who are working on ways to improve the experience of working directly with infrastructure are also working on ways to improve delivery and outcomes. My goal with this series is to help with thinking about how to design, select, and implement solutions using a value-driven approach.</p>

<h1 id="does-infrastructure-have-business-value">Does infrastructure have business value?</h1>

<p>It’s worth repeating: the value of infrastructure management isn’t measured by how quickly infrastructure resources are deployed and changed; it’s measured by the outcomes delivered by those who use those resources. Digital infrastructure usually has multiple layers of stakeholders represented in a value chain with the organization’s customers at the end.</p>

<p>At that end of the value chain, where revenue is earned, the contribution of infrastructure is indirect and usually unclear. At the opposite end it can be hard to see exactly what value we get from each network route we configure and storage bucket we provision.</p>

<blockquote>
  <p>Unfortunately, infrastructure is most clearly visible to the business as cost and failure.</p>
</blockquote>

<p>This gap makes it tempting to dismiss decisions about infrastructure work as irrelevant to business value. It’s common for business leaders to assume that infrastructure is an undifferentiated utility akin to plumbing. Even many infrastructure engineers don’t see the need to understand the workloads that run on their systems, much less their organization’s strategy.</p>

<p>But if infrastructure didn’t need to be tailored to the organization then we wouldn’t have a constant stream of new buzzwords and movements popping up focusing on doing just that. DevOps, SRE, platform engineering, developer experience - these all continue to get attention because there is a layer of capabilities that needs to be built on top of the low-level IaaS cloud in order for software teams to deliver value specific to their organization.</p>

<p>The hyperscale cloud vendors have done a pretty good job of finding the line where infrastructure really is a utility and putting it behind IaaS APIs. Assembling the undifferentiated resources the APIs provide into useful services for your team to use turns out to need custom work, and plenty of it. Hordes of vendors of all sizes promise turnkey solutions to make that work disappear, and yet it still hasn’t disappeared.</p>

<p>So if cloud and platform vendors can’t give us a generic, one-size-fits-all solution for making the cloud directly useful to our organization, then we need to roll up our sleeves. Our first step is to work out what our organization, specifically, needs from our infrastructure.</p>

<h1 id="what-value-does-infrastructure-deliver-for-your-organization">What value does infrastructure deliver for your organization?</h1>

<p>A good place to start understanding how to get the most value from infrastructure automation is to look at the organization’s strategy, goals, and plans, and think about where infrastructure plays a role. Here are a few examples I’ve come across multiple times with clients:</p>

<p><strong>The business grows and sustains revenue</strong> by delivering new digital products and services, and delivering new features, improvements, and fixes to existing products. Infrastructure plays a key role in making sure that software delivery teams have what they need to develop and deliver code into production quickly, reliably, and safely.</p>

<p><strong>The company expands into markets</strong> by deploying existing products to new geographical regions and new customers. These different types of expansion often involve building and maintaining new production infrastructure.</p>

<p><strong>The organization manages costs and improves operational effectiveness</strong> by consolidating digital services created by different teams, including acquired businesses. This is often a pure infrastructure play, such as working out how to reduce eight different Kubernetes implementations to a more reasonable number.</p>

<p>These activities can usually be mapped to revenues and costs, which makes it possible to attribute financial value to high-level infrastructure delivery capabilities. We can work out those high-level capabilities by looking at the value chain for a given business activity, analyzing the value streams that support that value chain, and then understanding which infrastructure delivery capabilities are most important for improving those value streams.</p>

<p>For example, an online travel business has a goal to increase revenues, which they can achieve by increasing the number of bookings. Product and UX research suggests bookings could be increased by adding a new AI feature for users searching for flights. The development team will implement the new feature in code, and by integrating with an external AI service.</p>

<p><img src="/images/infra-value-map.jpg" alt="A diagram showing business goals across the top, supported by a layer of value chain activities, supported by a layer of value streams, supported by infrastructure delivery capabilities" /></p>

<p>Delivering the new AI flight search feature depends on several infrastructure delivery capabilities, including providing test environments, adding a vector database to all of the environments including production, and changing network configurations to connect with the AI service.</p>

<p>This example shows how to trace the connection between providing an infrastructure delivery capability, like providing an environment to deploy and test a software build, to business goals. The impact of a given infrastructure capability can be assessed by understanding how it contributes to the effectiveness of the value streams it supports. So our next step is to examine one of those value streams.</p>
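<p>To make the tracing concrete, here is a minimal sketch of the travel company example as a data structure. The map contents and the <code>trace_to_goals</code> function are my own illustrative assumptions, not part of any tool; the point is only that each capability can be followed upward through value streams and value chain activities to a business goal.</p>

```python
# Hypothetical value map for the travel booking example; all names are illustrative.
VALUE_MAP = {
    # business goal -> value chain activities
    "Increase revenue": ["Increase flight bookings"],
    # value chain activity -> value streams
    "Increase flight bookings": ["Deliver AI flight search"],
    # value stream -> infrastructure delivery capabilities
    "Deliver AI flight search": [
        "Provide test environments",
        "Add vector database to environments",
        "Change network configuration for AI service",
    ],
}


def trace_to_goals(item: str, value_map: dict) -> list:
    """Return every path from an item up to a top-level business goal."""
    parents = [node for node, children in value_map.items() if item in children]
    if not parents:
        return [[item]]  # nothing sits above this item, so it is a goal
    paths = []
    for parent in parents:
        for path in trace_to_goals(parent, value_map):
            paths.append([item] + path)
    return paths


for path in trace_to_goals("Provide test environments", VALUE_MAP):
    print(" -> ".join(path))
# prints: Provide test environments -> Deliver AI flight search -> Increase flight bookings -> Increase revenue
```

<p>A real map would have capabilities that support several value streams, in which case the function returns one path per goal the capability contributes to.</p>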

<h1 id="measuring-software-delivery-effectiveness">Measuring software delivery effectiveness</h1>

<p>A software delivery value stream traces the process for getting a change into production. The process of developing a new feature, like the travel company’s AI flight search, usually involves delivering a stream of changes into production over the course of weeks or even months. Early changes may be turned off in production, available only to testers and early adopters as development progresses. Once the feature is made available to end users, further changes iterate and evolve the feature based on its commercial performance and on user feedback.</p>

<p>Different types of software changes may be delivered using different value streams (or perhaps different branches of a complex stream). For the purposes of understanding infrastructure effectiveness, there are a number of types of streams that involve infrastructure in different ways. Some examples:</p>

<ul>
  <li>A software team develops and delivers a code change to production without requiring any changes to existing infrastructure</li>
  <li>A software team develops and delivers a code change to production that requires a change to existing infrastructure</li>
  <li>A product team creates a new software product or service</li>
  <li>A business team introduces existing services to a new customer, requiring adding at least some dedicated-tenancy production infrastructure</li>
</ul>

<p>The effectiveness of a software delivery value stream can be measured using frameworks like the DORA metrics and the DX platform. The key metrics, per DORA, are time to deliver a change, rate of delivery, production failure rates, and recovery time from production failures.</p>
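<p>As a rough illustration of those four metrics, the sketch below computes them from a list of deployment records. The <code>Deployment</code> record and its field names are assumptions made for this example; in practice the data would come from your pipeline and incident tooling, and DORA’s own definitions should be checked for the details.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one change reaching production."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deployments: list, period_days: int) -> dict:
    """Compute the four key delivery metrics over a reporting period."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "median_lead_time_hours": median(lead_times),
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "median_hours_to_restore": median(recoveries) if recoveries else None,
    }


# Made-up sample data: four deployments in one week, one of which failed.
base = datetime(2025, 1, 6, 9, 0)
sample = [
    Deployment(base, base + timedelta(hours=24)),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=48),
               failed=True, restored_at=base + timedelta(days=1, hours=50)),
    Deployment(base + timedelta(days=3), base + timedelta(days=3, hours=12)),
    Deployment(base + timedelta(days=4), base + timedelta(days=4, hours=36)),
]
print(delivery_metrics(sample, period_days=7))
```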

<p>Once you have an idea of how effective your software delivery value streams are, value stream mapping exercises help to discover ways to improve them. These exercises involve gathering data from various systems and stakeholders to understand the steps involved in delivering a software change. Some key data points to look at within a value stream include:</p>

<p><strong>Cycle times</strong> across different stages and subsets of the overall value stream. Where is the most time spent?</p>

<p><strong>Waiting times</strong> between steps in the value stream. This leads to investigating the causes of waiting time, which can be a large proportion of cycle times and the overall lead time, and which is often related to infrastructure.</p>

<p><strong>Handovers</strong> between teams across stages, another common bottleneck. A common handover is when a separate person or team handles a task such as preparing a test environment.</p>

<p><strong>Failure demand</strong>, where the value stream loops and extra steps are needed to investigate and correct problems.</p>

<p><strong>Capacity</strong>, where there are limits to how many changes can be in progress at a time. These constraints may be because of expertise needed, or system limitations like sharing a test environment across teams.</p>

<p><strong>Effort and expertise</strong> needed to execute a step in the value stream. Limits on these can affect capacity and waiting time for handovers. Constraints on expert effort are made worse by failure demand, which soaks up expert time to investigate and fix.</p>
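<p>A simple way to start working with these data points is to record waiting and active time for each step and see where the lead time goes. The sketch below is illustrative only; the <code>Step</code> structure and the numbers are invented for the example, not taken from a real assessment.</p>

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in a value stream, with hypothetical timings in hours."""
    name: str
    wait_hours: float    # time the change sits idle before the step starts
    active_hours: float  # time spent actually working on the step


def summarize(steps: list) -> dict:
    """Total lead time, the share of it spent waiting, and the longest step."""
    lead_time = sum(s.wait_hours + s.active_hours for s in steps)
    waiting = sum(s.wait_hours for s in steps)
    longest = max(steps, key=lambda s: s.wait_hours + s.active_hours)
    return {
        "lead_time_hours": lead_time,
        "waiting_share": waiting / lead_time,
        "longest_step": longest.name,
    }


steps = [
    Step("Develop change", wait_hours=0, active_hours=16),
    Step("Prepare test environment", wait_hours=40, active_hours=4),
    Step("Run tests", wait_hours=2, active_hours=3),
    Step("Approve and release", wait_hours=24, active_hours=1),
]
print(summarize(steps))
```

<p>With these made-up numbers, most of the lead time is waiting rather than work, and the environment step dominates. That is the kind of signal that points to an infrastructure capability worth examining.</p>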

<p>It’s worth noting that, in addition to software delivery, infrastructure can play a role in other value streams including production support and troubleshooting. Additional value streams can be mapped for activities for ensuring operational quality and managing risk, such as patching and upgrading services. Those infrastructure value streams can in turn impact software delivery value streams, for example by taking environments offline for maintenance.</p>

<p>The results of value stream mapping are useful for identifying where to focus efforts to improve the results of the end-to-end delivery process. It’s too common to automate an obvious part of the process that doesn’t make much difference to the big picture. For example, I’ve seen teams proud of having replaced a four-hour manual provisioning process with automation that takes ten minutes, only to discover that the two-week delivery cycle (including design, review, and approval stages) meant their users didn’t notice.</p>

<p>On the other hand, it’s important to understand that the impact of automating a step in the value stream isn’t only about completing that task more quickly. It could have added benefits like removing handovers and cutting waiting times.</p>
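<p>The arithmetic behind the provisioning example is worth spelling out. The sketch below assumes the two-week cycle is measured in calendar hours, and the two-day handover wait in the second case is a hypothetical figure I’ve added to show the effect of removing waiting time as well as task time.</p>

```python
def lead_time_reduction(lead_time_hours: float, task_before: float,
                        task_after: float, wait_removed: float = 0.0) -> float:
    """Fraction of the end-to-end lead time saved by improving one step."""
    saved = (task_before - task_after) + wait_removed
    return saved / lead_time_hours


LEAD_TIME = 14 * 24  # assumption: a two-week cycle counted in calendar hours

# Provisioning drops from four hours to ten minutes, nothing else changes.
task_only = lead_time_reduction(LEAD_TIME, 4.0, 10 / 60)

# Same automation, but it also removes a hypothetical two-day wait for a handover.
with_wait = lead_time_reduction(LEAD_TIME, 4.0, 10 / 60, wait_removed=48.0)

print(f"{task_only:.1%}")  # prints 1.1%
print(f"{with_wait:.1%}")  # prints 15.4%
```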

<h1 id="analyzing-infrastructure-capabilities">Analyzing infrastructure capabilities</h1>

<p>Platform and infrastructure improvement initiatives can improve value streams by optimizing underpinning infrastructure capabilities. Some of the software delivery value streams listed earlier involve making changes to infrastructure, like adding a new vector database or opening network connections to an external service’s API. But even with a basic value stream for delivering a software code change that doesn’t need any changes to infrastructure, infrastructure capabilities are required to get the change from the developer’s workspace to production.</p>

<p>For example, releasing a software change normally involves deploying a build with the change to one or more testing environments. The capability of “providing an environment for testing” can impact the effectiveness of the software delivery value stream in various ways, including:</p>

<p><strong>Wait time for an environment to be available.</strong> This happens when there are a limited number of environments for a particular type of testing.</p>

<p><strong>Wait time for an environment to be prepared.</strong> Configuration and data may need to be reset to a clean starting state after an environment was used for testing another build, especially if the previous build failed.</p>

<p><strong>A failure when deploying or running tests</strong> due to an issue with the environment’s configuration or data. This leads to time spent resolving the issue and re-running steps in the value stream. Often resolving failures means asking people from other teams for help.</p>

<p>Issues with wait time and failure can impact the effectiveness of the end-to-end value stream, leading to longer lead times and fewer releases. If problems with providing or preparing environments occur when deploying the software change to production, then deployment failure rate and recovery time metrics will suffer.</p>

<p>There are a number of metrics that can be useful for measuring the capability of providing an environment for testing:</p>

<ul>
  <li>How long does it take before the environment is ready to use? What is the turnaround time between deployments? What is the wait time for deployment?</li>
  <li>What are the capacity limits - how many of a given type of environment can be made available at a time? What is the demand?</li>
  <li>How much effort does it take to prepare an environment for deployment, and clean up afterwards? What skills are required? For example, it could need an infrastructure engineer, a DBA, and a QA to prepare an environment and its data.</li>
  <li>What is the frequency of failures in an environment, whether it’s a deployment failure or a test failure caused by an issue with the infrastructure or data?</li>
  <li>What is the utilization of infrastructure used for testing? Environments that are often idle between occasional deployments rack up wasted cloud spend. Even in a data center it’s a waste to have too much unused hardware capacity.</li>
</ul>
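<p>Several of these questions can be answered from a simple log of how environments were used. The following sketch assumes a hypothetical <code>EnvironmentUse</code> record per deployment; the field names and sample values are illustrative.</p>

```python
from dataclasses import dataclass


@dataclass
class EnvironmentUse:
    """Hypothetical record of one use of a test environment, in hours."""
    wait_hours: float      # time the team waited for an environment
    prep_hours: float      # effort spent preparing and cleaning up
    in_use_hours: float    # time a build was deployed and under test
    failed_on_infra: bool  # failure caused by infrastructure or data


def capability_metrics(uses: list, environments: int, period_hours: float) -> dict:
    """Summarize wait, effort, failure rate, and utilization for a period."""
    count = len(uses)
    return {
        "mean_wait_hours": sum(u.wait_hours for u in uses) / count,
        "mean_prep_hours": sum(u.prep_hours for u in uses) / count,
        "infra_failure_rate": sum(u.failed_on_infra for u in uses) / count,
        "utilization": sum(u.in_use_hours for u in uses) / (environments * period_hours),
    }


# Made-up data: two static environments over a 40-hour working week.
uses = [
    EnvironmentUse(wait_hours=8, prep_hours=2, in_use_hours=6, failed_on_infra=False),
    EnvironmentUse(wait_hours=0, prep_hours=2, in_use_hours=4, failed_on_infra=True),
    EnvironmentUse(wait_hours=16, prep_hours=3, in_use_hours=10, failed_on_infra=False),
    EnvironmentUse(wait_hours=4, prep_hours=1, in_use_hours=4, failed_on_infra=False),
]
print(capability_metrics(uses, environments=2, period_hours=40))
```

<p>In this invented data set teams wait an average of seven hours for environments that are in use only 30% of the time. Long waits combined with low utilization suggest the constraint is how environments are provided rather than raw capacity.</p>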

<h1 id="improving-an-infrastructure-capabilitys-effectiveness">Improving an infrastructure capability’s effectiveness</h1>

<p>An infrastructure capability can be implemented and delivered in different ways, each with varying impacts on the end-to-end value stream. Once you’ve pinpointed how a capability is creating bottlenecks, you should consider changing the way the capability is provided.</p>

<h2 id="how-is-the-capability-implemented">How is the capability implemented?</h2>

<p><strong>Static environments with manual preparation</strong> represent the least effective option, although the most common. A centralized infrastructure team manually configures and resets environments when needed. This approach introduces handovers between teams, which create waiting times before the deployment step can begin, and limits capacity to the number of pre-built environments. As we saw in the value stream analysis, these bottlenecks can dramatically impact end-to-end delivery effectiveness.</p>

<p><strong>Static environments with automated preparation</strong> improve on this by removing some manual effort and reducing preparation time. Define the starting state, including data, dependencies, integration points, configuration settings, etc., and have a way to automatically reset the environment to that state. This reduces waiting times for environment preparation and can eliminate some handovers. However, teams may still wait for an environment to be available. The way to prevent that waiting is to keep enough environments provisioned to meet peak demand, which can lead to low utilization (i.e. wasted hardware or cloud spend). Also, automated environment preparation needs to be highly reliable. Failures not only cause delays but also create the need to ask someone else for help, adding to the effort absorbed. Destroying and rebuilding the environment each time tends to be more reliable than cleaning up a running environment, but it is also slower.</p>

<p><strong>Ephemeral environments</strong>, built and destroyed on demand, are the most effective approach. This builds on automated preparation but goes further by creating environments only when needed and destroying them when done. You get better utilization and cost management if you destroy or strip down environments when they aren’t in use. If you have the ability to add capacity on demand, whether by using cloud or a shared pool of data center resources, you can eliminate waiting times for environment availability. This approach can dramatically reduce cycle times and remove capacity constraints from the value stream.</p>
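<p>The build-and-destroy lifecycle can be sketched as a context manager that guarantees cleanup. This is a minimal illustration, assuming hypothetical <code>provision_environment</code> and <code>destroy_environment</code> functions; here they only update an in-memory dictionary, where a real implementation would call your infrastructure tooling.</p>

```python
import uuid
from contextlib import contextmanager

# Stand-in for a real provisioning backend: tracks which environments exist.
ACTIVE_ENVIRONMENTS = {}


def provision_environment(build_id: str) -> str:
    """Hypothetical: create an environment in a clean starting state."""
    env_id = f"test-{uuid.uuid4().hex[:8]}"
    ACTIVE_ENVIRONMENTS[env_id] = {"build": build_id}
    return env_id


def destroy_environment(env_id: str) -> None:
    """Hypothetical: tear the environment down so it stops costing money."""
    ACTIVE_ENVIRONMENTS.pop(env_id, None)


@contextmanager
def ephemeral_environment(build_id: str):
    """Provide an environment for the duration of a block, then destroy it."""
    env_id = provision_environment(build_id)
    try:
        yield env_id
    finally:
        destroy_environment(env_id)  # runs even if the tests raise an error


with ephemeral_environment("build-123") as env_id:
    print(f"running tests in {env_id}")

print(len(ACTIVE_ENVIRONMENTS))  # prints 0: nothing is left running
```

<p>Because each run gets its own environment, there is no queue for a shared one and no state left over from a previous build.</p>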

<h2 id="service-delivery-models">Service delivery models</h2>

<p>Beyond the technical implementation, the effectiveness of an infrastructure capability also depends on who provides it and how. Typical approaches are:</p>

<p><strong>Centralized specialist teams</strong> handle all requests for the capability. This creates handovers and waiting times in the value stream, as software teams must request environments and wait for them to be provided.</p>

<p><strong>Embedded specialists</strong> within software delivery teams (such as DevOps engineers) can reduce handovers and waiting times, though they may become bottlenecks themselves when multiple developers in a team need help. DevOps skills are also in high demand, so it’s often not feasible to hire people in every team for this.</p>

<p><strong>Self-service capabilities</strong> that allow anyone authorized in the software team to provision what they need (for example, using a developer portal) can eliminate handovers entirely and remove waiting times from the value stream.</p>

<p><strong>Fully automated provisioning</strong>, for example, integrated into deployment pipelines, represents the ultimate evolution, where environments are created and destroyed as part of the automated software delivery process without human intervention.</p>

<p>Each progression toward greater automation and self-service reduces the bottlenecks, handovers, and waiting times that impact software delivery value stream effectiveness.</p>

<h1 id="implementing-better-infrastructure-delivery">Implementing better infrastructure delivery</h1>

<p>The value of infrastructure management is in the outcomes of the work done by the people and teams who use the infrastructure. I recommend that engineering leaders who want to make infrastructure delivery more useful to the organization start by tracing the connections from business goals and value chains down to infrastructure management capabilities. This article has outlined an approach to do this to identify ways to use infrastructure automation to improve business outcomes.</p>

<p>I’ve used software delivery to illustrate this approach because it’s one of the easier parts of the value chain to trace the connection from business goals to infrastructure delivery. Software is developed with specific business outcomes in mind, so it should be easy to understand the value in automating the process of providing test environments to make the end-to-end software delivery process faster and more reliable.</p>

<p>With other infrastructure capabilities, like ensuring systems are continuously patched and upgraded, and adding automated backups and failovers, the connections can be less obvious. But it’s entirely possible to make those connections using this approach, and it’s essential for avoiding accumulating invisible but potentially existential risk and inefficiency.</p>

<p>Understanding the value of improving infrastructure delivery is only the first step in making it happen. As I’ve suggested, business outcomes can often be improved by changing the way infrastructure capabilities are implemented. Automating tasks to provision and change infrastructure will only make the end-to-end processes the infrastructure supports better if they’re designed and implemented with an understanding of that process in mind.</p>

<p>I plan to expand on how to do that in later articles in this series. As a preview, there are three key elements of implementing infrastructure delivery to support effective value streams for infrastructure users. One is team structures and workflows. I’ll draw on Team Topologies as well as continuing with value stream mapping to describe some different ways of providing infrastructure services. The second element is architecture, which is essentially about implementing infrastructure as composable components at two different levels, one for deployment (stacks) and one for consumption (compositions). My thinking here is heavily influenced by microservices, ports and adapters, and product thinking. The third element is delivery, which is the nuts and bolts of packaging, delivering, deploying, configuring, and integrating infrastructure components.</p>

<p>Although it sounds like I could describe each of these elements in a separate article, I think they’re too tightly connected to explain separately. My next article will probably be a fairly high-level description of these three elements and how they work together. Stay tuned!</p>

<p><em>Image by <a href="https://unsplash.com/@deleece?utm_medium=referral&amp;utm_source=unsplash">Deleece Cook</a></em></p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[An approach for identifying what value organizations get from digital infrastructure and how to trace that to the specific infrastructure delivery capabilities.]]></summary></entry><entry><title type="html">Will moving beyond Infrastructure as Code improve software delivery effectiveness?</title><link href="https://infrastructure-as-code.com/posts/infra-effectiveness-part1-new-tools.html" rel="alternate" type="text/html" title="Will moving beyond Infrastructure as Code improve software delivery effectiveness?" /><published>2025-05-28T09:20:20+01:00</published><updated>2025-05-28T09:20:20+01:00</updated><id>https://infrastructure-as-code.com/posts/infra-effectiveness-part1-new-tools</id><content type="html" xml:base="https://infrastructure-as-code.com/posts/infra-effectiveness-part1-new-tools.html"><![CDATA[<p>Many engineering leaders I talk to are frustrated with their infrastructure automation. Adopting the cloud and Infrastructure as Code was supposed  to speed up software delivery by providing infrastructure on tap. Instead, development teams seem to be constantly blocked waiting for environments to be built, updated, modified, extended, and fixed. There never seem to be enough environments, and yet cloud costs spiral. Messy and fragile environments are a bottleneck for developing and releasing changes. Infrastructure and platform teams are overstretched while technical debt piles up.</p>

<p>When my colleagues and I carry out software delivery effectiveness assessments, we often find time and effort wasted providing teams with the infrastructure they need. Value stream mapping shows teams losing time waiting for infrastructure work to be implemented, going back and forth to clarify needs and iterate as needs change during development, and troubleshooting and improving systems once they’ve gone live.</p>

<p>This is the first of several posts I’m working on to explore the question of how to improve software delivery effectiveness by empowering software delivery teams to provide infrastructure for themselves.</p>

<p>Infrastructure as Code is the dominant paradigm for automating infrastructure management today. But it creates plenty of pain points, especially for software teams that would like to manage their own infrastructure. A number of potential solutions that have emerged over the past few years, and new ones emerging now, focus on removing these pain points by improving the user experience of working with dynamic infrastructure.</p>

<p>Can software delivery be made more effective by giving software developers a friendlier, more effective interface for defining and provisioning infrastructure for themselves? Three potential approaches are offering more developer-friendly languages for writing infrastructure code, giving them GenAI chatbots or agents to provision code for them, or replacing infrastructure code with dynamic and extensible models or graphs.</p>

<h1 id="developer-friendly-languages">Developer friendly languages</h1>

<p>Traditional Infrastructure as Code tools provide special-purpose declarative DSLs (Domain Specific Languages), like those used by Terraform, Ansible, and CloudFormation, as an interface for defining the infrastructure you want. The tool generates a model of desired state, compares it with the reality, and then executes the changes to make the reality match the intention.</p>

<p>Many software developers I know find building infrastructure with these tools painful. They believe that alternative tools like Pulumi and AWS CDK that let them use general-purpose, imperative languages will remove the barriers for developers to manage infrastructure without relying on a separate team of infrastructure specialists.</p>

<p>General-purpose programming languages are more powerful than declarative languages for some areas of infrastructure automation, especially building component libraries. But they don’t really make low-level infrastructure coding more accessible to software developers. If you don’t have a solid understanding of networking, it isn’t any easier to wire together subnets, routing tables, firewall rules, gateways, and load balancers into safe and useful application connectivity in Python than it is with Terraform’s HCL. And TypeScript doesn’t save you from needing to understand the tradeoffs among the many S3 bucket configuration options for whichever of the dozens of different purposes you might be using a bucket for.</p>

<p>Some infrastructure experts are more comfortable using a general purpose language. And imperative languages are more appropriate than declarative languages for building component libraries. But they don’t reduce the effort or expertise needed to manage infrastructure for software delivery teams, so they don’t resolve the problem of making an organization’s software delivery process more effective.</p>

<h1 id="infrastructure-from-application-code">Infrastructure from (application) Code</h1>

<p>Some tools and platforms take the idea of developers writing infrastructure code a step further by making it possible to embed infrastructure code with application code. Platforms like Winglang and Darklang introduce new languages for coding applications and infrastructure together. Other solutions, like Ampt, Nitric, and Shuttle, introduce SDKs or annotation support for coding or defining infrastructure required in existing software languages like Golang and Rust.</p>

<p>The idea behind this approach is that a developer can declare the infrastructure required at the point where it is used. The code that reads and writes entries in a database specifies how to provision and configure the database. Code that writes messages to a queue creates the queue. An application declares the details for handling inbound network requests along with the business logic for handling those requests.</p>

<p>Infrastructure from Code does more than empower developers; it also ensures smooth alignment between building infrastructure and using it. Deployments often fail because of changes to one side or the other of infrastructure and application. Many of these failures can be avoided if the compiler or IDE immediately catches mismatches, like writes to an S3 bucket that hasn’t been created.</p>

<p>A major limitation of Infrastructure from Code is that it doesn’t address the concerns of infrastructure that isn’t directly used by a single application. Shared compute and networking resources, and platform services like monitoring and identity management, need to be defined and managed separately. Environment management is also either hand-waved or tightly bound into application code. Enabling developers to manage application infrastructure is powerful, but most non-trivial systems have a broader scope.</p>

<h1 id="infrastructure-as-model">Infrastructure as Model</h1>

<p>There is an emerging crop of post-code infrastructure automation systems being developed by various startups like System Initiative and ConfigHub. These aim to remove the fiddliness of managing code in repositories. They also close the gap between live infrastructure and the representation that engineers use to define the changes they would like to make.</p>

<p>Infrastructure as Code uses code, whether declarative or imperative, to generate a model of the desired state, compares it with a model of the current state, and then makes changes to the existing infrastructure to converge it with the desired state. There are four states (code, desired state, model of current state, and actual state) that can get out of sync in various ways that are at best confusing, and at worst lead to a broken state that is painful to correct.</p>
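<p>Here is a minimal sketch of the compare-and-converge step, and of how a stale model of current state produces the wrong plan. The <code>plan_changes</code> function and the resource dictionaries are simplified assumptions for illustration; real tools track far more detail than this.</p>

```python
def plan_changes(desired: dict, current: dict) -> dict:
    """Compare desired state with a view of current state and list the changes."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "delete": sorted(k for k in current if k not in desired),
        "update": sorted(k for k in desired if k in current and desired[k] != current[k]),
    }


# Desired state, as generated from the code (illustrative resources).
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "bucket": {"versioning": True},
    "queue": {"fifo": False},
}

# The tool's recorded model of current state.
recorded = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "bucket": {"versioning": True},
    "old-vm": {"size": "small"},
}

# Actual state: someone changed the bucket by hand, so the record is stale.
actual = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "bucket": {"versioning": False},
    "old-vm": {"size": "small"},
}

print(plan_changes(desired, recorded))  # misses the bucket drift
print(plan_changes(desired, actual))    # includes the bucket update
```

<p>The two plans differ only because the recorded state no longer matches reality, which is one of the ways the four states can drift apart.</p>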

<p>Infrastructure as Model tools center on the model of current and desired state. They provide interfaces for defining the desired state. Demos focus on drag-and-drop GUI interfaces that demonstrate how much easier it is to compare desired and current state than is possible with traditional infrastructure code. But the real power with these tools is the programmable extensibility of the graph of desired and live state, and the events involved in building and converging them.</p>

<p>I confess that I don’t know exactly how Infrastructure as Model will end up being used in practice, but the possibilities are exciting. The implementations I’ve seen so far, as with using general purpose languages for Infrastructure as Code, tend to focus on improving the experience for infrastructure experts. I believe this is because the tools are still in early stages of development and are building the foundational functionality of assembling low-level cloud infrastructure into useful solutions. As Infrastructure as Model implementations evolve, they will hopefully address the higher-level concerns needed to make software delivery processes more effective.</p>

<h1 id="genai-as-infrastructure-assistant">GenAI as infrastructure assistant</h1>

<p>Most of the approaches I’ve discussed so far give users a more convenient interface for working with low-level infrastructure resources, but they still require expertise and effort to select, configure, and deploy those resources in the most useful way. LLMs have the potential to support developers by providing this expertise.</p>

<p>Developers can use coding assistants like GitHub Copilot, Amazon CodeWhisperer, Cursor, or Pulumi AI to accelerate their use of old-school Infrastructure as Code. Infrastructure as Model platforms are already incorporating LLM support, and various tools like Firefly Copilot can use GenAI as a natural-language interface for building and managing infrastructure.</p>

<p>Imagine a developer describing their application’s networking requirements so an AI agent can provision and configure the necessary infrastructure resources. They can tell GenAI what they need to use an S3 bucket for, and it selects configuration options that seem appropriate.</p>

<p>If done poorly, tools like this will be AI-assisted ClickOps, leaving organizations with an unmaintainable mess of snowflake infrastructure. There are several challenges to getting reliable, useful results from AI-assisted infrastructure management.</p>

<p>First, the AI needs to understand what the user needs, which means the user needs to be able to explain what they need accurately, clearly, and comprehensively. GenAI will guess to fill in any gaps or ambiguity. As anyone who has spent much time working with GenAI knows, it’s important to understand the domain very well, and to know how to craft and iterate on prompts, to avoid making a mess.</p>

<p>A GenAI-based infrastructure management solution that provides software teams with well-designed prompts and guardrails could help them to manage their own infrastructure. The work to provide those tools then looks very similar to providing infrastructure automation with a code-based toolchain, including being done by people with deep expertise in infrastructure. Call it “infrastructure as prompts”.</p>

<p>There are interesting possibilities for using LLMs for managing infrastructure. However, using them to improve the effectiveness of software delivery, as with the other approaches I’ve discussed in this article, requires going beyond optimizing the low-level work of defining infrastructure, whether using code or another interface.</p>

<h1 id="valuable-progress-but-not-addressing-the-bigger-issues">Valuable progress, but not addressing the bigger issues</h1>

<p>The solutions here, including developer-friendly infrastructure coding languages, interfaces that work directly with infrastructure state models, or adding LLM-based AI assistance, all target the low-level tasks of defining and deploying infrastructure resources. They can make those tasks easier for infrastructure experts or for developers.</p>

<p>But none of them are convincing as a solution that will lead to engineering leaders declaring that managing environments and platforms on the cloud is no longer a bottleneck for software delivery.</p>

<p>Yes, Infrastructure as Code is an awkward mechanism for assembling low-level resources into meaningfully useful services and environments for delivering and running software. But replacing code with a more convenient interface for infrastructure experts to use doesn’t address the largest friction points. The biggest gap is between low-level infrastructure details and the solutions needed by software teams.</p>

<p>I’m working on followup articles to explore that gap and how to address it. The <a href="/posts/infra-effectiveness-part2-value.html">next article</a> describes an approach for identifying what value organizations get from digital infrastructure and how to trace that to the specific infrastructure delivery capabilities.</p>

<p><em>Image by <a href="https://unsplash.com/@brett_jordan?utm_medium=referral&amp;utm_source=unsplash">Brett Jordan</a></em></p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[Can software delivery be made more effective by moving beyond current IaC tools to give software developers a friendlier, more effective interface for defining and provisioning infrastructure for themselves?]]></summary></entry><entry><title type="html">Some Things I Learned From The Infrastructure Effectiveness Survey</title><link href="https://infrastructure-as-code.com/posts/survey-learning.html" rel="alternate" type="text/html" title="Some Things I Learned From The Infrastructure Effectiveness Survey" /><published>2025-03-13T06:20:20+00:00</published><updated>2025-03-13T06:20:20+00:00</updated><id>https://infrastructure-as-code.com/posts/survey-learning</id><content type="html" xml:base="https://infrastructure-as-code.com/posts/survey-learning.html"><![CDATA[<p>In January I set up an “Infrastructure Effectiveness Survey” and asked people to fill it in, promising that I’d share what I learned from it later on. Later on has arrived!</p>

<p>This survey was a starting point for me. I want to explore ways to measure the effectiveness of the practices and patterns I describe in the Infrastructure as Code book. I’ve added some material on understanding the business value of infrastructure management capabilities to the third edition. These feel like a reasonable foundation to build on, to be able to connect approaches to managing infrastructure with effective business outcomes.</p>

<p>This survey was never going to do that, since I don’t have the skills and time to invest in a comprehensive, rigorous survey. This is no <a href="https://dora.dev/">DORA</a> report. It’s more of a finger in the air, almost a way of collaboratively brainstorming with a few willing participants.</p>

<p>About thirty people completed the survey, which wouldn’t be much if I was hoping to get conclusive answers about what practices work best, but was plenty to get me thinking.</p>

<p>The first insight I came away with is about framing “infrastructure effectiveness” around capabilities infrastructure provides for the business. There are four high-level capabilities that I tried to explore in the survey:</p>

<ul>
  <li>Adding production infrastructure to onboard customers</li>
  <li>Adding production hosting in new regions</li>
  <li>Delivering infrastructure changes to existing infrastructure</li>
  <li>Supporting the delivery of software changes</li>
</ul>

<p>I asked several specific questions for some of these capabilities. One asked about the current effectiveness of the respondent’s organization at that capability. How long does it take to do the thing? Another asked about their need for that capability. How often do you need to do it? A third asked how much impact it would make to improve the effectiveness.</p>

<p>I didn’t ask those questions for all of the capabilities, so one of my takeaways is that I need to think a bit harder about how to express those questions, especially in the software change delivery area.</p>

<p>In this post I’ll focus on the first capability. I’ll write more later on other insights and inspirations I’ve taken away from the survey.</p>

<h2 id="adding-infrastructure-to-onboard-customers-is-fairly-common">Adding infrastructure to onboard customers is fairly common</h2>

<p>This capability mainly applies to services that are at least partly single-tenancy. I would expect that most consumer-facing services will be designed for multi-tenancy, because otherwise the time and expense of adding new infrastructure will be a massive drag on growth. I know a few software companies that moved to SaaS models have struggled with this.</p>

<p>But 86% of my respondents who answered this question do need to add new infrastructure to onboard new customers. 28 respondents is not many, so this may simply reflect the kinds of people who follow me on my mailing list or social media, and who were interested enough to complete my survey for whatever reason. Many (maybe most) of my clients are in this situation.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">How Often</th>
      <th style="text-align: right">How Many</th>
      <th style="text-align: right">Percentage</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Multiple times a week</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">18%</td>
    </tr>
    <tr>
      <td style="text-align: left">Multiple times a month</td>
      <td style="text-align: right">6</td>
      <td style="text-align: right">21%</td>
    </tr>
    <tr>
      <td style="text-align: left">Multiple times a year</td>
      <td style="text-align: right">10</td>
      <td style="text-align: right">36%</td>
    </tr>
    <tr>
      <td style="text-align: left">Less than once a year</td>
      <td style="text-align: right">3</td>
      <td style="text-align: right">11%</td>
    </tr>
    <tr>
      <td style="text-align: left">None</td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">14%</td>
    </tr>
  </tbody>
</table>

<p>There’s a roughly even split between those who need to do this at least once a month and those for whom it’s less frequent. I’d guess there’s a correlation to deal size: some companies have a few large clients, while others have a larger number of smaller clients.</p>
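
<p>For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the percentages from the counts in the table above. The grouping into “at least monthly” and “less often” is my reading of the split described here; the variable names are just for illustration.</p>

```python
# Response counts copied from the table above.
counts = {
    "Multiple times a week": 5,
    "Multiple times a month": 6,
    "Multiple times a year": 10,
    "Less than once a year": 3,
    "None": 4,
}

total = sum(counts.values())  # 28 respondents answered this question


def percent(n: int) -> int:
    """Share of all respondents, rounded to the nearest whole percent."""
    return round(100 * n / total)


for label, n in counts.items():
    print(f"{label}: {n} ({percent(n)}%)")

# Group the rows to compare frequent and infrequent deployers.
monthly_or_more = counts["Multiple times a week"] + counts["Multiple times a month"]
less_often = counts["Multiple times a year"] + counts["Less than once a year"]
need_it = total - counts["None"]

print(f"At least monthly: {monthly_or_more} ({percent(monthly_or_more)}%)")
print(f"Less often: {less_often} ({percent(less_often)}%)")
print(f"Need to add infrastructure at all: {need_it} ({percent(need_it)}%)")
```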

<h2 id="differences-between-frequent-and-infrequent-deployers-of-new-production-infrastructure">Differences between frequent and infrequent deployers of new production infrastructure</h2>

<p>I would expect the group that needs to deploy new infrastructure more often to be faster at doing it. But the results on that question had a weak positive correlation (0.27), which suggests that while it’s generally true, there’s a gap. This is backed up by their answers to the question about how much of a difference it would make to the organization to reduce the time and effort for deploying new infrastructure. Most of those that need to do it more often feel a need to improve. That might seem obvious, but if they were good enough at it already, they probably wouldn’t feel as much need to work on it.</p>
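
<p>To make the correlation figure a bit more concrete, here is a minimal Python sketch of a Pearson correlation over ordinal-coded answers. This assumes a Pearson coefficient and a simple numeric coding of the answers, which may not match how the survey figures were actually calculated, and the data is made up for illustration rather than taken from the survey responses.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up answers for eight respondents, NOT the survey data.
# frequency: 0 = less than once a year ... 3 = multiple times a week
# speed:     0 = slowest to deploy     ... 3 = fastest to deploy
frequency = [3, 3, 2, 2, 1, 1, 0, 0]
speed = [3, 1, 2, 0, 2, 1, 1, 0]

print(f"correlation: {pearson(frequency, speed):.2f}")
```

<p>A value near 1 would mean frequent deployers are almost always faster; a low positive value means the tendency is there, but with plenty of exceptions.</p>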

<p>There’s a stronger correlation between how often organizations deploy customer infrastructure and their effectiveness at the other capabilities, like adding new regions, updating existing infrastructure, and delivering software changes. That makes sense - they have more incentive to get their infrastructure automation working well.</p>

<p>Another finding is that teams that deploy customer infrastructure more often tend to be stricter about not forking or copying infrastructure code for each customer. The results (again, for my small sample) show a 0.8 correlation between deploying at least once a month and using configurable infrastructure code. There is a 0.38 correlation between deploying less often and maintaining separate copies of infrastructure code for each environment.</p>
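
<p>To illustrate the difference, here is a minimal Python sketch of “configurable infrastructure code”: one shared definition, instantiated per customer from a small set of parameters. The <code>CustomerConfig</code> fields, customer names, and resource layout are all hypothetical, and a real codebase would use an actual infrastructure tool rather than plain dictionaries.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerConfig:
    """Per-customer parameters; these fields are illustrative assumptions."""
    customer: str
    region: str
    instance_count: int = 2


def customer_environment(config: CustomerConfig) -> dict:
    """Build the resource definitions for one customer from a single shared template."""
    prefix = f"{config.customer}-{config.region}"
    return {
        "network": {"name": f"{prefix}-vpc", "region": config.region},
        "servers": [
            {"name": f"{prefix}-app-{i}", "region": config.region}
            for i in range(1, config.instance_count + 1)
        ],
    }


# Onboarding a customer means adding a line of configuration, not a copy of the code.
customers = [
    CustomerConfig("acme", "eu-west-1"),
    CustomerConfig("globex", "us-east-1", instance_count=4),
]

for config in customers:
    env = customer_environment(config)
    print(env["network"]["name"], len(env["servers"]), "servers")
```

<p>With forked or copied code, a fix to the shared definition has to be repeated in every copy; here it is made once and applies to every customer.</p>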

<p>One final observation is that there doesn’t seem to be any noticeable correlation between tech and tool choices and deployment frequency.</p>

<h2 id="next">Next</h2>

<p>I’ll have a poke into what I learned about software delivery support effectiveness. As I mentioned, this is an area that I need to think harder about for framing and assessing in the future.</p>]]></content><author><name>Kief Morris</name></author><summary type="html"><![CDATA[In January I set up an “Infrastructure Effectiveness Survey” and asked people to fill it in, promising that I’d share what I learned from it later on. Later on has arrived!]]></summary></entry><entry><title type="html">The Quality x Speed Quadrants</title><link href="https://infrastructure-as-code.com/posts/the-quality-speed-quadrants.html" rel="alternate" type="text/html" title="The Quality x Speed Quadrants" /><published>2025-02-14T08:20:20+00:00</published><updated>2025-02-14T08:20:20+00:00</updated><id>https://infrastructure-as-code.com/posts/the-quality-speed-quadrants</id><content type="html" xml:base="https://infrastructure-as-code.com/posts/the-quality-speed-quadrants.html"><![CDATA[<p>The idea that you can go faster by sacrificing quality, or improve quality by moving more slowly, is a false dichotomy.</p>

<p><img src="/images/quality-speed-false-tradeoff.png" alt="Quality and speed at opposite ends of a two-dimensional line" /></p>

<p>The reality is that speed and quality are mutually reinforcing. The ability to make changes quickly is essential to creating and maintaining a high level of quality. And a high level of quality is the only way to sustain the ability to make changes quickly.</p>

<p>Rather than a two-dimensional line, the best way to show the relationship between quality and speed in a system is with a matrix of quadrants.</p>

<p><img src="/images/quality-speed-quadrant.png" alt="A matrix showing prioritizing speed on one axis and prioritizing quality on another. The upper-right quadrant shows prioritizing both speed and quality." /></p>

<p>The upper-left quadrant is where teams try to prioritize quality by sacrificing speed. They use heavyweight change management processes, believing that involving more people in making a change, with more rigorous activities to inspect, justify, and sign off on each change, will “raise the bar”. What happens in practice is that having so much friction for making a change discourages people from making smaller fixes and improvements. The list of “known issues” grows, technical debt builds up, and people become accustomed to the system’s bugginess and awkward usability.</p>

<p>So sacrificing speed to achieve quality achieves neither.</p>

<p>In the lower-right corner, teams try to “move fast” (and maybe “break things”). The most important thing is getting features into users’ hands; testing is seen as a luxury. Perfection is the enemy of building a business before the money runs out. The problem with this approach is that shoddy code is hard to change. Far sooner than they had assumed, the team reaches a point where even a simple change is hard to make because the codebase is a tangled mess.</p>

<p>So sacrificing quality to achieve speed achieves neither. This conclusion is backed by research (see <a href="https://itrevolution.com/product/accelerate/">Accelerate</a> by Dr. Forsgren et al., plus multiple <a href="https://dora.dev/">DORA reports</a>).</p>

<p>In practice, sacrificing either quality or speed leads to a fragile, messy codebase that is difficult to change, which is the lower-left quadrant. Nobody aims for the lower-left quadrant, but the vast majority of teams end up there.</p>

<p>The upper-right quadrant is where teams prioritize both speed and quality. This was the idea behind agile software development, at least before agile became gentrified by those who believe in process over people.</p>

<p>The upper-right quadrant is not about finding a balance between speed and quality. It’s about using speed to achieve quality, and quality to achieve speed.</p>

<p><img src="/images/quality-speed-reinforcement.png" alt="One arrow leading from quality to speed, another leading from speed to quality" /></p>

<p>It’s about “<a href="https://tidyfirst.substack.com/p/bugs-optional">Zero Defects</a>” [Kent Beck] (which is not about never having any bugs, but making sure you fix them immediately when you find them). It’s about building the <a href="https://dannorth.net/best-simple-system-for-now/">Best Simple System For Now</a> [Daniel Terhorst-North].</p>

<p>A key to breaking the “tradeoff” mindset is in how you define quality. Quality is not features. It’s not extensibility. Quality is doing what needs to be done now, and doing it correctly. Quality code is easy to change. A good process drives quality by minimizing the friction of making a change safely.</p>

<p>Simple design and simple implementation make quality easy, because it means the code is easy to understand, change, test, and fix. Making small, frequent, incremental changes in a continuous flow reduces the friction that comes with heavier process (branching and merging). Pairing keeps people focused on clean, simple, and correct implementation. Really, we could do worse than to refresh ourselves on old school <a href="http://www.extremeprogramming.org/rules.html">Extreme Programming</a>. Daniel Terhorst-North’s <a href="https://dannorth.net/cupid-for-joyful-coding/">CUPID</a> approach for “joyful coding” is all about making sure your code is easy to work with.</p>

<p>Yeah, this is a site about Infrastructure as Code. But as with so much, the tenets of good software apply across the full system stack.</p>]]></content><author><name>Kief Morris</name></author><category term="concepts" /><summary type="html"><![CDATA[The idea that you can trade speed and quality off of one another is false. The quality x speed quadrant illustrates that the goal is to use quality and speed to reinforce one another.]]></summary></entry></feed>