A group of my colleagues and I are reading The Phoenix Project, a novel about a struggling IT organization attempting to bring together development, operations, and security. Reading it has been a fun way of remembering some DevOps techniques. Since I’ve been enjoying the book so much, I decided to pick up a copy of The DevOps Handbook. Written by some of the leaders in the DevOps and Continuous Delivery movements, it is an excellent guide to implementing DevOps techniques and improving how work gets done in a technology organization. I’ve already seen ways to apply these lessons to my team at work.

The authors start off the book with an introduction to their “Three Ways” before describing how to start integrating development and operations. The bulk of the book covers each of the Ways in more depth. The Three Ways are the core principles behind reliably, quickly, and securely developing products that customers want.

The First Way: Flow

The First Way involves improving flow through an organization. As a technology organization, we want to increase throughput in a system so that new ideas and features are brought to market swiftly, rather than lingering for months in a backlog of user stories.

To improve flow, we first need to understand the work flowing through our system. The time an item spends waiting in a queue is a function of the extent to which a required resource is occupied.

Wait time grows as a function of resource busyness

This is a crucial insight. If a person or a team is continually occupied near 100 percent of capacity, new work piles up and moves through the system very slowly. When a resource has some slack, however, wait times fall dramatically. To improve flow, the authors suggest limiting the amount of work in progress you have, keeping batch sizes small, and reducing handoffs between teams.
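To get a feel for how sharply wait times climb, here is a minimal sketch in Python. It assumes the rule of thumb popularized in The Phoenix Project, where wait time is proportional to the percentage of time a resource is busy divided by the percentage of time it is idle. The function name `relative_wait_time` is my own, and the ratio is a simplification for building intuition, not a full queueing model.

```python
def relative_wait_time(utilization: float) -> float:
    """Return the busy/idle ratio for a resource.

    `utilization` is the fraction of time the resource is busy (0.0 to <1.0).
    Assumption: wait time is proportional to busy time / idle time.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0.0, 1.0)")
    return utilization / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical utilization levels, chosen only for illustration.
    for percent_busy in (50, 80, 90, 95, 99):
        ratio = relative_wait_time(percent_busy / 100)
        print(f"{percent_busy}% busy -> relative wait of about {ratio:.0f}")
```

Under this assumption, a resource that is 50 percent busy has a ratio of 1, while one that is 90 percent busy has a ratio of 9 and one that is 99 percent busy has a ratio of 99. The last few percentage points of utilization are the expensive ones.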

Flow does not improve, however, simply because you make arbitrary parts of your organization more efficient. You must identify the key constraint, the limiting variable, where work piles up. This is the point where you must improve efficiency to increase flow. Improving a step upstream of the constraint only makes work pile up at the constraint faster, and improving a step downstream of it leaves that step waiting for work.
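The constraint argument can be sketched in a few lines of Python. This is a toy model of my own, not something from the book: it treats the organization as a sequence of stages, each with a capacity in work items per week, and assumes overall throughput equals the capacity of the slowest stage. The stage names and numbers are hypothetical.

```python
from typing import Dict


def find_constraint(capacity: Dict[str, float]) -> str:
    """Return the name of the stage with the lowest capacity."""
    return min(capacity, key=capacity.get)


def throughput(capacity: Dict[str, float]) -> float:
    """Assumption: the pipeline can go no faster than its slowest stage."""
    return min(capacity.values())


# Hypothetical capacities in work items per week.
stages = {"development": 10, "testing": 4, "deployment": 8}

# Speeding up a stage that is not the constraint changes nothing.
faster_dev = dict(stages, development=20)

# Speeding up the constraint itself raises overall throughput.
faster_testing = dict(stages, testing=8)

if __name__ == "__main__":
    print("constraint:", find_constraint(stages))
    print("baseline throughput:", throughput(stages))
    print("after doubling development:", throughput(faster_dev))
    print("after doubling testing:", throughput(faster_testing))
```

In this toy model, doubling development capacity leaves throughput stuck at the testing stage's 4 items per week, while doubling testing capacity lifts it to 8.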

Some of the other guidelines in this section include:

  • Enabling developers to create deployment pipelines for a variety of environments from dev all the way through production
  • Implementing thorough test suites
  • Practicing Continuous Integration
  • Automating deployments and making releases as low-risk as possible

The Second Way: Feedback

Improved flow is good, but a technology organization’s work requires feedback. To get feedback, the authors suggest creating telemetry, or metrics, that allow teams to pinpoint and solve problems. Organizations should collect a variety of data types—from low-level environment data to high-level business logic data. Different types of data will be useful for different people and will point to different problems.

As with testing, security, and broader operations work, telemetry must be part of teams’ regular, daily work. When telemetry is widely implemented, it can not only help resolve issues, but also serve teams looking to A/B test new features.

The Third Way: Learning and Experimentation

With good instrumentation in place, organizations can use the feedback they get to create a learning culture that, rather than blaming people when things inevitably go wrong, encourages enlightened risk-taking. The authors describe how to conduct a “blameless post-mortem” when incidents occur and how to regularly practice resolving production failures. They describe Netflix’s famous Simian Army and point out that, as organizations improve, they can fine-tune their failure detectors to look for weaker failure signals, making their applications even more resilient.

The final chapters in this section provide some actionable tactics for sharing knowledge within an organization and performing regular maintenance to avoid the accretion of technical debt.

The book concludes with descriptions of security and change control practices that echo the lessons from earlier in the book. Namely, security should be part of everyone’s daily work, not the job of some isolated security team that swoops in after the fact. Organizations can make change control simpler by making smaller, lower-risk changes that don’t need to go through a cumbersome change control process. Bringing the deployment and change control process closer to the developers who created the changes gives them more skin in the game and reduces the risks that arise when an organization rigidly separates duties.

The DevOps Handbook is an excellent book, and I am confident I will use it many times in my work. Even if you’re not a developer, there are many lessons here for how to lead an organization to do better, more reliable work that responds to customer needs.