Does your Data Center Operations Live on the 'Edge of Chaos'?

Written by Sarah El-Taher | Feb 1, 2023 10:31:11 PM

While at the AFCOM NC holiday party last year, I met Jared Foster, a regional Data Center Manager for Tierpoint with over a decade of DC experience. After introductions, we quickly found ourselves in a lively theoretical debate about data center operations and a concept called the ‘Edge of Chaos’.

The edge of chaos stems from chaos theory, but to avoid a deep dive into a scientific nerd hole, I’ll use Wikipedia’s brief description: a transition space between order and disorder that is hypothesized to exist within a wide variety of systems. For our example, the ‘system’ is the high-level, overarching one of data center operations.

How did we get to this topic? Simple: digital checklists. Allow me to bridge the gap, as I think this is crucial to the hiring challenges in the industry today. Checklists represent a high degree of order, precision, and organizational process. However, not everything that happens in a data center can be reflected in a predictable process. Some operational tasks and emergencies require immediate troubleshooting and quick thinking. That gray area in between is what we’re calling the ‘edge of chaos’.

Checklists vs. Experience 

To control potential chaos, the military, aviation, and even the medical field rely on endless checklists, backed by plenty of data showing that checklists reduce human error, which saves lives. Checklists standardize, simplify, and control the processes conducted within your operations. They enable less experienced people to execute a process that more experienced people would know by heart. Experienced teams may not require checklists to perform their work accurately, but checklists keep all teams consistent and within requirements.

While Jared appreciates the rigidity and precision of checklists and the amount of human error they can eliminate, his argument centered on the potential risks that a high degree of order can introduce when checklists are routinely used for every process.

But what’s wrong with order and precision in the case of data center operations? 

Edge of Chaos

Order and precision aren’t the problem, at least from Jared’s perspective. The problem is what may be lost when a team depends on checklists: the ability to troubleshoot in the face of a challenge or emergency, and the ability to think on their feet. Ideally, he wants a team that can handle any unexpected challenge in real time because they have years of experience under their belts. The best way to mitigate risk in a critical environment is to control what you can, but have the experience and know-how to manage what you cannot. In other words, experienced data center operators are better equipped for the ‘edge of chaos’ than people who don’t know what to do outside of mandated step-by-step checklists.

The Trouble with “Ideally Experienced”

Who could argue with this? He’s right: any of us would hire the more experienced candidate who doesn’t require step-by-step instruction for the majority of their daily duties. The problem is that at least 50% of data center owner/operators are struggling to fill positions right now.1 In addition, every owner/operator I’ve spoken with in the last six months is adding to their portfolio and in dire need of scaling their operations team. There simply aren’t enough “ideally experienced” people to go around.

Conquering the Labor Shortage 

Widening the hiring pool is our only option, and the industry will be forced to train and mold the teams it needs from who’s available rather than hiring the optimally experienced.

Jared recognizes his ideal scenario may not be attainable in the current market, or scalable for that matter, given the intense growth of the industry today. Many ops managers are coming to grips with this growing pain and looking for ways to bridge their gaps and scale their teams with as little human error along the way as possible.

There’s no doubt checklists reduce error in the field while giving management control over the processes. However, training and onboarding are now more essential to conquering the labor shortage than they were in the recent past. A strong combination of training and checklists will be necessary to reduce the chaos, especially if we need to bring a less experienced labor pool as close as possible to managing the ‘edge of chaos’, as Jared and others would prefer.

Start Now

To bring more talent and experience to the industry and, more importantly, directly to your team, there are two general ways to get there:

  1. Outside support, such as educational paths and resources like the Data Center Operations program at NOVA Community College, books such as Carrie Goetz’s Jumpstart Your Career in Data Centers, and quickly developing efforts such as the Nomad Futurist Academy.

  2. More immediate internal efforts to create or improve your own training and onboarding programs so that less experienced hires get up to speed as fast as possible. Building and improving your own training programs and integrating them into your overall operations management system is the path many organizations are taking to solve their staffing challenge now.

It will take real strategic effort to build effective training programs and to convert your MOPs, SOPs, rounds, etc. into digital checklists, but it’s a small price to pay upfront to fortify your operations against the pitfalls of a shrinking skilled labor pool and the growing pains of this resilient industry. Integrating your operational workflows using tools that enable and empower your workforce will begin to bridge the gap between the know-how of your experienced engineers and the green new hires who need to learn how to ride the edge of chaos.

1. Ascierto, R. (2021, January 1). The People Challenge: Global data center staffing forecast 2021-2025. Uptime Institute. https://uptimeinstitute.com/global-data-center-staffing-forecast-2021-2025