Trouble with Troubleshooting

I spent decades troubleshooting technology. This included time as a software engineer, an integration engineering leader, a consulting middleware and cloud architect, and a critical situation leader.

At times in my career, troubleshooting was an ongoing, continuous activity. In part, what I found is that the 5 Why’s of root cause analysis is seldom appropriately scoped. What I mean is that the analysis of major technology incidents and problems seldom a) started from the proper presupposition or b) included all causal and contributing factors.

The proper presupposition is that all technologies are inherently flawed. I remember working with a particularly challenging client Director. In a one-on-one conversation I explained that software is inherently flawed and that the task was a continuous discovery of the next flaw, and the next, and so on. This led to a pivotal shift in the relationship. One of the things I was tasked to do was write synopsis communications, and this Director remarked to me, “I don’t have to change anything on your reports, I can pass them on as they are to my manager and my manager’s manager.”

That pivotal shift in the relationship led to improved communication and collaboration between the client team and the services team. That eventually led to additional special projects, which in turn led to cost-saving improvement programs.

The main paradigm shift was that every flaw became an opportunity. Every opportunity led to more opportunities, and that eventually led to innovation. Improvement is good; innovation is the essential lifeblood of business in today’s highly competitive landscape.

All technology is flawed, and all technology implementations are flawed. Technical flaws fall into these main categories:

  1. Product defect
  2. Architectural defect
  3. Design defect
  4. Development defect

The wrong supposition is that categories 1 through 4 are isolated and have no impact on one another. In fact, flaws are introduced around pre-existing flaws. This means, for example, that an architectural flaw is present because of inherent product flaws.

All hardware and software are flawed. Therefore, all architectural arrangements of flawed hardware and software are flawed. So, the task is continuous improvement.

Since the variables of business are continuously changing, they are continuously revealing new flaws.

A few technical people know, understand and accept these truths. More managers do; good leaders come to know this, and great leaders come to embrace it.

What many may not understand is that troubleshooting itself is flawed.

The first and most important things for companies to address are the flaws in troubleshooting.

Organizational flaws exacerbate and prolong technical flaws. Organizational flaws fall into these main categories:

  1. Organizational behavior: culture, beliefs and behaviors
  2. Organizational structure

Organizational flaws exacerbate these challenges:

  1. Talent acquisition and retention
  2. Technical flaws

Because there are inherent organizational flaws, poor implementation and architectural choices are made.

One of the widespread problems in the business sector is the flawed implementation of advanced technologies. Plainly, a fighter jet with advanced instrumentation cannot be properly operated by someone with only a basic driver’s license. The more advanced and powerful a technology, the more advanced and complex the skills required to properly architect, design and implement it.

If this were not enough, there are flaws in the technology provider selection process. Technology selection choices are made socially, politically, strategically, financially and functionally. Choosing a technology functionally comes down to choosing one flawed solution over another. Therefore, the selection should be made not on the basis of present functionality but on compatibility with, and commitment to, a future with fewer flaws and greater functionality. This is a matter of true and proper enterprise architecture. This is where people, process, policy and procedures must be designed as a model and driven by comprehensive troubleshooting of flaws. For this purpose, ecological models are the most appropriate.

A bit more on architectural flaws. Architectural decisions are inherently flawed when they base everything on past performance and not future potential.

Back to the 5 Why’s of root cause analysis. Most often, symptoms were diagnosed, treated and called root causes. Sometimes this was intentional and sometimes not; some people were aware of it and others were not. In a system that is rewarded for adding resources, it is not incumbent upon that system’s stakeholders to identify true root causes. This is the fundamental flaw of many managed-resource, outsourcing and resource-driven contracts.

With cloud, automation, artificial intelligence, machine learning and robotics, major enterprises are shifting from service-driven economies of scale to technology-driven product and platform economies of scale. This is the economics of product and platform. And it is precisely why the Product Owner role is in such high demand.

The shift away from technology services does not mean the replacement of workforces so much as a shift toward product design, development, sales and solutions architecture.

Another primary example of the need for human agency is Application Performance Monitoring (APM) technology. APM conducts technology defect diagnostics. What it does not do is diagnose the deeper layer of flaws that led to the defect. Nor does it conduct composite, comprehensive analysis of architectural flaws. These are still human-driven activities.

Many technology service skills are not only transferable to other applications, scopes and industries, but uniquely valuable when combined with other skills and domains of knowledge. This is especially true since technology is embedded within, and is the driving force of, every industry.