Ray Parker – Epsilogix

A Message of Thanksgiving…

Every now and then, a quiet moment will come upon us, and a thought of some treasure in our lives will surprisingly overtake us. For that moment, we cherish our treasure, remember to be grateful, and resolve never to take it for granted. Reality intrudes, and the thought of our treasure vanishes as quickly as it came.

Thanksgiving means many things, but generally it asks us to stop and reflect for more than a moment about treasures like those. With origins in giving thanks for the harvest that was needed to sustain us in the months to follow, the holiday has stretched to embrace the family and friends that bring so much joy into our lives.

As this holiday approaches, let us take the moments we need to be thankful for all of our treasures, and share with them just how much they mean to us. Let us also offer thankfulness for hidden treasures: the soldier serving their country, who won’t be seeing family next week (or possibly ever again); the parent who diligently shows up day after day to a grindstone job, to make a space in this world for their child; the first-responder who runs towards the danger everyone else is running from; the kindness that came our way unexpectedly, just when we really needed it; and thankfulness for the moment we realized we had the chance to be that kindness for someone else, and did it.

Treasures come in all shapes and sizes.

Happy Thanksgiving to all.

…Ray Parker

A Dashboard Worthy Of The Name…

Finally, proof that there are people out there who understand what a dashboard is supposed to deliver.

Some time ago, APMdigest published Five Dashboard Must-Haves for BSM, by Steve Tack of Compuware. His points are guiding concepts that anyone delivering dashboards (or any reasonable facsimile thereof) should follow.

Pay particular attention to his comments about “role-relevant views”. So many of the top-level screens I see assume the viewer is a technician, which is almost certainly not the case. For example, an international webcast from a major vendor recently showed screens that were 95% technical in nature, and as a result were unintelligible to anyone else.

This misses the point entirely. If we are not going to deliver information a business manager can understand, why do we call it “Service Management” in the first place?

The technical information is useful, as it answers the “Why?” question, as in “Why is my widget factory out of commission?”

However, Service Management is not about the “Why?”, it is about the “What?” The goal is to take the information we have about monitored technology elements, and translate that into the current impact on the business, so that a business manager can understand and act to correct the problem.

Furthermore, this idea of “What?” is entirely relative, since it is based on who you (the viewer) are. Since Service Management is not a solution we deliver to a systems administrator, why do we insist on demonstrating, building, and delivering dashboards whose top-level view requires those skills just to get started? (It gets worse from there.)

Service Management uses the “view-at-a-glance” top level to tell us that the widget factory is what is impacted. The reason why becomes visible only as we drill down further: the supply chain order inventory server went down due to a process failure.

Notice the “What?” part of this is a “widget factory”. That is an entity the business manager understands on so many levels, and one that may have no meaning at all to a technician.

Dashboards should present initial views which make sense to the business management audience, then provide additional detail if the viewer wants or needs to know more. This is the correct way to accommodate the different needs of team members in different roles.
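To make the “What?” versus “Why?” split concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the Node class, the top_level_view and drill_down functions, and the “shipping desk” service are all hypothetical names invented for this example. The top-level view reports only business entities; the technical cause appears only when the viewer drills down.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A business entity or a technology element it depends on (hypothetical model)."""
    name: str
    ok: bool = True
    detail: str = ""                      # technical reason, shown only on drill-down
    children: List["Node"] = field(default_factory=list)

    def healthy(self) -> bool:
        # A node is healthy only if it and everything beneath it are healthy.
        return self.ok and all(child.healthy() for child in self.children)


def top_level_view(services: List[Node]) -> List[str]:
    """The 'What?': one line per business entity, no technical detail."""
    return [f"{s.name}: {'OK' if s.healthy() else 'IMPACTED'}" for s in services]


def drill_down(node: Node, depth: int = 0) -> List[str]:
    """The 'Why?': walk only the unhealthy branches, indenting by depth."""
    lines: List[str] = []
    if not node.healthy():
        reason = f" ({node.detail})" if node.detail else ""
        lines.append("  " * depth + node.name + reason)
        for child in node.children:
            lines.extend(drill_down(child, depth + 1))
    return lines


# Example data taken from the scenario in the text, plus one assumed healthy service.
server = Node("supply chain order inventory server", ok=False, detail="process failure")
factory = Node("widget factory", children=[server])
shipping = Node("shipping desk")

for line in top_level_view([factory, shipping]):
    print(line)
for line in drill_down(factory):
    print(line)
```

The business manager sees only the first two lines of output (which entities are impacted); the technician who needs the cause asks for the drill-down. Both views come from the same underlying model.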

Next time, we’ll discuss the underlying reasons this information is relevant to the business manager. You already know the reasons, but you may not have thought to orient your dashboard to show you the information that makes it all possible.

At Epsilogix, we understand all this, and we know how to deliver it, because we’ve done it before. If you find yourself confused by what we (and Steve Tack) are talking about, or wonder about the difference between what you’ve been seeing everywhere and what we are talking about here, call us. The difference will open your eyes.

The Heart Of The Matter…

When it comes to dashboards and Service Management (or any of the many aliases used for that term), no one ever seems to discuss the concept that lies beneath the best implementations.

Yes, we are trying to provide a view of technology in a business-oriented way. That is obvious, even if it is beyond the scope of so many products that claim to do it. But what is it really that Service Management needs to do to deliver that promise?

Is it about the visual representation? (It could be, but there is much work to be done before you can even start to think about what the view should look like.)
Is it about the data?
Is it more important to have real-time monitoring data, or application performance data?

The famous “it depends” rears its ugly head here, but then again, it’s not so much about how much data you have as what you do with the data you have.

All solutions face the same issue: large volumes of data to show, but a very small span of time and space to show it in.

Clearly, we cannot show all the data we have. Adding to that, the process of deciding what is meaningful and what is not meaningful is not a linear process: you cannot simply leave out some bits while including others on the display. You are collecting that data for a reason, so discarding it at the point of delivery suggests cluelessness on an industrial scale, either because you collected data you did not need or because you discarded useful data.

So what are we to do with all this data? With a nod to the many variations that exist of the “DIKW” hierarchy, let me present my own simplified version.

The Modern Intelligence Hierarchy

We start with raw Data, evaluate it to achieve Information, repeat the process at a higher level to get Knowledge, and keep repeating it as many times as needed until we arrive at Intelligence.

You can choose any word you like, but the word I prefer is “distill”.

The goal is to not discard meaningful data, but also to realize that no visualization technique can encapsulate all the raw data that has been collected. (Wall Street types: this applies even to your vaunted “heat map”.)

The key to this is simple: for each iteration, the distillation process uses all of the data you chose to feed into it from the lower level, summarizing the results in a meaningful manner. Keep in mind the actual number of iterations is purely arbitrary, so don’t get caught in the trap of presuming some fixed number of iterations is either right or wrong. The point is to distill large volumes of data into small bits of intelligence. In the business world, this would also be referred to as “actionable intelligence”, because someone is going to take an action based on what is revealed.

The good news is that “meaningful” is in the eye of the beholder, so the results of this distillation process are not required to fit any specific visualization, though there are better and worse ways of delivering it.

The distillation process uses all the data, summarizes it at every level, then allows you to determine, based on the audience, what is meaningful to display.
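As a rough illustration of distillation, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the server names, the response-time samples, the server-to-service mapping, the 500 ms threshold, and the function names are all hypothetical. Each level consumes everything produced by the level below it and summarizes it, and every level is kept so the audience can choose how far down to look.

```python
from statistics import mean

# Hypothetical raw Data: response-time samples in milliseconds, per server.
RAW = {
    "order-db": [120, 135, 2400, 2600],
    "order-web": [80, 95, 90, 85],
    "hr-web": [60, 70, 65, 75],
}

# Hypothetical mapping of servers to the business services they support.
SERVICE_OF = {"order-db": "order entry", "order-web": "order entry", "hr-web": "HR portal"}

THRESHOLD_MS = 500  # assumed limit for an acceptable mean response time


def to_information(raw):
    """Data -> Information: every sample contributes to a per-server summary."""
    return {
        server: {
            "mean_ms": mean(samples),
            "worst_ms": max(samples),
            "degraded": mean(samples) > THRESHOLD_MS,
        }
        for server, samples in raw.items()
    }


def to_knowledge(info, service_of):
    """Information -> Knowledge: every server summary rolls up into its service."""
    knowledge = {}
    for server, summary in info.items():
        entry = knowledge.setdefault(service_of[server], {"servers": 0, "degraded_servers": []})
        entry["servers"] += 1
        if summary["degraded"]:
            entry["degraded_servers"].append(server)
    return knowledge


def to_intelligence(knowledge):
    """Knowledge -> Intelligence: one statement someone can act on."""
    impacted = sorted(name for name, k in knowledge.items() if k["degraded_servers"])
    return "Impacted: " + ", ".join(impacted) if impacted else "All services normal"


info = to_information(RAW)
knowledge = to_knowledge(info, SERVICE_OF)
print(to_intelligence(knowledge))
```

Nothing is thrown away along the way: the one-line result is what a business manager sees first, while the knowledge, information, and raw data remain available to anyone who needs to go deeper. Three levels are used here only because the example is small; the number of iterations is arbitrary.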

This is a perfectly fine theoretical discussion, but how do you put this concept into practice?

The phrase “event correlation” has been bandied about for years, but has only ever succeeded in one limited space: correlating network events to analyze and identify the root cause of a network problem. Correlation succeeds here because the rules for the topology of a network are well defined, and change is easily handled by running the network discovery data through the same rules again.

However, there are no such rules for the topology of an application or a service, so event correlation in those areas fails, not because it can’t work, but because it only works until something changes in the environment, at which time the rules have to be manually rewritten and/or re-validated. As we know, nothing in the Enterprise ever changes, right?

In the era of the cloud, the thinking is that application performance monitoring data is all that is necessary. Why bother with real-time monitoring of devices or containers when you already know the application is performing well?

In this cloud era, application performance data may be all that you can get: depending upon the contract you have with your cloud provider, you may not have access to any real-time device or container monitoring at all. It’s possible they don’t have it either, since many companies espouse the idea that there is no point in collecting data they can’t sell. The trap is set, however. Sooner or later your application performance data will inform you that the response time is in the unacceptable range, and without real-time monitoring data (or access to it), how will anyone be able to identify (much less correct) the problem?

There is a way to resolve all of this: distilling the raw data means you don’t have to discard or ignore useful data, yet you can still present a meaningful visualization that works for each audience without overwhelming them. That is by definition what a dashboard should do, isn’t it?

In my next entry, I’ll go into detail about the key to distillation, and how it can be done very easily.