Cybersecurity metrics: the challenge of measuring what matters


Warning: there are a number of somewhat abstract concepts presented here. I know some people have a hard time with abstraction, so please read with an open mind. There are a lot of open-ended questions as well. This article is intended to spark thought outside of the norm as it relates to cybersecurity metrics.

As an industry we ([cyber | information] security) have struggled to pin down the art, and science, of cybersecurity metrics. It is a struggle that some feel they have mastered. I am not entirely convinced. Moreover, I see the general consensus sadly playing in the safe zone when it comes to this subject. It takes courage to measure what matters as opposed to measuring what is possible, or easy. It also takes courage to measure elements that are somewhat in the abstract because of the difficulty at hand.

I acknowledge that “what matters” is subjective to four entities: the organization, its C-Suite, its various board members and us (cybersecurity leadership). We can steer the conversation once there is internal clarity on the items that really matter.

One of the enemies we have to contend with is our indoctrination to always strive for 100%. This score, level, or grade is simply unachievable in most environments. And what really constitutes 100%? Is it that our organization has been event-less by way of incidents, breaches and/or data exfiltration? What constitutes the opposite, a score of 0 (zero)? We have to stop thinking like this in order to get a realistic sense of metrics that matter.

My contention is that we need a small, tight set of metrics that are representative of real-world elements of importance. This comes with a fear alert, because in some cases measuring these areas will show results that come off as some type of failure. We need not feel that this is reflective of our work; we are merely reporting the facts to those who need them. “Those” would generally be the board and the C-Suite. They will probably have a hard time initially understanding some of these areas, and admittedly they are very difficult to measure/quantify.

It is the job of an effective CISO to make sense of these difficult-to-understand areas and educate those folks. But the education aspect is not just about understanding them; it is about how to extract value from them. This is where the courage comes in, because a lot of people have a hard time accepting that which is different from what they are accustomed to.

Subjectivity is important here. There are few formulas in the world of cybersecurity, and what matters to one organization may have little relevance elsewhere. Organizations need to tailor their goals, and in turn the measuring mechanisms, based on what matters to them. This of course has a direct impact on which risk areas come to light, which ones need to be addressed with urgency and which can wait. Hitting these subjective goals (which should be defined by the metrics) could also bring about ancillary benefits. For instance, this could force the issue of addressing technical debt or force a technology refresh.

Here are some suggestions (nowhere near exhaustive) that are top of mind with respect to metrics we tend not to pursue (mainly due to the difficulty of measuring them):

Effectiveness of active protection mechanisms – This one seems obvious at face value. Grab some statistics after the implementation of some solution, for instance a Web Application Firewall (WAF), and show how many awful things it has prevented. But this is such a fragmented perspective that it may provide a false sense of security. What about the machine-to-machine communications deeper in your network (VPC or otherwise)? How are you actively protecting those entities (API requests/responses, etc.)?

I find the bigger challenge here is ecosystem-wide coverage and how you show effectiveness across it. There are other difficult-to-measure areas that directly impact this one, such as attack surface management. But if we, as an industry, are ever going to get ahead of attackers, even in the slightest way, this is an area we all need to consider.
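As a rough illustration of that coverage point, here is a minimal sketch (the segment names, event counts, and scoring approach are all hypothetical) that weighs block rates against the fraction of network segments that actually have an active protection mechanism in front of them, so a great WAF number cannot hide unprotected machine-to-machine paths:

```python
# Hypothetical sketch: protection effectiveness weighted by ecosystem coverage.
# Segment names, event counts, and the scoring approach are illustrative only.
segments = {
    "internet-edge": {"protected": True,  "events_blocked": 9_200, "events_total": 9_500},
    "vpc-internal":  {"protected": False, "events_blocked": 0,     "events_total": 4_100},
    "partner-api":   {"protected": True,  "events_blocked": 310,   "events_total": 400},
}

coverage = sum(s["protected"] for s in segments.values()) / len(segments)
observed = [s for s in segments.values() if s["protected"]]
block_rate = sum(s["events_blocked"] for s in observed) / max(1, sum(s["events_total"] for s in observed))

# Report both numbers; the blended score penalizes unprotected segments.
print(f"coverage: {coverage:.0%}, block rate where protected: {block_rate:.0%}")
print(f"coverage-weighted effectiveness: {coverage * block_rate:.0%}")
```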

Reproducibility – The “X as a Service” reality is here and quite useful. “X” can be infrastructure, it can be software, it can be many things depending on the maturity and creativity of an organization.

From the software perspective, what percentage of your build process exists within a CI/CD pipeline or process? This strongly establishes a reproducibility baseline. Within a CI/CD process many areas, such as security, resilience and DR, can be covered in automated fashion. Vulnerability management, and patching, can be included here as well. It’s 2022, and if your organization hasn’t invested in this area you need to put some metrics together to make a case for it.
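One way to start putting a number on that percentage is sketched below; the inventory format and what counts as a “security stage” are my own assumptions, not a standard.

```python
# Hypothetical sketch: what fraction of builds run through CI/CD, and how many
# of those pipelines include an automated security stage. Data is illustrative.
services = [
    {"name": "billing",    "in_pipeline": True,  "stages": {"build", "test", "sast", "deploy"}},
    {"name": "reporting",  "in_pipeline": True,  "stages": {"build", "deploy"}},
    {"name": "legacy-erp", "in_pipeline": False, "stages": set()},
]

in_pipeline = [s for s in services if s["in_pipeline"]]
pipeline_pct = len(in_pipeline) / len(services)
security_pct = sum("sast" in s["stages"] for s in in_pipeline) / max(1, len(in_pipeline))

print(f"builds reproducible via CI/CD: {pipeline_pct:.0%}")
print(f"of those, pipelines with an automated security stage: {security_pct:.0%}")
```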

Attack Surface Management – What does your organization look like to an outsider? What does it look like to an insider? What does it look like when you factor in ephemeral entities (such as elastic cloud resources)? Does your attack surface data factor in all assets in your ecosystem? What about interdependencies? Asset inventories are seldom accurate, so your attack surface view is possibly a snapshot in time as opposed to something holistic.

There is a lot to consider in terms of attack surface metrics, yet it is such a key component of a healthy cybersecurity program. Please don’t think that any one specific product will cover you in this area; most are focused on external perspectives and miss the insider threat vector entirely.
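If nothing else, two simple numbers can expose the snapshot-in-time problem: how much of the inventory your discovery actually observes, and how stale the oldest view is. The sketch below is illustrative; the asset counts and dates are made up.

```python
# Hypothetical sketch: attack surface snapshot freshness and inventory coverage.
from datetime import date

today = date(2022, 6, 1)            # illustrative "as of" date
assets_in_inventory = 1_840          # what the asset inventory claims exists
assets_seen_by_discovery = 1_420     # what discovery/scanning actually observed
last_external_scan = date(2022, 5, 30)
last_internal_scan = date(2022, 3, 2)

coverage = assets_seen_by_discovery / assets_in_inventory
staleness_days = max((today - last_external_scan).days, (today - last_internal_scan).days)

print(f"inventory coverage observed by discovery: {coverage:.0%}")
print(f"oldest attack surface view is {staleness_days} days old")
```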

Software Security – This is an enormous subject and one that deserves an entire write-up itself. The maturity of software security can certainly be measured with a maturity model such as OWASP SAMM. Creating, and implementing, an SSDLC goes a long way in integrating security into the core software development process. Underlying any of these techniques is the need to map software to business processes. Otherwise you have a purely technical set of metrics that no one outside of tech will be able to digest.
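To make that business mapping concrete, one possibility is to weight per-application maturity scores by the criticality of the business process each application supports. The sketch below assumes a SAMM-like 0-3 scale; the applications, processes and weights are invented.

```python
# Hypothetical sketch: software security maturity weighted by business criticality.
apps = [
    {"app": "payments", "process": "revenue collection", "criticality": 5, "maturity": 2.1},
    {"app": "intranet", "process": "internal comms",     "criticality": 1, "maturity": 0.8},
    {"app": "claims",   "process": "customer service",   "criticality": 4, "maturity": 1.4},
]

weighted = sum(a["criticality"] * a["maturity"] for a in apps) / sum(a["criticality"] for a in apps)
print(f"business-weighted software security maturity: {weighted:.2f} / 3.0")

# Rank the gaps so the conversation starts with what matters most to the business.
ranked = sorted(apps, key=lambda a: a["criticality"] * (3 - a["maturity"]), reverse=True)
print("largest weighted gaps first:")
for a in ranked:
    print(f"  {a['app']} ({a['process']})")
```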

Technical Debt – This area is complex, as it can contextually refer to software that needs to be refactored or to legacy systems (stagnant or otherwise). Regardless of the context, how does one measure the level, or severity, of technical debt within an organization? If a relevant model is successfully created it will probably make a strong argument for some budget 🙂
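One entirely illustrative way to start such a model: score each system on a few debt factors and rank the results. The factors, weights and systems below are assumptions, not an established framework.

```python
# Hypothetical sketch: a simple technical-debt severity score per system.
systems = [
    {"name": "hr-portal", "years_past_eol": 3, "vendor_supported": False, "internet_facing": True},
    {"name": "warehouse", "years_past_eol": 0, "vendor_supported": True,  "internet_facing": False},
]

def debt_score(system):
    """Illustrative weighting: age past end-of-life, support status, exposure."""
    score = 2 * system["years_past_eol"]
    score += 0 if system["vendor_supported"] else 3
    score += 4 if system["internet_facing"] else 0
    return score

for s in sorted(systems, key=debt_score, reverse=True):
    print(f"{s['name']}: technical debt severity {debt_score(s)}")
```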

Distance Vector – How far into your ecosystem can an attack get before it is detected and handled (whatever handling means to your organization)? The logic here is simple: the longer it takes to detect something, the more you need to pay attention to that area. Think of APTs and how long some of them exist inside your network before there is detection and response.
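Here is a sketch of what a distance vector metric could look like if you model the environment as ordered zones and record how deep each incident got before detection. The zone names and incident data are invented for illustration.

```python
# Hypothetical sketch: how deep (by zone) attacks get before detection.
zones = ["perimeter", "dmz", "internal", "crown-jewels"]   # ordered, outside-in

incidents = [
    {"id": "INC-101", "deepest_zone": "dmz"},
    {"id": "INC-102", "deepest_zone": "internal"},
    {"id": "INC-103", "deepest_zone": "perimeter"},
]

depths = [zones.index(i["deepest_zone"]) + 1 for i in incidents]
print(f"average detection depth: {sum(depths) / len(depths):.1f} of {len(zones)} zones")
print(f"deepest penetration this period: zone '{zones[max(depths) - 1]}'")
```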

Time vector – Who is faster: you, the defender, or the attackers? There is a reality to the time factor, and every time your organization is targeted there is a bit of a race that takes place. Where you end up in the final results of that race dictates, to an extent, the success factor of an attack. This is very hard to measure. But spending time figuring out ways to measure this will yield an understanding of the threats you face and how well you will fare against them.

One great benefit of spending time assessing your time vector is that it will force you to measure your ability to successfully address entire families, or classes, of attacks. Having the macro focus, as opposed to the typical micro focus, may bring about an interesting level of discipline within your technical teams. Basically, they will be forced to think big and not exclusively about edge, or corner, cases.
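One way to sketch the race, and force that macro view, is to compare attacker dwell time against your detect-and-respond time per incident and roll it up by attack family. The durations and family labels below are illustrative assumptions.

```python
# Hypothetical sketch: defender response time vs. attacker dwell time, by attack family.
from statistics import median

incidents = [
    {"family": "phishing",   "dwell_hours": 6,  "respond_hours": 4},
    {"family": "phishing",   "dwell_hours": 30, "respond_hours": 12},
    {"family": "ransomware", "dwell_hours": 72, "respond_hours": 20},
]

for family in sorted({i["family"] for i in incidents}):
    fam = [i for i in incidents if i["family"] == family]
    ratio = median(i["respond_hours"] / i["dwell_hours"] for i in fam)
    # A ratio under 1 means the defender is, on median, faster than the attacker here.
    print(f"{family}: median response-to-dwell ratio {ratio:.2f}")
```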

Repeatability – How repeatable are key functions within your organization? Measuring repeatability is super difficult, and yet this is a foundational aspect of mature cybersecurity programs. Playbooks exist for this exact reason, and we do invest resources into creating, and maintaining, them. This means repeatability is undeniably important, and yet how do we quantify it?
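One crude quantification, sketched below under my own assumptions about what counts as a key function: what share of them have a playbook, and how recently has each been exercised?

```python
# Hypothetical sketch: repeatability as playbook coverage plus exercise recency.
key_functions = [
    {"name": "incident triage",     "has_playbook": True,  "months_since_exercised": 2},
    {"name": "credential rotation", "has_playbook": True,  "months_since_exercised": 14},
    {"name": "vendor offboarding",  "has_playbook": False, "months_since_exercised": None},
]

covered = [f for f in key_functions if f["has_playbook"]]
coverage = len(covered) / len(key_functions)
recently_exercised = sum(f["months_since_exercised"] <= 6 for f in covered) / max(1, len(covered))

print(f"playbook coverage of key functions: {coverage:.0%}")
print(f"of covered functions, exercised within 6 months: {recently_exercised:.0%}")
```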

Budgeting – How do we know if enough is being funneled into a security program? At the end of the day we can’t plug every hole. One strategy is to perform crown jewel assessments and focus on those resources. Another one is to analyze attack surface data and cover areas of importance. But how do we measure the effectiveness of these, and any other related, strategies?

Insufficient budget obviously reduces the ability of a security team to implement protective mechanisms. The metrics we focus on need to push for clarity in terms of what becomes possible. There is most likely no correct amount of budget; we get a slice of some larger budget, and what we get becomes a variable amount over some period of time. But the budget itself needs to be treated as a metric. Think of it this way: if you don’t get enough budget to cover the areas you know need attention, then there will be gaps that are directly attributable.
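One way to sketch budget as a metric: map each area you know needs attention to an estimated cost and report the funded versus unfunded (and therefore directly attributable) gaps. The areas and amounts below are invented for illustration.

```python
# Hypothetical sketch: budget as a metric: funded needs vs. attributable gaps.
needs = [
    {"area": "attack surface tooling", "estimated_cost": 120_000, "funded": True},
    {"area": "ci/cd security stages",  "estimated_cost": 80_000,  "funded": True},
    {"area": "legacy system refresh",  "estimated_cost": 400_000, "funded": False},
]

total = sum(n["estimated_cost"] for n in needs)
funded = sum(n["estimated_cost"] for n in needs if n["funded"])

print(f"known needs funded: {funded / total:.0%}")
for n in needs:
    if not n["funded"]:
        print(f"unfunded, directly attributable gap: {n['area']} (${n['estimated_cost']:,})")
```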

Sadly a lot of budget increases come about because something bad has happened. But this (the fact that something bad happened) means more work needs to be done. And yet we struggle with the necessary quantification. Ultimately we should pursue business aligned initiatives irrespective of the difficulty of trying to pin down an accurate budget.

All-Out Mean Time to Recovery (MTTR) – Imagine the absolute nightmare scenario that your entire organization is decimated somehow. Imagine you are brought back to the stone age of bare metal and have nothing but a few backups to recover from. How long will it take you to get your organization back to an operating business? Some organizations are well positioned to recover from isolated incidents, like a ransomware event. My thought process is around something far more catastrophic.

I am not sure there is an organization on the planet that can answer this question at breadth and depth. I fear that there is also a lot of hubris around this subject, and some may feel this is not a situation they need to account for. The more typical all-out scenarios you may encounter focus on operational areas. For instance, if all servers become unusable there is a DR plan that has been designed, tested, and tweaked to reach an acceptable MTTR.
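Even without living through the nightmare scenario, tabletop or recovery exercises can feed a rough estimate. The sketch below assumes steps within a phase run in parallel while phases are sequential; the phases and durations are invented.

```python
# Hypothetical sketch: estimating all-out recovery time from exercise data.
# Steps within a phase are assumed parallel; phases run sequentially.
phases = [
    ("rebuild core infrastructure", [72, 48]),     # hours per step (e.g. identity, networking)
    ("restore data from backups",   [96, 60, 24]),
    ("restore business services",   [40, 40, 16]),
]

estimated_hours = sum(max(steps) for _, steps in phases)
print(f"estimated all-out recovery time: {estimated_hours} hours (~{estimated_hours / 24:.1f} days)")
for name, steps in phases:
    print(f"  {name}: longest step {max(steps)}h")
```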

From a positive vantage point, the very act of trying to measure some of these admittedly challenging areas of operation will likely reveal many areas of improvement. That in and of itself may prove valuable to your organization in the long run. There are so many more we could come up with but the areas presented here are a decent starting point.

On the negative end, there is an enormous challenge in that the board and the C-Suite might not understand these metrics. Hell, I can think of many IT leaders that won’t understand some of them. But these are not reasons to shy away from the challenge of educating folks. I understand the notion, and am a practitioner, of talking to the board on their terms, in their language. But is it truly beyond their capabilities to understand some of these points? Is it unreasonable for us to push for a deeper level of understanding and interaction from the board and the C-Suite on these metrics?

One suggestion is to be super consistent in the metrics you choose. By consistent I mean stick with them and show changes over time. The changes can be negative; that’s OK. No one is delusional enough to expect total positive momentum all the time. Your presentation of the metrics you choose will be an investment, and in time the board and the C-Suite will start to see the value, and that you are one persistent individual.
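Showing the same small set with its period-over-period change is trivial to automate; a minimal sketch with invented metric names and values:

```python
# Hypothetical sketch: report the same metrics each quarter, with the change shown.
history = {
    "Q1": {"coverage-weighted effectiveness": 0.48, "playbook coverage": 0.55},
    "Q2": {"coverage-weighted effectiveness": 0.57, "playbook coverage": 0.52},
}

previous, current = history["Q1"], history["Q2"]
for metric, value in current.items():
    delta = value - previous[metric]
    sign = "+" if delta >= 0 else ""
    print(f"{metric}: {value:.0%} ({sign}{delta:.0%} vs prior quarter)")
```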

Ultimately, there are many superficial security metrics that keep you in a safe zone. I challenge all of us, myself included, to do better and be more creative. This will be difficult, but I find it well worth it. The outcomes may surprise you, and so may the ancillary benefits (areas you will be driven to address, etc.). There will of course be pushback that these are difficult to understand. Or maybe the arguments will revolve around the effectiveness of the message to the board and the C-Suite. But the fact that something is difficult is no reason not to tackle it.