Three Steps To Getting Better Security ROI: When Fast Times Clash with Lean Reality

Posted by Rich Seiersen on June 19, 2020

Using Simple Predictive Analytics to Beat the Odds


It Was The Best Of Times....

I think we all knew leaner times were coming, but who would’ve thought a pandemic would hasten them? Now, investments are shrinking, workforces are being reduced, and my security super-friends are fearing their budgets will be slashed.

Making matters worse for cash-stretched defenders, many enterprises are doubling down on digital transformation. Digital transformation originally meant moving legacy workloads to the cloud. Today it means moving to cloud native development. Cloud native aims to increase software velocity exponentially while keeping costs contained. At least, that’s the hope!

As a CISO, I experienced this transition firsthand. I went from a large public company doing hundreds of software releases a year to a smaller firm doing tens of thousands of releases per year. Though the latter company was 100% cloud native and had a small fraction of the resources, there was no comparison in terms of efficiency. The cloud native company was crushing it.

Cloud native development is providing the velocity and efficiency that businesses need and developers want. In a world looking to drive out costs, such efficiency is a matter of survival. In this new reality where risk exposure goes up and investments in protections go down, what’s a security leader to do?

Get Better ROI For Security

In this blog post, I will modernize an approach for getting better security ROI. This approach was first outlined in “How To Measure Anything In Cybersecurity Risk.” Here, modernization means using code over Excel and real data on top of expert forecasts. Some eagle-eyed readers might point out that modernization looks like “data science.” To that, I quote my co-author Doug Hubbard: “What science isn’t data science?” Anyway, don’t worry if you’re not into coding and math. I will mostly focus on approach over implementation.

For those of you brave enough to walk on the wild side, here is the code in all its glory. Apologies in advance for the bugs (software or statistical). You will want to install R and RStudio. Also, the code embeds markdown, which you can run if you like (instructions therein) to get a prettier web format output. Copy and paste into RStudio, and then click the “source” drop-down in the upper right of RStudio. Super simple! All you need to do is play with the configs, which I explain below.

Buying Security With Predictive Analytics

I’m proposing a simple approach for making faster and more informed security buying decisions. I call it ABC A/B testing, short for Approximate Bayesian Computation A/B Testing. (Heavy emphasis on the word approximate.) It sounds scarier than it is! 

ABC A/B testing uses subject matter expert (SME) beliefs and small sample data to compare product effectiveness. It specifically forecasts costs based on “proof of concept” (POC) outcomes, financial inputs, and plenty of uncertainty. 

I base the model on my experience as a CISO and security buyer. I, like my CISO peers, ran countless POCs for new purchases. I felt we could do these faster and more “scientifically.” This article and the associated code are for those of you who feel the same. The ideal user wants better products but has little time to test, never enough resources, and not enough data to make an informed decision.

Our first step is to turn your SME beliefs into data. If that sounds weird, don’t worry. There’s a lot of research on this topic. Our book goes over SME belief codification and the supporting research in detail.

The Use Case: Web Application and API Scanning 

Note: this method can work with just about any product you are evaluating. If the particular use case doesn't float your boat, substitute your own alternative.

Imagine you have a solution that scans web applications and APIs for vulnerabilities. These vulnerabilities are the kind that invariably escape out of development and get exposed to users and the bad guys. Let’s explore some reasons you are not happy with your current product's performance.

First, your product produces too many false positives (FPs). FPs create lots of wasted work for your team. And the ones that get past your team to be later discovered by development are particularly annoying.

Then there are the false negatives (FNs). FNs create emergencies, particularly when a remotely exploitable vulnerability is discovered in production. Engineers have to stop what they are doing to remediate these – prior to the bad guys exploiting them. Additionally, security teams may have to do forensics just to be sure the bad guys didn’t steal any treasure.

No matter the reason, you’re in the market for a new product. Your goal should be to compare your existing solution to one or more alternatives, paying particular consideration to the financial impact of errors (FPs and FNs). Below, I will explore a three-step process for comparing your solutions!

Step 1: Model Your Beliefs

Let’s start with two reasonable assumptions. First, you can measure the rate with which you find vulnerabilities. This includes true vulnerabilities, FPs and FNs. The second assumption is that you already have some information about the underlying error rate even before you see the data. You wouldn’t be in the market for a solution if you didn’t have at least a quasi informed opinion about this. 

To quantify your beliefs, you will require two numbers. (I will define these technical terms via the example dialogue.) The first is the median error rate. The second is the 90% boundary rate. An informed practitioner might respond, “Our median error rate is around 25%. That means I believe the true error rate is just as likely to be below 25% as it is to be above. In terms of my 90% boundary rate, I’m 90% confident the true rate is below 45%. After all, I have never in my life experienced a 50% error rate – but it might happen!”

Code snippet

We use those numbers to create a graph of all possible rates given those assumptions. Check out the graph below! Any rate under the purple curve is possible given your SME beliefs. But, the ones between the two lines are more probable. The ones near the apex are the most plausible given your data, assumptions, and this basic model.


Credible Beliefs table


If this graph could speak it would say, “Given your beliefs about the rates and the particular model you are using, you should expect the true rate is likely between 6% and 56%.” This model allows for a lot of uncertainty about plausible error rates.
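The article does not name the distribution behind the purple curve, so what follows is only a minimal Python sketch (the author's own code is in R). It assumes a beta distribution, a common choice for rates between 0 and 1, and searches for the two shape parameters whose median is 25% and whose 90th percentile is 45%. The function names and the simple numerical integration are illustrative choices, not the author's implementation.

```python
import math


def beta_cdf(x, a, b, steps=400):
    """Approximate P(rate <= x) for a Beta(a, b) curve with the midpoint rule.

    Accurate enough for a sketch when a >= 1 and b >= 1.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    width = x / steps
    area = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        area += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return area * width


def b_for_median(a, median):
    """For a fixed a, bisect on b until half of the curve lies below `median`."""
    lo, hi = 1.0, 1000.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if beta_cdf(median, a, mid) < 0.5:
            lo = mid  # too little mass below the median: a larger b pushes the curve left
        else:
            hi = mid
    return (lo + hi) / 2


def fit_beta(median, upper, upper_prob=0.90):
    """Bisect on a (with b tied to the median) until P(rate <= upper) = upper_prob."""
    lo, hi = 1.0, 50.0
    for _ in range(30):
        a = (lo + hi) / 2
        b = b_for_median(a, median)
        if beta_cdf(upper, a, b) < upper_prob:
            lo = a  # curve too wide: a larger a concentrates it
        else:
            hi = a
    a = (lo + hi) / 2
    return a, b_for_median(a, median)


# The SME beliefs from the dialogue: median 25%, 90% confident the rate is below 45%.
a, b = fit_beta(median=0.25, upper=0.45)
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"P(rate <= 25%) = {beta_cdf(0.25, a, b):.3f}")
print(f"P(rate <= 45%) = {beta_cdf(0.45, a, b):.3f}")
```

The two printed probabilities are a quick check that the fitted curve honors both belief statements. A different distribution family would give a differently shaped curve from the same two numbers, which is one reason the model is described as approximate.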

Step 2: Mash up Data With Beliefs

Next, we are going to include real data. Real data is data that comes from the tools and processes you are running. We will literally combine the real data with the belief data (the purple curve). “Why on earth would you do that!?” you say? In short, we have small and noisy data - that’s why! 

I use the words “tools” and “process” together to refer to “capability”. In this exercise, we are measuring a capability. It’s difficult to isolate a technology from its implementation, but we don’t let that stop us! We are interested in better outcomes over perfect measurement.

Small data can refer to anywhere from zero to hundreds of events. (In my previous book we focused exclusively on zero events...all beliefs!) But your company may be a Netflix, Uber, AWS, etc., where you get many thousands of unique “vulnerability” events a day that interest you. That’s fine! This model would still work. I suspect most readers are dealing with much sparser data sets.

When I say noisy data I mean highly uncertain and incomplete data. Even 50 to 100 events will still leave you uncertain. You may wonder, “Is this the real false positive rate? How can I be sure this is the false negative rate!? Was there a change in the environment we were scanning that I didn’t know about?...” We call all that stuff noise and uncertainty. You use your SME beliefs to help control, and inform, the noise when data is small and incomplete. 

Now let’s capture your product data: 

  • The first configuration item below is the number of times you will run this analysis: 100,000. Running thousands of tests is another tactic for dealing with small and noisy data. 
  • The next two numbers are the distinct count of “critical vulnerabilities” found in a particular time period per product. Let’s assume you ran two scanners against the same environment for 30 days. At the end of the period you see that one product found 30 unique “critical vulnerabilities” and the other product found 37. 
  • Since you ran the test for a month, and we are looking to do a year long cost forecast, you set the “time_multiple” variable to twelve. If you ran your test for a day, then this number would be set to 365. If it ran for a week, then 52, etc.


In the box below you input the errors of each product in relationship to their total events above. Errors are the sum of false positives and false negatives. Unfortunately, even the process of finding errors is ironically error prone. This is why your beliefs about events matter in this particular type of analysis. We are assuming small, messy and noisy data. Again, don’t let the myth of perfect data halt measurement!


Using this data we can now simulate 100,000 error rates. A sample of those rates for each product can be seen in the proportion_events column in the next table below. We get those rates from the “purple beliefs” curve above. The curve “thinks” errors are most likely (but not exclusively) between 6% and 56%. 

Using the total event counts of 30 and 37 for each product, we can generate error counts given the proportion_events. On the first row on the left, the model asks “Given an error rate of ~16% and a total of 30 events how many errors might I get?” This happens 100,000 times given the various proportion_events. The outcomes of each experiment are stored in the a_n_errors and b_n_errors columns respectively.
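The simulation step can be sketched in Python as follows. It assumes the belief curve is a Beta(2, 6) distribution (a hypothetical stand-in that is roughly in line with the beliefs from Step 1) and that each event independently turns out to be an error with the drawn rate. The column name proportion_events mirrors the one mentioned above; everything else is illustrative and is not the author's R code.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

N_SIMS = 10              # the article runs 100,000; a handful shows the table shape
PRIOR_A, PRIOR_B = 2, 6  # hypothetical beta shape parameters (an assumption)


def simulate_table(n_events, n_sims):
    """Build rows of (proportion_events, n_errors) for one product."""
    rows = []
    for _ in range(n_sims):
        proportion_events = rng.betavariate(PRIOR_A, PRIOR_B)  # one plausible error rate
        # "Given this error rate and n_events events, how many errors might I get?"
        n_errors = sum(rng.random() < proportion_events for _ in range(n_events))
        rows.append((proportion_events, n_errors))
    return rows


table_a = simulate_table(n_events=30, n_sims=N_SIMS)  # Product A found 30 events
table_b = simulate_table(n_events=37, n_sims=N_SIMS)  # Product B found 37 events

print("proportion_events  a_n_errors")
for rate, errors in table_a:
    print(f"{rate:17.3f}  {errors:>10}")

print("proportion_events  b_n_errors")
for rate, errors in table_b:
    print(f"{rate:17.3f}  {errors:>10}")
```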


Now this is where the ABC part comes in. From the above results, we filter out anything that doesn’t match the real error counts of 10 and 13.


The point of view with Approximate Bayesian Computation is that there is an underlying random process that generates those specific error counts (10,13). That process has some constraints that will generate various “proportion_events.” We are collecting all of those proportions that generated the exact error counts.   
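A minimal Python sketch of that filtering idea, again assuming a hypothetical Beta(2, 6) belief curve and a simple per-event error process: simulate many POCs and keep only the rates whose simulated error count equals the observed one. The function and parameter names are made up for illustration.

```python
import random


def abc_rejection(n_events, observed_errors, n_sims, prior_a=2, prior_b=6, seed=None):
    """Return the simulated error rates whose simulated count equals observed_errors.

    prior_a and prior_b are hypothetical beta shape parameters standing in
    for the SME belief curve.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        rate = rng.betavariate(prior_a, prior_b)                     # draw a rate from the beliefs
        errors = sum(rng.random() < rate for _ in range(n_events))   # simulate one POC
        if errors == observed_errors:                                # keep exact matches only
            accepted.append(rate)
    return accepted


# Observed in the POC: 10 errors out of 30 events (A), 13 out of 37 (B).
rates_a = abc_rejection(n_events=30, observed_errors=10, n_sims=100_000, seed=1)
rates_b = abc_rejection(n_events=37, observed_errors=13, n_sims=100_000, seed=2)

print(f"Product A: kept {len(rates_a)} rates, mean {sum(rates_a) / len(rates_a):.3f}")
print(f"Product B: kept {len(rates_b)} rates, mean {sum(rates_b) / len(rates_b):.3f}")
```

Most simulated runs are thrown away, which is why the number of simulations is large. Requiring an exact match is practical here because the counts are small; with larger counts, ABC implementations typically accept results within some tolerance instead.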

Next we take these results and draw 10,000 random samples from each. I am going to assume that some of my readers may not be familiar with random sampling. In short, a sampler will randomly select rates from each of the tables above. The sampler is biased to select data in proportion to how frequently it occurs. Sampling also adjusts for noisy and incomplete data. 

We can use the sampled data to answer some questions about the underlying data-generating process. For example, what is the most likely range of rates that produced the counts we are seeing? And if we had to choose one number to represent the rate what might that number be?


Based on everything we have done to this point, our error rate model thinks there is a 56% chance that Product B produces more errors than Product A. Translated to English: if you had to bet, you would bet on Product B producing more errors in the long run. Either way, that is not a big difference, and there is a lot of uncertainty here. Let’s graph these results:
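To see where a number like that comes from, here is a minimal Python sketch. Under a hypothetical Beta(2, 6) belief curve, keeping only exact matches is mathematically equivalent to drawing from a beta distribution updated with the observed counts, so the sketch draws the rates directly instead of re-running the filter. It then counts how often Product B's sampled rate exceeds Product A's. The percentage it prints depends on the assumed prior and the random seed, so it is not a reproduction of the author's figure.

```python
import random

rng = random.Random(42)

PRIOR_A, PRIOR_B = 2, 6   # hypothetical beta shape parameters (an assumption)
N_SAMPLES = 10_000        # the article draws 10,000 samples per product


def sample_rates(n_events, n_errors, n_samples):
    """Draw plausible error rates after seeing n_errors out of n_events."""
    return [
        rng.betavariate(PRIOR_A + n_errors, PRIOR_B + n_events - n_errors)
        for _ in range(n_samples)
    ]


rates_a = sample_rates(n_events=30, n_errors=10, n_samples=N_SAMPLES)
rates_b = sample_rates(n_events=37, n_errors=13, n_samples=N_SAMPLES)

# Difference A minus B: negative values mean Product B has the higher error rate.
differences = [a - b for a, b in zip(rates_a, rates_b)]
prob_b_worse = sum(d < 0 for d in differences) / N_SAMPLES

ordered = sorted(differences)
low = ordered[int(0.05 * N_SAMPLES)]
high = ordered[int(0.95 * N_SAMPLES) - 1]

print(f"P(Product B has the higher error rate) = {prob_b_worse:.1%}")
print(f"90% of sampled differences (A - B) fall between {low:+.3f} and {high:+.3f}")
```

If the interval of differences straddles zero, the data cannot cleanly separate the two products, which matches the "lot of uncertainty" caveat above.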


Product AB testing graph


Here is a slightly different view of the same data superimposed right on top of each other:


AB superimposed


Below is yet another view from another angle. This graph (shown second) is the result of subtracting Product B’s error rate from Product A’s. Visually, it’s a little less informative because the difference is so small. In short, Product B seems to make more errors. You can see this on the graph as more of the data is shifted to the left. The table below may help clarify what is going on. The more differences that are negative, the further the graph shifts to the left, because Product B has the higher error rate. If, on the other hand, things are shifted to the right, it means that Product A has the higher rate:


Names of rates


Event Amount Difference


Step 3: Forecast The Financial Impact of Errors

This particular part of the model is purposefully simple. The first thing I did was input a range of hours for addressing scan errors. The range of hours takes into consideration both security and developer time. 

I don’t go into detail here, but there are methods we can use to capture time data more empirically. The approach has a fancy name: “Empirical Bayes.” Presupposing you have ticketing data in Jira or ServiceNow, we could model historical ticketing rates and time spent. I cover these approaches in my next book.



Next, I capture the engineering cost. This cost considers the possibility of multiple people being involved - typically one security and one software engineer. You can adjust this to what you think is reasonable. Again, while this cost usually hovers around six hundred, it could range from two hundred to well over two thousand dollars.


What happens in the model is that hours and costs get multiplied together. That product is then multiplied by the event probability. And that final product is the “expected value” of the impact. Expected value is a number frequently used for cost modeling. It is a single number that summarizes many possible impacts by weighting each by its probability.
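As a rough Python sketch of that multiplication: the hour range, the shape of the cost distribution, and the example probabilities below are all hypothetical placeholders; only the two-hundred, six-hundred, and two-thousand dollar figures come from the text.

```python
import random

rng = random.Random(3)


def expected_value(hours, cost, event_probability):
    """Hours times cost, weighted by the probability that the event is an error."""
    return hours * cost * event_probability


def simulate_expected_value(event_probability):
    """One simulated expected value using uncertain hours and cost."""
    hours = rng.uniform(0.5, 2.0)          # hypothetical range of hours per error
    cost = rng.triangular(200, 2000, 600)  # low, high, most likely (dollar figures from the text)
    return expected_value(hours, cost, event_probability)


# A fixed-input example: 1 hour at $600 with a 25% error probability.
print(expected_value(hours=1.0, cost=600.0, event_probability=0.25))

# A few simulated rows: event probability on the left, expected value on the right.
for probability in (0.16, 0.33, 0.41):  # hypothetical sampled error rates
    print(f"{probability:.2f}  ${simulate_expected_value(probability):,.2f}")
```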

Here is a small sample of the data. In the left column are the event probabilities. On the right is the expected value of the impact given the product error rate probabilities.


Now we can overlay the distribution of costs per event onto one another. As you can see, they don’t seem all that different, but Product B seems to edge out A. 


Now that we have a distribution of costs, we can do some simple aggregation to forecast the difference between Product A and Product B over time. We also need to add the licensing costs of each product and any available data on operational costs, though the latter is optional.


In the image above, you can see the program takes all of this information in and outputs the following:

  • Product A is 90.8% of the total cost of Product B.
  • Product A (with a base price of $72,000) is expected to cost $148,920 a year to operate with errors.
  • Product B (with a base price of $65,000) is expected to cost $163,947 a year to operate with errors.
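The article does not spell out the aggregation, so the following Python sketch is one plausible reading: take the average expected cost per event, scale it by the number of events in the POC period and by time_multiple, and add the license price. The per-event cost samples are hypothetical, so the output will not match the figures above.

```python
import statistics


def forecast_annual_cost(base_price, events_per_period, time_multiple, cost_per_event_samples):
    """License price plus the expected cost of errors scaled up to a year."""
    mean_cost_per_event = statistics.fmean(cost_per_event_samples)
    return base_price + mean_cost_per_event * events_per_period * time_multiple


# Hypothetical expected-cost-per-event samples for each product (not from the article).
cost_samples_a = [150.0, 200.0, 250.0]
cost_samples_b = [160.0, 215.0, 270.0]

# A 30-day POC scaled to a year: time_multiple = 12 (52 for a weekly POC, 365 for a daily one).
total_a = forecast_annual_cost(72_000, events_per_period=30, time_multiple=12,
                               cost_per_event_samples=cost_samples_a)
total_b = forecast_annual_cost(65_000, events_per_period=37, time_multiple=12,
                               cost_per_event_samples=cost_samples_b)

print(f"Product A: ${total_a:,.0f}")
print(f"Product B: ${total_b:,.0f}")
print(f"Product A is {total_a / total_b:.1%} of the total cost of Product B")
```

Because the error cost scales with time_multiple while the license price does not, a cheaper license matters less and less as event volume grows.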

This is not that motivating, right? I mean, that cost difference is a rounding error for most large companies. But, what if we shifted this measurement to a weekly model? That would likely be more realistic for a Fortune 5000 company knee-deep in digital transformation. All you need to do is change the “time_multiple” variable to 52.


Now we get the following:

  • Product A is 81.1% of the total cost of Product B
  • The expected operation cost of Product A with errors (with base price of $72,000): $372,776
  • The expected operation cost of Product B with errors (with base price of $65,000): $459,551

Just for kicks, let’s assume your company is one of those that is doing tens of thousands of releases a year, just like my first cloud native CISO gig. Let’s further assume that our sample was for one day only. In order to simulate a year's worth of data, all I need to do is change the “time_multiple” parameter to 365. I doubt anyone would do a one day POC, but for illustration purposes, here is what we get:

  • Product A is 76.2% of the total cost of Product B
  • The expected operation cost of Product A with errors (with base price of $72,000): $2,113,258
  • The expected operation cost of Product B with errors (with base price of $65,000): $2,772,064

This is of course a “what-if” scenario based on a small scanning sample. It’s only a model! That means “not real!” You likely would not experience these losses as a single check written for millions of dollars. Reporting outcomes as expected values simply provides a consistent mechanism for comparing two events that vary by probability, much like how we use net present value to make a consistent comparison between cash flows that vary in time.

It’s here that I remind you of George Box’s famous phrase (with a little sauce from Doug Hubbard): “All models are wrong, but some are useful...and some are measurably more useful than others.” 

Our goal was to create a mathematically unambiguous and consistent method of comparing products. We have also primed the model with our vague beliefs about actual error rates and costs. We munged all that together with actual error rates to help us make product comparisons over a one year period. 

Perfect? No! Useful? Yes! It is particularly useful when compared to the competition: your unaided intuition. That model usually goes by two names: “Wild Arse Guesses” and “I’ll Buy The Shiny One.” 

What’s Next!?

What we have demonstrated here is a simple scoring system for comparing products under POC. These analytics can be applied to any solution that has a binary outcome, i.e. hit and miss, event and non-event, exploit and non-exploit, escalate and non-escalate, etc. They can be used with events that occur in a bounded period of time: day, week, month, etc.

The goal is to compare capabilities using small data, and then forecast costs. You can ultimately think of this model as a means for scoring how capabilities may perform over time – and then expand upon it. For example, it can be a real-time learning system that alerts you to possible drift in your security services’ performance.

That type of constant learning, informed by our uncertainty and incomplete data, is where the Bayesian in ABC comes in. Bayesian methods constantly update our beliefs about reality as data rolls in. Bayesian methods support including our prior beliefs (or prior data) about processes. The more data we have, the less reliance there is on subjective beliefs (aka SME forecasts).
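As a small Python illustration of that updating idea, assume the beliefs are a hypothetical Beta(2, 6) curve and the monthly batches below are made-up data. With a beta curve, updating is just adding the observed errors and non-errors to the two shape parameters, and the share of the estimate that rests on the original beliefs shrinks as data accumulates.

```python
def update_beliefs(a, b, n_events, n_errors):
    """Fold one batch of observations into the beta shape parameters."""
    return a + n_errors, b + (n_events - n_errors)


PRIOR_A, PRIOR_B = 2, 6  # hypothetical SME beliefs (an assumption)
prior_weight = PRIOR_A + PRIOR_B

# Made-up monthly batches of (events, errors).
batches = [(30, 10), (28, 9), (35, 12)]

a, b = PRIOR_A, PRIOR_B
for month, (n_events, n_errors) in enumerate(batches, start=1):
    a, b = update_beliefs(a, b, n_events, n_errors)
    mean_rate = a / (a + b)
    belief_share = prior_weight / (a + b)  # how much of the estimate still rests on SME beliefs
    print(f"month {month}: mean error rate {mean_rate:.3f}, belief share {belief_share:.1%}")
```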

This type of modeling can easily evolve into an approach for real-time security capability optimization and decisioning, which is something I am particularly interested in. It also happens to be a topic covered in my next book, “The Metrics Manifesto: Confronting Security With Data.”

I’m very interested in hearing from folks who might want to see analytics like this operationalized. After all, my company Soluble is in the operationalization game. We deliver SaaS services for making cloud native security as simple and cost effective as possible for front-line practitioners. We are actively seeking design partners (individuals and/or teams) to help. We provide the cloud environment so you can play safely. You don’t need to provide anything other than a small amount of your time. Sounds like fun? Let me know:

You can learn more by downloading our Whitepaper: Operationalizing Security With Kubernetes. 


Topics: Kubernetes, Cloud Native, CISO, DevSecOps, Security Metrics


Written by Rich Seiersen

Rich is Cofounder and CEO for Soluble. Prior to Soluble, Rich spent 20 years deep in the salt mines of security operations and development. Along the way, he became a serial CISO with stints at LendingClub, Twilio and GE. But he got his start in security startups building vulnerability management products for companies like Qualys and Tripwire. He’s also the co-author of “How To Measure Anything In Cybersecurity Risk,” and the forthcoming “The Metrics Manifesto: Confronting Security With Data.”