Predictive Analytics: Understanding Why It’s an Evolving Process

Network failures are frustrating for all and an issue that many IT professionals deal with regularly. Until recently, predictive analytics as it applies to network security has been, at best, a game of guess work. Yet, more advanced tools are emerging that will bring increased accuracy to network reliability forecasts, but only if they are used correctly.

Predictive Analytics 101

Fundamentally, predictive analytics works by taking historical data, extracting prior trends, and then using that information to project what is likely to happen next. It’s a way to make informed guesses about the likelihood of an event or events, using trends evident in past events to increase the accuracy of those guesses. In practice, predictive analytics still involves a lot of preparatory work: collecting large amounts of data, exploring the data with various toolkits and visualizations, figuring out which characteristics of the data are important, and then feeding that data into specialized algorithms to train a predictive model.

For example, if you look at historical housing price data, you may find that prices have increased by 0.5% a month for the past 24 months straight. Based on that historical trend, you can predict that next month prices will go up by 0.5% again. This is an extremely simple form of predictive analytics; the kind deployed in the security industry is far more complicated.
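The housing example above can be sketched in a few lines of Python. The price and growth figures are illustrative, not real market data:

```python
# Naive trend extrapolation: assume the historical monthly growth
# rate simply continues into the next month.
def predict_next_price(current_price, monthly_growth_rate):
    """Project next month's price from a constant historical growth rate."""
    return current_price * (1 + monthly_growth_rate)

price = 300_000.0   # hypothetical current median price
growth = 0.005      # 0.5% per month, extracted from 24 months of history

next_month = predict_next_price(price, growth)
print(f"Predicted price next month: {next_month:,.2f}")  # 301,500.00
```

Real predictive models replace the single hard-coded growth rate with parameters learned from many features of the data, but the core idea is the same: past trends drive the projection.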

Imagine a network operator who wants to detect activity of a particular botnet on the network. The operator has a set of traffic captured from bots known to be part of the botnet, and a set of typical, benign traffic seen on the network. The network operator uses the traffic sets to train the predictive model to distinguish between benign and malicious traffic activity. The training correlates characteristics of the collected traffic—protocols, ports, payload, round-trip latency, and so on—and predicts whether the source of the traffic is benign or malicious, adjusting the model continuously depending on the accuracy of the predictions. Ideally, at the end of the training, the network operator has a model that can predict with high confidence whether new traffic seen on the network belongs to the botnet.
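To make the training step concrete, here is a deliberately tiny sketch of the idea, assuming each traffic flow has already been reduced to a feature vector (here just mean packet size in bytes and packets per second; all values are hypothetical). Production systems use proper machine-learning libraries and many more features, but the structure is the same: compute a model from labeled traffic, then classify new flows against it.

```python
# Toy nearest-centroid classifier: "training" computes one average
# feature vector per class; "prediction" picks the closer class.

def centroid(samples):
    """Mean feature vector of a list of equal-length samples."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def train(benign, malicious):
    """Build the model: one centroid per traffic class."""
    return centroid(benign), centroid(malicious)

def predict(model, flow):
    """Label a new flow by its nearest class centroid (squared distance)."""
    benign_c, malicious_c = model

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return "benign" if dist2(flow, benign_c) <= dist2(flow, malicious_c) else "malicious"

# Hypothetical captured traffic: [mean packet size, packets per second]
benign_flows    = [[900.0, 20.0], [1100.0, 35.0], [1000.0, 25.0]]
malicious_flows = [[80.0, 400.0], [120.0, 550.0], [100.0, 480.0]]

model = train(benign_flows, malicious_flows)
print(predict(model, [95.0, 450.0]))   # prints "malicious"
```

The continuous adjustment the article describes corresponds to re-running training as new labeled traffic arrives, so the centroids (or, in a real model, the learned parameters) track how the botnet's behavior evolves.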

Understanding the Drawbacks

Today’s predictive analytics tools have greatly changed the way data can be processed and are now a powerful mechanism for monitoring traffic and detecting network attacks. So far, the industry has been quick to put its trust behind these high-tech solutions, but little of predictive analytics’ potential has been realized to date. The promise is obvious, yet we haven’t reached the payoff: security needs a more strategic approach to big data analytics, and a few issues stand in the way. The biggest is limited adoption, driven by the impact that inaccurate predictions can have on network performance.

For a moment, let’s say you have crafted an analytics model with 99.99% accuracy in classifying traffic, which sounds like a solid percentage. However, when a very large amount of data is processed, the remaining 0.01% amounts to a lot of misclassified traffic, including attacks that slip through. Say 1,000,000 packets per second are flowing through the network. With a malicious traffic detection model maintaining 99.99% accuracy, 100 of those 1,000,000 packets are misclassified every second. If even half of those misclassifications triggered an alert, or worse, a change to some network policy, the consequences for the network as a whole could be severe: 50 alerts every second is so much noise that a network operator couldn’t distinguish the real events from the false. Given those drawbacks, predictive analytics is currently best used to help inform network policy, or to strengthen the confidence of other kinds of event detection.
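The arithmetic behind that paragraph is worth spelling out, because it shows how quickly a small error rate compounds at line rate:

```python
# Back-of-the-envelope math: even 99.99% accuracy leaves a steady
# stream of misclassified packets at high traffic volumes.
packets_per_second = 1_000_000
accuracy = 0.9999

error_rate = 1 - accuracy                                  # 0.01% of traffic
misclassified_per_second = round(packets_per_second * error_rate)
print(misclassified_per_second)                            # 100 per second

alerts_per_second = misclassified_per_second // 2          # if half trigger an alert
print(alerts_per_second)                                   # 50 per second
print(alerts_per_second * 3600)                            # 180000 alerts per hour
```

At 180,000 potential alerts per hour, the real events drown in the noise, which is why raw model output shouldn’t drive network policy directly.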

Staying Proactive is Key

While operational practices might vary from place to place, everyone seems to agree on the need to define and maintain a data ingestion pipeline, ensuring the data most valuable for the organization’s needs is pulled through from the data sources into storage. Without data and ongoing training, predictive analytics has no context and cannot adapt to evolving network usage. However, it seems that most companies are still reactive when it comes to security. They’ve deployed tools and processes that help them deal with an incident once it’s been discovered, but are struggling to apply tools that will be effective at preventing future breaches, or at identifying ongoing threat campaigns.

So, how do you successfully implement predictive analytics? The answer is to have a proactive mindset. The team must be ready to adjust the data parameters to meet different needs and recognize that what works today may not work tomorrow. It follows that you will not get perfect results the first or second time you implement these tools; you must work toward them. Iterating on the data parameters is easier when your team takes a “whole stack” approach – everyone on the team working together on the data they are collecting and the solutions the business needs. Ultimately, the quality of the data analyzed is intrinsically linked to the quality of the people capturing it.

To learn more about the operational challenges internet operators face daily from network-based threats and the strategies adopted to address and mitigate them, download Arbor’s 12th annual Worldwide Infrastructure Security Report (WISR).