Blackhat Europe: Finding Needles in Haystacks (the size of countries)
At Blackhat Europe 2012 I unleashed the subject of Big Data Security Analytics and Network Security Monitoring. The presentation was “Finding Needles in Haystacks (the size of countries)” and you can find the slides on Slideshare or download the [PDF].
I knew the audience wouldn’t be familiar with Big Data technologies such as Map/Reduce, Hadoop and Pig, but they do have a keen sense of the changing nature of attacks: they are becoming more subtle, complex, blended and frequent. We only need to look at the major companies that were exploited in 2011.
During the talk I showed the “Let’s Enhance” video and suggested that it was a good metaphor for security analysis. It juxtaposes the Hollywood detective with our understanding of the real world. In terms of security, it makes you think about the context you need to find structured attacks against your network. In security we are dealing with a problem of scale and accuracy. Charged with finding needles in haystacks, we can barely capture security events correctly. This is why the video is so funny.
These Hollywood ‘analysts’ have almost magical tools that afford them capabilities we could only dream of as security analysts. They are:
- Enriching data, when we constantly face a loss of resolution and fidelity.
- Playing, pausing and rewinding events, but we have one chance and then it’s gone.
- Exploring data in vector space, building context and entropy, but we are looking at isolated and disconnected events.
- Focused on detection, whereas the security industry is still heavily focused on prevention.
- Investigating events after they have happened, but we are geared towards preventing an unknowable future state.
- Operating on a complete copy of the event, when the best we can often summon is a log or correlated log store.
- Using algorithms to process features and vectors from data, a subject that is not even being looked at in security.
So I proposed taking the core concepts from Network Security Monitoring (NSM) and combining them with Full Packet Capture (FPC) and Big Data tools to provide the ability to investigate incidents at massive scale.
Packetpig can analyse packets at terabyte scale. The data analysis language (like a query language) of Pig lends itself nicely to exploring terabytes of full packet captures. The beauty of Packetpig is you can write a query on your laptop against a small sample of data and then execute the query on the cluster against months or years of traffic captures. Packetpig also comes with a large number of examples.
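The "write it on a sample, run it on everything" idea can be illustrated outside of Pig. What follows is a plain Python sketch, not Packetpig or Pig Latin: the record field `src` and the function names are hypothetical, and the sample data is made up. The query is written in a map/reduce shape so that the same function accepts a small in-memory list or a much larger stream of records.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: one dict per packet or alert, with a "src" field.
Record = Dict[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit a (source address, 1) pair for every record."""
    for rec in records:
        yield rec["src"], 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Sum the emitted values per key."""
    counts: Counter = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def top_talkers(records: Iterable[Record], n: int = 3) -> List[Tuple[str, int]]:
    """The 'query': the n most frequent source addresses.

    It only needs something iterable, so a laptop-sized list and a
    generator over a far larger capture are handled the same way.
    """
    return reduce_phase(map_phase(records)).most_common(n)


# Made-up sample data using documentation address ranges.
sample = [
    {"src": "192.0.2.10"},
    {"src": "198.51.100.7"},
    {"src": "192.0.2.10"},
]
print(top_talkers(sample, n=1))
```

On a real cluster the map and reduce phases would be distributed by the framework; the sketch only shows why a query that never assumes where its records come from can be developed against a small sample first.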
Packetpig is the first Big Data security tool; it’s open source and available for anyone to use. It combines big data analysis with some pretty stunning visualisations, a number of which I demonstrated during the presentation. They included the Google WebGL Globe displaying 420,000 Snort alerts across approximately 12 days of full packet captures. I also demonstrated the full capabilities of Packetpig in the areas of threat analysis, traffic analysis and payload analysis, including an awesome way of visualising trigrams using an NGram Cube. All of these features will be showcased on the blog over time.
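For readers new to trigrams, here is a minimal Python sketch of counting byte trigrams in a payload. It is not Packetpig code. The mapping of each trigram to an (x, y, z) point in a 256 x 256 x 256 cube is my assumption about how a cube-style view could be fed, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def byte_trigrams(payload: bytes) -> Counter:
    """Count every overlapping 3-byte sequence in the payload."""
    # range() is empty for payloads shorter than 3 bytes, giving an empty Counter.
    return Counter(payload[i:i + 3] for i in range(len(payload) - 2))


def cube_points(counts: Counter) -> List[Tuple[int, int, int, int]]:
    """Turn each trigram into (x, y, z, frequency).

    Assumption: the three byte values serve as coordinates, so every
    trigram lands somewhere in a 256^3 cube.
    """
    return [(tri[0], tri[1], tri[2], freq) for tri, freq in counts.items()]


# Made-up payload for illustration.
counts = byte_trigrams(b"GET /GET /")
print(counts.most_common(3))
print(cube_points(counts)[:2])
```

Payloads with similar content tend to produce similar trigram distributions, which is what makes the counts useful as features for payload analysis.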
Analysing large data sets gives security analysts new capabilities. I demonstrated this towards the end of the presentation, when I used BitTorrent seeders and leechers to triangulate the source of attacks and confirm which IP addresses were common to individual attackers. This involved finding distinct attackers among 420K individual events (3 billion packets) and matching them to 180,000 seeders and leechers we tracked across Piratebay’s Top 100 Movies, Music and Books.
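At its core, the matching step is a join between two sets of IP addresses. The Python sketch below shows that idea only; it is not the query used in the talk. The record fields, the function names and the sample data are all hypothetical, and the addresses come from the reserved documentation ranges.

```python
from typing import Dict, Iterable, Set, Tuple


def distinct_attackers(alerts: Iterable[Dict[str, str]]) -> Set[str]:
    """Collapse many alert events into the set of unique source addresses."""
    return {alert["src"] for alert in alerts}


def match_peers(
    attacker_ips: Set[str],
    peers: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Return, for each attacker IP also seen as a peer, the torrents it was seen on.

    `peers` is an iterable of (ip, torrent_name) pairs.
    """
    matches: Dict[str, Set[str]] = {}
    for ip, torrent in peers:
        if ip in attacker_ips:
            matches.setdefault(ip, set()).add(torrent)
    return matches


# Made-up data for illustration.
alerts = [
    {"src": "203.0.113.9"},
    {"src": "203.0.113.9"},
    {"src": "198.51.100.23"},
]
peers = [
    ("203.0.113.9", "movie-a"),
    ("192.0.2.44", "movie-a"),
    ("203.0.113.9", "album-b"),
]

attackers = distinct_attackers(alerts)
print(match_peers(attackers, peers))
```

Holding the attacker addresses in a set keeps each lookup constant time, so the peer list can be streamed through once regardless of its size.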
Thanks to everyone who attended the briefing, and to those who stayed back to ask questions and discuss their own situations and problems and the capabilities of Big Data Security Analytics.
You can follow @packetpig on Twitter, and you can also download the code and use it on your own traffic captures!