Packetpig – Open Source Big Data Security Analysis

Packetpig is an open source project hosted on GitHub that allows full packet captures and device logs to be analysed. We describe it as Big Data Security Analysis – a way of applying Network Security Monitoring principles to the analysis of big datasets.

Packetpig is made up of a series of Pig Loaders (Java classes) that expose packet captures so they can be analysed at massive scale:

  • PacketLoader() – opens packet captures and provides access to TCP, UDP and IP headers e.g. Source IP Address, Source Port, Destination IP Address, Destination Port.
  • SnortLoader() – wraps the Snort Intrusion Detection application allowing packet captures to be analysed across a Hadoop Cluster. The loader analyses packets and returns signature, priority, message, protocol and Source IP/Port, Destination IP/Port.
  • ConversationLoader() – links packets to their conversations or flows. The conversation start and end, the way the conversation ended, the number of packets, their size and delay can all be extracted through this loader.
  • DNSConversationLoader() – provides additional functionality for the deep packet inspection of DNS conversations.
  • HTTPConversationLoader() – provides additional functionality for the deep packet inspection of HTTP conversations.
  • ConversationFileLoader() – allows file metadata and the files themselves to be extracted from conversations. The loader returns the file name, extension and libmagic information, as well as MD5, SHA1 and SHA256 hashes. The files themselves can also be extracted and dumped.
  • FingerprintLoader() – a wrapper for p0f that allows it to operate across a Hadoop Cluster.
  • PacketNgramLoader() – extracts data from each packet in a conversation and breaks it into N-Grams. Unigrams, bigrams and trigrams are most commonly used, but any integer can be passed to the loader.
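
As a rough illustration of what breaking packet data into N-Grams means, here is a minimal Python sketch. It is not the PacketNgramLoader() itself and does not use any Packetpig API. The function name byte_ngrams and the sample payload are assumptions made for the example. It slides a window of N bytes across the payload and counts how often each window occurs.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int) -> Counter:
    """Count every contiguous n-byte window in the payload.

    Hypothetical helper for illustration only; not part of Packetpig.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A payload shorter than n yields no windows, so the Counter stays empty.
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


# Hypothetical payload standing in for the data bytes of a single packet.
payload = b"SSH-2.0-OpenSSH"

unigrams = byte_ngrams(payload, 1)
trigrams = byte_ngrams(payload, 3)

# Show the most frequent trigrams in this payload.
print(trigrams.most_common(3))
```

A payload of length L produces L - N + 1 windows, so larger values of N give fewer but more specific N-Grams.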
Google WebGL Globe of Snort Alerts

Loaders are called in Pig files written in Pig Latin, and multiple loaders can be combined in a single analysis. For example, you may want to take all sources of attacks and see whether each one's operating system matches its user agent. This would involve using the SnortLoader(), FingerprintLoader() and HTTPConversationLoader().

Firstly, you would parse all packet captures using the SnortLoader() to find the distinct Source IP addresses linked to Snort attacks. Secondly, you would parse all packet captures using the FingerprintLoader() (a wrapper for p0f), which provides information on the operating system using passive analysis. Thirdly, you would parse all HTTP conversations using the HTTPConversationLoader() to extract the User Agent field from each conversation. Finally, you would join the data together on the Source IP address to output the analysed data, linking attackers to their operating systems and their user agents.
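
The join in the final step can be sketched in plain Python. This is only an illustration of the logic, not Pig Latin and not the Packetpig API. The function name join_on_source_ip, the field names (src, os, user_agent) and all sample records are assumptions; the records stand in for the output of the three loaders and use documentation-only IP addresses.

```python
def join_on_source_ip(snort_alerts, fingerprints, http_requests):
    """Link each attacking source IP to its OS guesses and user agents.

    Hypothetical helper for illustration; each argument is a list of dicts
    standing in for the rows one of the loaders would return.
    """
    # Step 1: distinct source IPs that triggered at least one Snort alert.
    attackers = {alert["src"] for alert in snort_alerts}

    # Step 2: operating systems seen per source IP (passive fingerprinting).
    os_by_ip = {}
    for fp in fingerprints:
        os_by_ip.setdefault(fp["src"], set()).add(fp["os"])

    # Step 3: user agents seen per source IP in HTTP conversations.
    agents_by_ip = {}
    for req in http_requests:
        agents_by_ip.setdefault(req["src"], set()).add(req["user_agent"])

    # Step 4: join on source IP, keeping only the attackers.
    return {
        ip: {
            "os": sorted(os_by_ip.get(ip, ())),
            "user_agents": sorted(agents_by_ip.get(ip, ())),
        }
        for ip in sorted(attackers)
    }


# Hypothetical sample rows; none of these values come from real captures.
snort_alerts = [
    {"src": "192.0.2.10", "sig": "hypothetical-signature-a"},
    {"src": "192.0.2.10", "sig": "hypothetical-signature-b"},
    {"src": "198.51.100.7", "sig": "hypothetical-signature-a"},
]
fingerprints = [
    {"src": "192.0.2.10", "os": "Linux"},
    {"src": "203.0.113.5", "os": "Windows"},
]
http_requests = [
    {"src": "192.0.2.10", "user_agent": "ExampleBrowser/1.0 (Windows)"},
    {"src": "198.51.100.7", "user_agent": "ExampleBot/0.1"},
]

result = join_on_source_ip(snort_alerts, fingerprints, http_requests)
for ip, details in result.items():
    print(ip, details)
```

In this sample, the first attacker is fingerprinted as Linux while its user agent claims Windows, which is the kind of mismatch the analysis is meant to surface. On a Hadoop Cluster the same join would be expressed in Pig Latin over the loader outputs rather than over in-memory lists.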

SSH Trigrams Visualised in 3D Space

The Packetpig Loaders are the building blocks for analysing full packet captures, and there is nothing stopping you from also integrating device log files if required. The Packetpig project also includes a 3D Globe, World Maps and Line Graphs for time series and N-Gram visualisation.

All of us at Packetloop hope you enjoy the project and we are happy to accept pull requests if you wish to contribute.
