The document describes a proposed intrusion/threat detection system with the following key components:
1. A feature engineering module to extract relevant features from organizational data like employee information and online activities.
2. A text processing and topic modeling module to analyze communications data and identify confidential information.
3. An internal threat detection system using deep learning to detect threats in real-time with a risk score and predefined response policies.
4. An external threat detection system using signatures and anomaly detection to enforce actions against external threats.
2. Affine www.affinenalytics.com
Affine Blog
Intrusion/Threat Detection Systems
Recently, while reading up on cyber security & threat detection, I came across “The Annual Data Breach Report by Verizon”. The report
analyzed thousands of incidents reported by companies and by public & private organizations over the last couple of years.
It broke the breaches down by firmographics, geographies, industries etc. and found that cyber intrusion is a growing threat to every industry
in every country of the world. Its recurring message is that “No single industry or organization in the world is safe from Cyber
Threats”. This piqued my curiosity, and I felt that data science could be used to tackle this problem effectively. I designed a
Threat/Intrusion Detection System which could be used to detect such data leaks/breaches & take preventive action to contain, if not stop, the
damage due to a breach.
System Architecture
What is an Intrusion Detection System?
Wikipedia defines it as “a device or software application that monitors a network or systems for malicious activity or policy violations”.
The detected activities are reported either to an administrator for action or collected centrally using an event management system.
These systems use various methods to track an intrusion or a breach. For example:
• Signature Based Detection – Detection of attacks by looking for specific patterns or sequences in network traffic, systems etc.
• Anomaly Detection – Detection of suspicious activity by comparing historical network activity with new activities.
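As a minimal sketch of the difference between the two methods, the snippet below checks a payload against known patterns (signature based) and flags a value that deviates strongly from historical activity (anomaly detection). The signature strings, the function names and the 3-standard-deviation threshold are illustrative assumptions, not part of any specific product.

```python
import statistics

# Hypothetical signatures: text patterns assumed to indicate known attacks.
SIGNATURES = {
    "sql_injection": "' OR 1=1",
    "path_traversal": "../../",
}


def match_signatures(payload):
    """Signature based detection: return names of known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(history, new_value, threshold=3.0):
    """Anomaly detection: flag new_value if it lies more than `threshold`
    standard deviations away from the mean of the historical values."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return new_value != mean
    return abs(new_value - mean) / stdev > threshold


# Example: requests per minute seen historically vs. a sudden spike.
requests_per_minute = [100, 110, 90, 105, 95]
print(match_signatures("GET /item?id=1' OR 1=1"))
print(is_anomalous(requests_per_minute, 500))
```

A signature check only catches patterns that are already known, while the statistical check can flag unseen behavior at the cost of false positives.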
Enablers of Intrusion:
There are multiple reasons/enablers of a data breach. They can be classified into 2 categories:
• Internal Enablers:
a) Compromised Actors: Users whose credentials or devices have been compromised or lost
b) Negligent Actors: Users who expose data accidentally, for example by using insecure networks such as unsecured Wi-Fi
c) Malicious Insiders: Users who intentionally steal data
• External Enablers:
a) Hackers: People who hack an organization’s networks/devices to gain access to sensitive information
b) Phishing: Another type of hacking intended to gain access to user credentials
1. Feature Engineering
An organization’s data can be leveraged to analyze various aspects & behaviors of threat detection.
• Datasets like HR/Employee data to classify Internal vs External threats
• Employees’ personal information to identify the intent & potential impact of a data breach.
• Online Activities & browsing history to detect external threats
Various hypotheses can be formulated to detect breaches, for example:
• Employees with access to a larger number of resources/data streams are more likely to be compromised
• Employees having frequent records of unusual login times and places are a potential threat
• Employees having bad peer reviews are more likely to have a malicious intent etc.
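The hypotheses above can each be turned into a numeric feature. The sketch below is one possible way to do that; the record layout (`resources`, `logins`, `peer_reviews`), the office hours of 08:00 to 18:59 and the usual place `"HQ"` are all assumptions made for illustration.

```python
from datetime import datetime


def unusual_login_rate(logins, usual_hours=range(8, 19), usual_places=frozenset({"HQ"})):
    """Fraction of logins outside the usual hours or from an unusual place.

    `logins` is a list of (ISO timestamp, place) pairs (hypothetical layout).
    """
    if not logins:
        return 0.0
    unusual = sum(
        1
        for timestamp, place in logins
        if datetime.fromisoformat(timestamp).hour not in usual_hours
        or place not in usual_places
    )
    return unusual / len(logins)


def build_features(employee):
    """Map one employee record to the three hypothesis-driven features."""
    reviews = employee["peer_reviews"]
    return {
        "resource_count": len(employee["resources"]),
        "unusual_login_rate": unusual_login_rate(employee["logins"]),
        "avg_peer_review": sum(reviews) / len(reviews) if reviews else None,
    }


employee = {
    "resources": ["crm", "payroll_db", "source_repo"],
    "logins": [
        ("2024-01-08T09:30:00", "HQ"),
        ("2024-01-09T02:15:00", "HQ"),      # unusual time
        ("2024-01-10T10:00:00", "Remote"),  # unusual place
        ("2024-01-11T11:00:00", "HQ"),
    ],
    "peer_reviews": [2.0, 3.0],
}
print(build_features(employee))
```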
Feature extraction is a very important step in successfully implementing a deep learning system. A comprehensive list of hypotheses & a detailed
exploratory data analysis are a must. A summary of features which I developed is shown below:
The entire system will consist of 6 subsystems/modules in all:
1. Feature Engineering (which can be supported by frameworks like Kafka)
2. Text Processing & Topic Modeling
3. Internal Threat Detection System (Deep Learning based Engine which can be supported by Spark Streaming Framework)
4. External Threat Detection system (Signature & Anomaly Detection Framework)
5. Real Time Alert System
6. Risk Scoring and Reporting
2. Text Processing and Topic Modeling
Data from various sources can be fed into a text processing engine to identify whether the information in any mode of communication is
confidential or not.
1. Topic Identification to identify the topics of conversation occurring within the network.
2. ployees personal information to identify the intent & potential impact of a data breach.
3. Context Identification to determine whether the parties sharing confidential information have relevant authority & permissions and to
identify if the data is related to their work or not.
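As a rough illustration of topic identification followed by a context check, the sketch below tags a message with topics and then verifies that both parties are permitted to handle them. Simple keyword matching stands in for a real topic model here, and the keyword lists, addresses and permission table are hypothetical.

```python
import re

# Hypothetical keyword lists standing in for a trained topic model.
TOPIC_KEYWORDS = {
    "financials": {"revenue", "forecast", "budget"},
    "hr_records": {"salary", "appraisal", "payroll"},
}

# Hypothetical permissions: which confidential topics each party may handle.
AUTHORIZED_TOPICS = {
    "alice@corp.example": {"financials"},
    "dave@corp.example": {"financials"},
    "bob@corp.example": {"hr_records"},
}


def identify_topics(text):
    """Return the confidential topics whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}


def check_message(sender, recipient, text):
    """Return (topics, authorized). `authorized` is True only if both parties
    are permitted to handle every confidential topic found in the message."""
    topics = identify_topics(text)
    authorized = all(
        topics <= AUTHORIZED_TOPICS.get(party, set()) for party in (sender, recipient)
    )
    return topics, authorized


print(check_message("alice@corp.example", "bob@corp.example", "Q3 revenue forecast attached"))
```

A party missing from the permission table, such as an external address, is treated as having no permissions, so any confidential topic sent to it is reported as unauthorized.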
Below is a working example of how the text processing & classification engine would identify various categories of information based on text
processing & origin vs destination of the information.
3. Deep Learning based Threat Detection System
Current methods & technologies are not efficient at detecting APTs (Advanced Persistent Threats – mutations of viruses & malware). With
ever-changing technology, we are witnessing new ways of intruding into systems & breaching confidential data, which means that the system should be self-learning.
A multi-layered deep learning based system could not only help capture complex interactions between variables but could also be robust, scalable &
adaptable. All the identified incidents & patterns are denoted by a risk score, to help investigate the breach, control data loss and take
precautionary actions for the future.
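To make the idea of a multi-layered network that outputs a risk score concrete, here is a minimal forward pass with one hidden layer, written in plain Python. The three input features, the layer sizes and the weights are all hypothetical and untrained; a real system would learn the weights from labeled incidents and retrain as new patterns appear.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical, untrained parameters (3 inputs -> 2 hidden units -> 1 output).
W1 = [[0.8, 1.5, -0.6], [0.3, 2.0, -0.2]]
B1 = [0.0, -0.5]
W2 = [[1.2, 0.9]]
B2 = [-1.0]


def risk_score(features):
    """Map [access_breadth, unusual_login_rate, peer_review_score], each scaled
    to [0, 1], to a risk score in (0, 1)."""
    hidden = dense(features, W1, B1, relu)
    return dense(hidden, W2, B2, sigmoid)[0]


print(risk_score([1.0, 0.9, 0.1]))  # broad access, many unusual logins
print(risk_score([0.1, 0.0, 0.9]))  # narrow access, no unusual logins
```

The hidden layer is what lets the score depend on interactions between variables rather than on each variable in isolation.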
4. Internal Threat Detection System
The system proposed above could potentially detect a breach within 20 seconds of the event.
Feature creation from the collected event data will happen in real time on a rolling basis. The data is then fed into the deep learning engine
discussed above to detect threats. Various policies & pre-decided actions can then be undertaken by the system based on the severity of the
detected breach.
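The rolling feature creation and severity-based response described above could look roughly like the sketch below. The 20-second window length, the event fields, the thresholds and the action names are assumptions chosen for illustration.

```python
from collections import deque


class RollingEventWindow:
    """Keeps only events from the last `window_seconds` and derives features from them."""

    def __init__(self, window_seconds=20.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp in seconds, bytes sent)

    def add(self, timestamp, bytes_sent):
        self.events.append((timestamp, bytes_sent))
        # Drop events that have fallen out of the rolling window.
        while self.events and self.events[0][0] < timestamp - self.window_seconds:
            self.events.popleft()

    def features(self):
        return {
            "event_count": len(self.events),
            "bytes_sent": sum(size for _, size in self.events),
        }


# Hypothetical policies, ordered from most to least severe: (minimum score, action).
POLICIES = [(0.9, "lock_account"), (0.6, "alert_admin"), (0.0, "log_only")]


def choose_action(risk_score):
    """Pick the pre-decided action for the first threshold the score reaches."""
    for threshold, action in POLICIES:
        if risk_score >= threshold:
            return action
    return "log_only"


window = RollingEventWindow()
window.add(0.0, 100)
window.add(10.0, 200)
window.add(25.0, 300)  # the event at t=0 is now older than 20 seconds
print(window.features(), choose_action(0.7))
```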
5. External Threat Detection System
The signatures & input nodes for external threat detection are based on files transferred, processes, network accesses, domains & IPs, logs, devices
etc. A ruleset-based framework could help the system enforce a pre-decided action when a data breach is identified.
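One simple way to picture such a ruleset is a list of named rules, each pairing a condition on an observed event with a pre-decided action. The rule names, event fields, blocked domain and size limit below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical blocklist; a real deployment would load this from threat intelligence feeds.
BLOCKED_DOMAINS = {"malware.example"}


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # condition evaluated against one event
    action: str                      # pre-decided action if the rule matches


RULES = [
    Rule(
        "blocked_domain",
        lambda event: event.get("domain") in BLOCKED_DOMAINS,
        "block_connection",
    ),
    Rule(
        "large_transfer",
        lambda event: event.get("bytes_transferred", 0) > 100_000_000,
        "quarantine_device",
    ),
]


def evaluate(event):
    """Return the actions of every rule that matches the event, in rule order."""
    return [rule.action for rule in RULES if rule.matches(event)]


print(evaluate({"domain": "malware.example", "bytes_transferred": 10}))
```

Keeping rules as data makes it possible to add or retire a signature without changing the evaluation logic.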
6. Risk Management
For continuous risk management, a risk score based framework could be developed to identify risky assets & employees and to help mitigate these
risks in a timely manner. A risk score could be developed by using all the below attributes to help identify the potential sources of the breach.
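A composite risk score can be sketched as a weighted sum of normalized attributes, used to rank assets and employees. The attribute names and weights below are placeholders for illustration and are not the list of attributes the framework itself uses.

```python
# Hypothetical attributes and weights; each attribute is assumed to be scaled to [0, 1].
WEIGHTS = {
    "access_breadth": 0.40,
    "unusual_login_rate": 0.35,
    "policy_violations": 0.25,
}


def composite_risk(attributes):
    """Weighted sum of the attributes; missing attributes count as 0."""
    return sum(weight * attributes.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank_by_risk(entities):
    """Return entity names ordered from highest to lowest composite risk."""
    return sorted(entities, key=lambda name: composite_risk(entities[name]), reverse=True)


entities = {
    "emp-042": {"access_breadth": 0.9, "unusual_login_rate": 0.6, "policy_violations": 0.2},
    "laptop-17": {"access_breadth": 0.2, "unusual_login_rate": 0.1},
    "emp-007": {"access_breadth": 0.5, "unusual_login_rate": 0.9, "policy_violations": 0.8},
}
print(rank_by_risk(entities))
```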
7. Conclusion:
We have seen from data breaches like the Yahoo data breach & the Sony email fiasco that cyber attacks are a real threat to the identity & reputation
of an organization. It is imperative that adequate security measures be put in place to mitigate the risks they pose. Given new modes of
communication, more sophisticated malware & new patterns of hacking, a framework like the one discussed above could be implemented to
help prevent theft of digital information.