6. ONI (Open Network Insight) -> Apache Spot (incubating)
• Apache Spot (incubating) is an advanced analytics solution that helps close the gaps mentioned on the previous slides.
• It ingests billions of records into HDFS and runs machine learning algorithms over them to detect potential threats in our environment.
7. Apache Spot (incubating) Core
[Figure: Product Architectural Overview. Assumes a Cloudera Hadoop environment with master node(s). Stages: Data Integration (new data sources), Data Store, Machine Learning (Intel MPI; generates CSV files with the results), and Operational Analytics (adds context using reputation services (GTI) for public IP addresses; interface to share connections with I-Sec products). Use cases developed in iPython.]
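The architecture overview has the machine learning stage write its results to CSV files and the operational analytics stage add context from reputation services (GTI), but only for public IP addresses. The Python sketch below illustrates that split under stated assumptions: the `src_ip`/`dst_ip`/`score` column names, the `enrich_results` function, and the `"internal"` label are hypothetical, and the reputation service is a stub passed in as a function rather than GTI's real API. It relies on the standard library's `ipaddress` module (`is_global`) to decide which addresses count as public.

```python
from __future__ import annotations

import csv
import io
import ipaddress
from typing import Callable


def enrich_results(csv_text: str,
                   lookup_reputation: Callable[[str], str]) -> list[dict[str, str]]:
    """Attach a reputation label to each IP column of a results CSV.

    Only globally routable (public) addresses are passed to the reputation
    service; private/internal addresses are labelled "internal" instead.
    The column names src_ip/dst_ip are hypothetical.
    """
    enriched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in ("src_ip", "dst_ip"):
            address = ipaddress.ip_address(row[column])
            row[f"{column}_reputation"] = (
                lookup_reputation(row[column]) if address.is_global else "internal"
            )
        enriched.append(row)
    return enriched


if __name__ == "__main__":
    # Hypothetical stand-in for a reputation service such as GTI.
    fake_reputation = {"8.8.8.8": "minimal risk"}
    results_csv = "src_ip,dst_ip,score\n10.0.0.5,8.8.8.8,0.00001\n"
    for row in enrich_results(results_csv, lambda ip: fake_reputation.get(ip, "unknown")):
        print(row)
```

Passing the lookup in as a callable keeps the sketch independent of any particular reputation service, so a cache or a rate-limited client could be swapped in without touching the CSV handling.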
8. Apache Spot delivers…
1. Scalable Data and Analytics Platform
2. Open Data Models
3. Analytic Collaboration Across the Community
4. Growing Application Ecosystem
… to address cybersecurity use cases.
• Network Traffic Analytics
• Threat Hunting
• Incident Detection and Resolution
• Cybersecurity Data Management
• Custom Use Case
9. Platform: Apache Spot, bringing all of the components together.
• Intel Hardware, On-Prem, AWS, Azure: public or private clouds
• Data Platform (CDH): scalable storage and distributed processing; provisioning, management, and security
• Ingestion (Kafka, Flume, StreamSets): batch and stream data ingestion
• Apache Spot Open Data Models (ODM): logical and physical models
• Analytic Services (Jupyter, Apache Spark): data science workbench
• Apache Spot OSS Analytics: network traffic analytics, additional OSS analytics
• Apache Spot ODM Marketplace: ODM-compliant ecosystem, both open source and …
• Apache Spot Sample Data Sets: community-sourced, anonymized data sets for …
11. Call to Action.
Contribute to the Apache Spot (incubating) project:
1. Develop connectors to ingest more data.
2. Develop new algorithms that increase the tool's detection rate.
3. Help add context to the results by building connectors to threat intelligence feeds and databases, so that we can present meaningful information to the end user.
4. Develop the user interface: propose changes, technologies, operational summaries, reports, etc.
12. Call to Action.
5. Integrate Apache Spot (incubating) with other security tools that can enforce or change security posture (firewall consoles, IPS consoles, proxies, endpoint security solutions, e-mail proxies).
6. Contact us:
• Web page: http://spot.apache.org/
• Slack: slack.apache-spot.io/
• Twitter: @ApacheSpot
7. Contribute to the Apache Spot (incubating) project.
13. With Apache Spot, you are joining a community.
Collaborate with industry leaders using a common
Rules and patterns, most of the time, on the cyber side.
DDoS: the "internet apocalypse" map hides the major vulnerability that created it… China stuff.
Hire a hacker: a hacker can break into a corporate email account without the owner knowing and without needing to change the password. The hacker can then use "forgot password" to reset the passwords of critical applications.
Buy a product that helps you hack: Angler exploit kits help infect users with malware, which is delivered when the user visits a site where the kit is deployed.
Get trained by the best hackers on YouTube: anyone can now learn how to hack a corporation.
Enterprise System Information Protocol (ESIP)
For reporting of asset inventory information. Common Platform Enumeration (CPE), etc.
Threat Analysis Automation Protocol (TAAP)
For reporting and sharing structured threat information. Malware Attribute Enumeration & Characterization (MAEC), Common Attack Pattern Enumeration & Classification (CAPEC), Common Platform Enumeration (CPE), Common Weakness Enumeration (CWE), Open Vulnerability and Assessment Language (OVAL), Common Configuration Enumeration (CCE), and Common Vulnerabilities and Exposures (CVE).
Event Management Automation Protocol (EMAP)
For reporting of security events. Common Event Expression (CEE), Malware Attribute Enumeration & Characterization (MAEC), and Common Attack Pattern Enumeration & Classification (CAPEC).
Incident Tracking and Assessment Protocol (ITAP)
For tracking, reporting, managing and sharing incident information. Open Vulnerability and Assessment Language (OVAL), Common Platform Enumeration (CPE), Common Configuration Enumeration (CCE), Common Vulnerabilities and Exposures (CVE), Common Vulnerability Scoring System (CVSS), Malware Attribute Enumeration & Characterization (MAEC), Common Attack Pattern Enumeration & Classification (CAPEC), Common Weakness Enumeration (CWE), Common Event Expression (CEE), Incident Object Description Exchange Format (IODEF), National Information Exchange Model (NIEM), and Cybersecurity Information Exchange Format (CYBEX).
Enterprise Remediation Automation Protocol (ERAP)
For automated remediation of misconfigurations and missing patches. Common Remediation Enumeration (CRE), Extended Remediation Information (ERI), Open Vulnerability and Assessment Language (OVAL), Common Platform Enumeration (CPE), and Common Configuration Enumeration (CCE).
Enterprise Compliance Automation Protocol (ECAP)
For reporting configuration compliance. Asset Reporting Format (ARF), Open Checklist Reporting Language (OCRL), etc.
In more detail, LDA represents documents as mixtures of topics that emit words with certain probabilities. It assumes that each document is produced as follows:
1. Decide on the number of words N the document will have (say, according to a Poisson distribution).
2. Choose a topic mixture for the document (according to a Dirichlet distribution over a fixed set of K topics). For example, assuming we have the two topics above, food and cute animals, you might choose the document to consist of 1/3 food and 2/3 cute animals.
3. Generate each word w_i in the document by:
   a. First picking a topic (according to the multinomial distribution sampled in step 2; for example, you might pick the food topic with probability 1/3 and the cute animals topic with probability 2/3).
   b. Then using that topic to generate the word itself (according to the topic's multinomial distribution). For example, if we selected the food topic, we might generate the word “broccoli” with 30% probability, “bananas” with 15% probability, and so on.
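The generative process described above can be simulated in a few lines of Python using only the standard library. This is an illustrative sketch, not Apache Spot's code: the function names are hypothetical, the Poisson mean and Dirichlet parameters are arbitrary, and apart from the 30% “broccoli” and 15% “bananas” figures from the text, the topic vocabularies are made up. `word_probability` also shows the arithmetic behind the example: with a 1/3 food, 2/3 cute animals mixture, “broccoli” is drawn with probability 1/3 × 0.30 = 0.10 per word, because the made-up cute animals topic never emits it.

```python
from __future__ import annotations

import math
import random

# Topic-word distributions. Only "broccoli" (30%) and "bananas" (15%) come from
# the text; the other words and probabilities are made up so each row sums to 1.
TOPICS: dict[str, dict[str, float]] = {
    "food": {"broccoli": 0.30, "bananas": 0.15, "breakfast": 0.25, "spinach": 0.30},
    "cute animals": {"kittens": 0.40, "puppies": 0.35, "hamster": 0.25},
}


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product falls to e^-lam."""
    limit, k, product = math.exp(-lam), 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """A Dirichlet draw: independent Gamma(alpha_k, 1) draws, normalized to sum to 1."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(mean_length: float, alpha: list[float],
                      rng: random.Random) -> tuple[list[float], list[str]]:
    topic_names = list(TOPICS)
    n_words = sample_poisson(mean_length, rng)   # number of words N
    theta = sample_dirichlet(alpha, rng)          # topic mixture for this document
    words = []
    for _ in range(n_words):
        # Pick a topic from the document's mixture, then a word from that topic.
        topic = rng.choices(topic_names, weights=theta)[0]
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return theta, words


def word_probability(word: str, mixture: dict[str, float]) -> float:
    """p(word) under a given mixture: sum over topics of p(topic) * p(word | topic)."""
    return sum(p * TOPICS[topic].get(word, 0.0) for topic, p in mixture.items())


if __name__ == "__main__":
    rng = random.Random(0)
    theta, words = generate_document(mean_length=10, alpha=[0.5, 0.5], rng=rng)
    print(theta, words)
    # The text's example mixture: 1/3 food, 2/3 cute animals.
    print(word_probability("broccoli", {"food": 1 / 3, "cute animals": 2 / 3}))  # ≈ 0.1
```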
Assuming this generative model for a collection of documents, LDA then tries to backtrack from the documents to find a set of topics that are likely to have generated the collection.
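The text does not say how LDA performs this backtracking. One common approach is collapsed Gibbs sampling, sketched below in plain Python; it is named here as an illustration and is not necessarily the inference method Apache Spot uses. The function `lda_gibbs` and its default hyperparameters (`alpha=0.1`, `beta=0.01`, 200 iterations) are assumptions. Each pass removes one word's topic assignment from the counts and resamples it in proportion to how much its document already uses each topic and how much each topic already uses that word.

```python
from __future__ import annotations

import random


def lda_gibbs(docs: list[list[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA; returns (doc-topic, topic-word) estimates."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_vocab for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Start with a random topic assignment z for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts...
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # ...then resample its topic from the full conditional:
                # p(z=k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][v] + beta)
                           / (n_k[j] + n_vocab * beta) for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of each document's topic mixture and each topic's word distribution.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [{w: (n_kw[k][word_id[w]] + beta) / (n_k[k] + n_vocab * beta) for w in vocab}
           for k in range(n_topics)]
    return theta, phi


if __name__ == "__main__":
    docs = [
        ["broccoli", "bananas", "spinach", "broccoli", "breakfast"],
        ["bananas", "broccoli", "spinach", "breakfast"],
        ["kittens", "puppies", "hamster", "kittens", "puppies"],
        ["hamster", "kittens", "puppies", "kittens"],
    ]
    theta, phi = lda_gibbs(docs, n_topics=2)
    for k, topic in enumerate(phi):
        print(f"topic {k}:", sorted(topic, key=topic.get, reverse=True)[:3])
    for d, mixture in enumerate(theta):
        print(f"doc {d}:", ", ".join(f"{p:.2f}" for p in mixture))
```

On a toy corpus like this, with disjoint food and animal vocabularies, the two topics typically separate into a food topic and an animals topic, though the sampler is stochastic and small corpora may need more iterations or a different seed.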