2. Agenda
• Intro, who I am.
• Cybersecurity
• ONI now Apache Spot (incubating)
• Apache Spot (incubating)
• Demo
• Call to Action.
• Q&A
3. Cybersecurity
• We have gaps: analyzing billions of events, orchestrating our
data sources (logs in different forms), and, sometimes, security
product documentation that is not the best.
4. The hacker community collaborates every day; it’s time we
do the same.
• Services: Hack a Corporate Email Account
• Products: Angler Exploit Kits
• Training: Learn to Crack Wifi
• Prices shown on the slide: Free, $100, $500
6. ONI -> Apache Spot (incubating)
• Apache Spot (incubating) is an advanced analytics solution that will help us
close the gaps mentioned on the previous slides.
• It ingests billions of records into HDFS and executes machine learning algorithms to
detect potential threats in our environment.
7. Apache Spot (incubating) Core
Product Architectural Overview (diagram; assumes a Cloudera Hadoop environment)
• ONI data sources: DNS, proxy, infrastructure logs, routers with the NetFlow protocol enabled on interfaces, new data sources.
• Data integration: collectors.
• Data store: online NoSQL (HBase), filesystem (HDFS).
• Machine learning: Spot ML algorithms, Spark. Machine learning algorithms output: ONI recommended the Intel MPI libraries; Scala.
• Machine learning generates CSV files with the results.
• ONI visualization server / iPython server, ONI GUI.
• Operational analytics: adding context using reputation services for public IP addresses (GTI).
• Security and context use cases developed in conjunction with Intel Security.
• Master node(s): Cloudera Manager/Navigator.
• Native administration: Cloudera Manager.
• Cluster authentication: LDAP/Kerberos authentication.
• Defining the interface to share the suspicious connections with I-Sec products and other brands.
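The architecture overview says that machine learning writes its results to CSV files and that context is then added using reputation services for public IP addresses. A minimal sketch of that hand-off follows. The column names, the convention that a lower score means more suspicious, the threshold, and the in-memory `REPUTATION` table are all assumptions made for illustration; they are not the actual Apache Spot schema or the GTI API.

```python
import csv
import io

# Hypothetical scored output. The column names are assumptions for
# illustration, not the actual Apache Spot result schema.
SAMPLE_RESULTS = """src_ip,dst_ip,score
10.0.0.5,203.0.113.7,0.0001
10.0.0.8,198.51.100.23,0.2500
10.0.0.9,203.0.113.99,0.0004
"""

# Stand-in for a reputation service such as GTI. A real deployment would
# query the service instead of a local dictionary.
REPUTATION = {"203.0.113.7": "high risk", "198.51.100.23": "minimal risk"}


def load_results(text):
    """Parse the CSV text into a list of dicts with a numeric score."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["score"] = float(row["score"])
        rows.append(row)
    return rows


def add_context(rows, reputation, threshold=0.001):
    """Keep rows scoring below the threshold and attach a reputation label.

    Assumes a lower score means a more suspicious connection.
    """
    suspicious = [r for r in rows if r["score"] < threshold]
    for r in suspicious:
        r["reputation"] = reputation.get(r["dst_ip"], "unknown")
    # Most suspicious (lowest score) first.
    return sorted(suspicious, key=lambda r: r["score"])


if __name__ == "__main__":
    for r in add_context(load_results(SAMPLE_RESULTS), REPUTATION):
        print(r["src_ip"], r["dst_ip"], r["score"], r["reputation"])
```

Keeping the lookup behind a plain dictionary interface makes it easy to swap in a real reputation client later without touching the filtering logic.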
8. Apache Spot delivers…
1. Scalable Data and Analytics Platform
2. Open Data Models
3. Analytic Collaboration Across the Community
4. Growing Application Ecosystem
… to address cybersecurity use cases.
• Network Traffic Analytics
• Threat Hunting
• Incident Detection and Resolution
• Cybersecurity Data Management
• Custom Use Case
9. Apache Spot, bringing all of the components together.
Platform layers (diagram):
• Apps: Apache Spot ODM Marketplace. ODM-compliant ecosystem, both open source and ISV.
• Analytics: Apache Spot OSS Analytics. Network traffic analytics, additional OSS analytics.
• Analytics: Analytic Services (Jupyter, Apache Spark). Data science workbench.
• Data Management: Apache Spot Sample Data Sets. Community-sourced, anonymized data sets for model development.
• Data Management: Apache Spot Open Data Models (ODM). Logical and physical models.
• Data Management: Ingestion (Kafka, Flume, StreamSets). Batch and stream data ingestion.
• Data Management: Data Platform (CDH). Scalable storage and distributed processing.
• Management, Security, Governance (Director, Manager, Sentry, Navigator). Provisioning, management, and security.
• Infra: Intel hardware, on-prem, AWS, Azure. Public or private clouds.
11. Call to Action.
Contribute to the Apache Spot (incubating) project.
1. Develop connectors to ingest more data.
2. Develop new algorithms that help increase the tool's detection rate.
3. Help add context to the results, for example with connectors to threat
intelligence feeds and databases, so that we can present meaningful information to
the end user.
4. Develop the user interface: propose changes, technologies, operational
summaries, reports, etc.
12. Call to Action.
5. Integrate Apache Spot (incubating) with other security tools that can
enforce or change security postures (firewall consoles, IPS
consoles, proxies, endpoint security solutions, e-mail proxies).
6. Contact us
• Web page: http://spot.apache.org/
• Slack: slack.apache-spot.io/
• Twitter: @ApacheSpot
7. Contribute to the Apache Spot (incubating) project.
13. With Apache Spot, you are joining a community.
Collaborate with industry leaders using a common
framework.
Rules and patterns most of the time on the cyber side..
DDoS, The internet apocalypse map hides the major vulnerability that created it… China stuff
https://cloudera.my.salesforce.com/06934000001jGcw
Hire a hacker - Hack a corporate email account without the owner knowing or needing to change the password. The hacker can then use the forgot-password flow to reset passwords for critical applications.
Buy a product that helps you hack - Angler exploit kits help infect users with malware. The malware is delivered to users when they visit a site that has the kit deployed on it.
Get trained by the best hackers on YouTube - Anyone can now learn how to hack a corporation.
Enterprise System Information Protocol (ESIP)
For reporting of asset inventory information. Common Platform Enumeration (CPE), etc.
Threat Analysis Automation Protocol (TAAP)
For reporting and sharing structured threat information. Malware Attribute Enumeration & Characterization (MAEC), Common Attack Pattern Enumeration & Classification (CAPEC), Common Platform Enumeration (CPE), Common Weakness Enumeration (CWE), Open Vulnerability and Assessment Language (OVAL), Common Configuration Enumeration (CCE), and Common Vulnerabilities and Exposures (CVE).
Event Management Automation Protocol (EMAP)
For reporting of security events. Common Event Expression (CEE), Malware Attribute Enumeration & Characterization (MAEC), and Common Attack Pattern Enumeration & Classification (CAPEC).
Incident Tracking and Assessment Protocol (ITAP)
For tracking, reporting, managing and sharing incident information. Open Vulnerability and Assessment Language (OVAL), Common Platform Enumeration (CPE), Common Configuration Enumeration (CCE), Common Vulnerabilities and Exposures (CVE), Common Vulnerability Scoring System (CVSS), Malware Attribute Enumeration & Characterization (MAEC), Common Attack Pattern Enumeration & Classification (CAPEC), Common Weakness Enumeration (CWE), Common Event Expression (CEE), Incident Object Description Exchange Format (IODEF), National Information Exchange Model (NIEM), and Cybersecurity Information Exchange Format (CYBEX).
Enterprise Remediation Automation Protocol (ERAP)
For automated remediation of mis-configuration & missing patches. Common Remediation Enumeration (CRE), Extended Remediation Information (ERI), Open Vulnerability and Assessment Language (OVAL), Common Platform Enumeration (CPE), and Common Configuration Enumeration (CCE).
Enterprise Compliance Automation Protocol (ECAP)
For reporting configuration compliance. Asset Reporting Format (ARF), Open Checklist Reporting Language (OCRL), etc.
In more detail, LDA represents documents as mixtures of topics that emit words with certain probabilities. It assumes that each document is produced in the following fashion:
1. Decide on the number of words N the document will have (say, according to a Poisson distribution).
2. Choose a topic mixture for the document (according to a Dirichlet distribution over a fixed set of K topics). For example, assuming two topics, food and cute animals, you might choose the document to consist of 1/3 food and 2/3 cute animals.
3. Generate each word w_i in the document by:
• First picking a topic (according to the multinomial distribution that you sampled above; for example, you might pick the food topic with 1/3 probability and the cute animals topic with 2/3 probability).
• Then using the topic to generate the word itself (according to the topic’s multinomial distribution). For example, if we selected the food topic, we might generate the word “broccoli” with 30% probability, “bananas” with 15% probability, and so on.
Assuming this generative model for a collection of documents, LDA then tries to backtrack from the documents to find a set of topics that are likely to have generated the collection.
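The generative story above can be sketched in a few lines of Python using only the standard library. This covers the forward (document-generating) direction only, not the inference step that recovers topics from documents. The "broccoli" and "bananas" probabilities come from the notes; every other word and probability, the Dirichlet parameters, and the mean document length are made-up values for illustration.

```python
import math
import random

# Per-topic word distributions. "broccoli" (30%) and "bananas" (15%) come
# from the notes; all other words and probabilities are illustrative.
TOPICS = {
    "food": {"broccoli": 0.30, "bananas": 0.15, "kale": 0.55},
    "cute animals": {"kitten": 0.50, "puppy": 0.30, "hamster": 0.20},
}


def sample_poisson(rng, lam):
    """Draw from a Poisson distribution (Knuth's method, fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_dirichlet(rng, alpha):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topics, alpha, mean_length):
    """Generate one document following the three steps of the LDA story."""
    n_words = sample_poisson(rng, mean_length)   # step 1: document length N
    names = list(topics)
    mixture = sample_dirichlet(rng, alpha)       # step 2: topic mixture
    words = []
    for _ in range(n_words):                     # step 3: one word at a time
        topic = rng.choices(names, weights=mixture, k=1)[0]
        vocab = topics[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()), k=1)[0]
        words.append(word)
    return mixture, words


if __name__ == "__main__":
    rng = random.Random(0)
    mixture, words = generate_document(rng, TOPICS, alpha=[1.0, 1.0], mean_length=8)
    print([round(m, 2) for m in mixture], words)
```

Fitting LDA runs this process in reverse: given only the words, it estimates the topic mixtures and per-topic word distributions that most plausibly produced them.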