Analyzing the World's Largest Security Data Lake!
1. Hadoop Summit 2015
CloudFire Analytics: Transforming Security
Analyzing Symantec’s security data lake
Stephen Brodsky and Darrell Kienzle
2. Hadoop Summit 2015
Outline
1. CPE CloudFire: Analytics and Products
2. Analytics Services and Data
3. Analytics Administration and Monitoring
4. Self-Service Analytics and Dynamic Clusters
Symantec CloudFire Analytics 2
3. Hadoop Summit 2015
CPE CloudFire: Analytics and Products
• CPE – Cloud Platform Engineering
– Symantec’s Private Cloud for security products and analytics
– Spans 50+ data centers around the world
• CloudFire – CPE’s scalable cloud platform
– New data centers, new hardware, scalable build-out
– All open source
– OpenStack for virtualization
– Analytics for big data analysis
• Cloud for bringing together and integrating Symantec’s:
– Products
– Big Data, Analytic applications, and Services
– Compute, Network
4. Hadoop Summit 2015
Analytics for supporting Security Products
Goals of the Security Teams
1. Do we have the data?
2. Can we analyze the data?
3. Can we provide timely insights?
CloudFire Analytics for supporting Security
1. Data available
• All frequently used Security Data available
• Data at scale: Data available for parallel analysis (PB scale)
• Leveraging CPE Data Center Availability, Compute, Net
2. Analysis engines
• Hadoop ecosystem engines (MR, Hive, Kafka, Spark, HBase, Storm, Phoenix, ++)
• Analytics Pipeline
3. Analysis in near real-time
• Analytics timescale days/hours -> seconds
• Batch -> Streaming
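
The move from batch to streaming can be illustrated with a minimal sketch. It is not taken from the CloudFire code base: the event tuples, the event type names, and both function names are hypothetical. It only contrasts counting after all data has arrived with updating a running count as each event comes in.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical telemetry event: (source_host, event_type).
Event = Tuple[str, str]


def batch_counts(events: Iterable[Event]) -> Counter:
    """Batch style: wait for the complete data set, then count once."""
    return Counter(event_type for _, event_type in events)


def streaming_counts(events: Iterable[Event]) -> Iterator[Counter]:
    """Streaming style: update the counts per event and emit a snapshot each time."""
    running: Counter = Counter()
    for _, event_type in events:
        running[event_type] += 1
        yield running.copy()


# Illustrative data only.
events = [
    ("host-a", "exploit_kit"),
    ("host-b", "malvertisement"),
    ("host-a", "exploit_kit"),
]
final_batch = batch_counts(events)
snapshots = list(streaming_counts(events))
```

Both approaches end with the same totals. The streaming version makes a partial result available after every event, which is what shortens the time to insight.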
5. CPE Analytics Architecture Overview
• Sources: Products, Device Agents (Telemetry, Data)
• Inbound Messaging (Data import, Kafka)
• Stream Processing (Storm, Spark)
• Real-time Results (HBase, ElasticSearch)
• Query (Hive, Spark SQL)
• Analytic Applications, Workload Management (YARN)
• Distributed Storage (HDFS), Metal or Virtual (OpenStack) Servers
Threats: Top Web-based Attacks
192 web-based attacks recorded last month
Chart shares: 36%, 30%, 14%, 10%, 7%, 3%
Chart categories: Malvertisement, Exploit Kit, Suspicious Download
Analytics Clusters
• All major open source engines
• PB-scale Data Store
• Key Telemetry
• Multi-Data Center
• Administration
• Monitoring
• Prod, Dev/Test
• Application Deployments
CloudFire Analytics
Data Transfer
6. Hadoop Summit 2015
A key security Product
• Live with hundreds of external customers
• Improved time to analyzed results from hours to seconds (5000x) at production scale
• Leverage Streaming Analytics
• Move analytics to more powerful, high performance open source technologies
– Kafka for queuing
– Storm for analytics
• Improve analytics
– Server reputation
– URL reputation
• Moving forward
– Leverage Analytics Pipeline
– NoSQL DBs
– Graph DBs
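
To put the 5000x figure above in concrete terms, here is a small arithmetic sketch. The starting durations are illustrative assumptions, not numbers from the talk; only the 5000x factor comes from the slide.

```python
def time_after_speedup(before_seconds: float, speedup: float) -> float:
    """Return the processing time after applying a speedup factor."""
    return before_seconds / speedup


SPEEDUP = 5000  # factor quoted on the slide

# Assumed "before" durations, in hours, chosen only for illustration.
for hours in (1, 2, 5):
    after = time_after_speedup(hours * 3600, SPEEDUP)
    print(f"{hours} h -> {after:.2f} s")
```

At this factor, a job that took one hour would finish in under a second, which matches the slide's "hours to seconds" claim.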
HDFS Monitors
12. Hadoop Summit 2015
Ambari Analytics Administration
Extensible administration platform for automated
deployment, monitoring and alerting, configuration
management, rolling upgrades, and rolling restarts.
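
As a rough sketch of how such administration can be scripted, the snippet below builds (but does not send) a request against Ambari's REST API to change a service's desired state. The host name, cluster name, credentials, and the helper name `build_service_state_request` are hypothetical. The URL layout, the `X-Requested-By` header, and the `INSTALLED`/`STARTED` states follow the Ambari REST API as I understand it; verify them against your Ambari version before relying on this.

```python
import base64
import json
from urllib.request import Request


def build_service_state_request(base_url, cluster, service, state, user, password):
    """Build a PUT request asking Ambari to move a service to the given state.

    In Ambari, state "INSTALLED" means stopped and "STARTED" means running.
    """
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        "RequestInfo": {"context": f"Set {service} to {state}"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari rejects modifying requests that lack this header.
            "X-Requested-By": "ambari",
        },
    )


# Hypothetical server, cluster, and credentials; nothing is sent here.
req = build_service_state_request(
    "http://ambari.example.com:8080", "analytics_dev", "HDFS", "INSTALLED",
    "admin", "admin",
)
```

Passing `req` to `urllib.request.urlopen` would submit it. A stop followed by a start, applied to one component at a time, is the basic building block behind a scripted restart.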
13. Hadoop Summit 2015
Ambari + ElasticSearch – Ambari Stack extensibility
• Ambari custom service for ElasticSearch: a new service definition that lets Ambari manage ElasticSearch.
• Ambari handles deploy, start/stop, config, and monitoring.
• Uses the new Ambari Views, integrated via iFrame, with the Elastic HQ admin UI as our use case.
• Keeps the dashboard clean, yet extensible.
• On GitHub to share…
16. Hadoop Summit 2015
Ambari Metrics
Rapid Metrics Dashboards Construction - Reliability
Prototype for building out flexible dashboards of the most interesting application and platform metrics.
Alerting available via the LMM service.
21. Hadoop Summit 2015
Self-Service Analytics and Dynamic Clusters
CloudBreak integration and Development In Progress
22. Hadoop Summit 2015
Purpose
● Create and connect Analytics Clusters:
– Authoritative Data Lake, Regional Hubs
– Development, Test, Integration, Staging, and Production
● Ability for CloudFire users to spin up new clusters to develop their applications.
Feature
● One-click spin up of Analytics clusters using OpenStack VMs
● VMs that developers spin up are part of their own OpenStack quota
● Developers will have admin access to the cluster they spin up, so they can install more services if needed.
Automation
● We control the version of the software by using the same automation code that we use for deploying our production clusters.
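
Because self-service clusters count against each developer's own OpenStack quota, a spin-up tool would plausibly check the request against the remaining quota first. The sketch below is an assumption about how such a check could look, not the CloudFire implementation: the `Quota` and `Flavor` types, the `fits_quota` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Remaining per-user capacity (hypothetical fields mirroring common OpenStack limits)."""
    instances: int
    vcpus: int
    ram_mb: int


@dataclass
class Flavor:
    """Size of each VM in the requested cluster."""
    vcpus: int
    ram_mb: int


def fits_quota(remaining: Quota, flavor: Flavor, node_count: int) -> bool:
    """Return True if node_count VMs of this flavor fit in the remaining quota."""
    return (
        node_count <= remaining.instances
        and node_count * flavor.vcpus <= remaining.vcpus
        and node_count * flavor.ram_mb <= remaining.ram_mb
    )


# Illustrative numbers only.
remaining = Quota(instances=10, vcpus=40, ram_mb=81920)
flavor = Flavor(vcpus=4, ram_mb=8192)
small_ok = fits_quota(remaining, flavor, 5)   # 5 VMs, 20 vCPUs, 40960 MB
large_ok = fits_quota(remaining, flavor, 12)  # 12 VMs exceeds the instance limit
```

Running the check before provisioning lets a one-click request fail fast instead of leaving a half-built cluster.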