Ken Owens
CTO Cisco Intercloud Services
07/15/15
How Cisco Migrated from MapReduce Jobs to Spark Jobs
Cisco and/or its affiliates. All rights reserved.Presentation_ID Cisco Public
Introduction
The Uber Trend: Exponential Rise in Connectivity
• 30M: new devices connected every week
• 78%: workloads processed in Cloud DCs by 2018
• 5TB+: of data per person by 2020
• 180B: mobile apps downloaded in 2015
• 277X: data created by IoE devices vs. end-user
Source: IDC
Exponential Growth Drives Opportunities
[Figure: chart with curves labeled Exponential Trend and Linear Trend, annotated with Knee of Curve and Disruptive Stress / Opportunity. Source: Peter Diamandis, BOLD]
When Products Become Cloud-enabled, They Become 10X More Valuable
[Figure: three product price pairs: $23.19 and $249.00; $18.01 and $199.00; $5.99 and $59.99]
A Broader Perspective than Hybrid Cloud Is Required…
[Figure labels: SaaS, PaaS, IaaS; Data Center, Cloud, Edge / IoT]
CIOs Need to Embrace Both Traditional and Hyperscale Application Deployment
Hyperscale applications serving several thousands of users very quickly
• IoE and increasing connectivity driving the need for such workloads
• Hadoop, Mobile back-ends, Gaming, Social
• Small (~10%), yet rapidly growing percentage of applications in the Cloud
Traditional enterprise applications
• ERP, CRM, Applications that leverage traditional databases
• Majority of applications being run for/by Enterprises today
Application Portability and Interoperability Is the Key
Traditional Applications: ERP, Financial, Client/Server, CRM, email, …
Cloud Native Applications: IoT, Big Data, Analytics, Gaming, ...
[Figure labels: SaaS, PaaS, IaaS; Data Center, Cloud, Edge / IoT]
Bimodal IT Is the New Normal
45% of CIOs currently have a second fast/agile mode of operation
Traditional Mode: Requires Reliability (ITIL, CMMI, COBIT)
Nonlinear Mode: Accept Instability (DevOps, automation, reusable)
[Figure labels: Systems of Record, Systems of Differentiation, Systems of Innovation; Change; Governance]
Source: Gartner, Lydia Leong
We Must Connect the Clouds
Islands of Isolated PC LAN Networks (1990s): Multiple LANs using a multitude of protocols
The Internet: Connected using industry-standard IP protocol (IP Based, Open Standards)
World of Isolated Clouds (2000s): Individual custom-built clouds without consistent APIs
The Intercloud: Connected for application acceleration with Open APIs (Web-scale Architecture; API-Driven Automation; Open, Secure, Compliant, Hybrid IT)
Use Case: Customer Interaction Analytics
Omni-Channel Customer Journeys
The customers’ interaction with Cisco across multiple touch points to get the desired business outcome.
Channels: Server Logs, Social & Chat, Mobile, Event Streams, Call Center
Interaction touch points: S/W Download, Open Trouble Ticket, Assign Engineer, Update Trouble Ticket, Close Trouble Ticket, Resolve Trouble Ticket, Read Support Documents, View Design Documents, View Tech Documents, New Registration, Bug Search, FAQs, Contract Details, Product Details, Device Coverage
Journeys: Case Resolution, Software Upgrade
Customer Interaction Analytics: From Journey to Outcome…
Customer Journeys
• Software Upgrades
• Bug Inquiry
• Software Inquiry
• Trouble Ticket Lifecycle
• Device Troubleshooting
• New Registration
• Contract Renewal
Behavioral Insights
• Customer Interest Analytics
• Customer Experience Analytics
• Resource Forecasting
• Security and Compliance
Impact
• Boost Self Service
• Real-time Content Optimization & Recommendation
• Context Based Predictive Alerts
• Implicit Personalization
Customer Interaction Analytics Big Data Platform
Synthesize customer journey maps into behavioral insights.
[Figure: platform diagram. Data Sources: Server Logs, Call Center, Mobility, Social, Event Streams. Other labels: Data Ingestion, CiscoDV, Kafka, Redis, ETL, Analytics Model, Build Model, Activity Refinement, Activity Synthesis, Synthesized Insights, Real-time Processing, Batch Analytics, Insight Services, CiscoDV, Interact, Impala, Hive, Pig, ES, Zoomdata, Platfora]
AWS and CIS Intercloud Solution
AWS Platform
Component             Cloud::Hadoop        Cloud::Queries          Cloud::Streams
                      (Batch Analytics)    (Interactive Queries)   (Near Real-time Analytics)
Virtual Machines      30                   6                       5
AWS Instance Sizing   m3.2xlarge           c3.xlarge               m3.xlarge
Virtual Cores         8/VM                 4/VM                    4/VM
RAM                   30GB/VM              7.5GB/VM                15GB/VM
Disk                  1.5 TB/VM            1.5 TB/VM               1.5 TB/VM
Case for Cisco Intercloud Services for Analytics…
▪ Cisco Security and Compliance requirements
  • Workloads that deal with personally identifiable data and Cisco confidential content cannot be uploaded to AWS. Cisco internal cloud solution is a better fit.
▪ Customer journey beyond the enterprise
  • Applications are hosted on AWS
  • Partner systems hosted on AWS and other cloud providers
  Presence in AWS and other cloud services required to support these scenarios for end-to-end customer journey insights.
▪ Data virtualization integrated in the CIS Analytics Stack
  • Connect data from multiple clouds and multiple big data platforms
▪ Integrated visualization toolset
CIS Analytics Platform
CIS Analytics Platform Requirements
Infra Provisioning
Deploy a virtual private cloud (VPC) on CIS with compute, storage and memory resources comparable to the current production system.
OpenStack
OpenStack Icehouse with Neutron, Nova, and Swift installed.
Big Data Ecosystem
Cloudera’s Hadoop distribution (CDH 5.1.3), ELK Stack, Apache Kafka and Apache Storm.
Data Virtualization & Cloud Integration
Access to data services and data stores via Cisco Data Virtualization.
Runtime Services
Foundational PaaS capabilities including SLAs for uptime, performance, latency, data retention, issue escalation and support priorities, issue resolution, problem management, deployment process, and patch management.
API Services
Provide both fine-grained and coarse-grained access to all service layers of the CIS Analytics Platform. In the hybrid cloud model, it must support interoperability across platform service providers and promote the cloud concepts of extensibility and flexibility.
AWS to CIS Migration – Success Criteria
▪ Successful synthesis of customer interaction data
▪ Successful automation of the end-to-end data process pipeline
▪ Build behavioral insight services
▪ Access to data and services via data discovery and visualization tools
▪ Meet the performance, scale and platform stability requirements
▪ Successful deployment of CiscoDV on CIS
▪ Connect HDFS and Hive DS with CiscoDV via Hive and Impala
▪ Build and expose insight services for consumption by limited users
AWS and CIS Data Node Sizing Comparison
Hadoop Cluster for Batch and Query Analytics

AWS Sizing
Node Service              AWS Instance Type   vCPU   Mem   Storage    Number of Data Nodes
Data Nodes/Node Master    m3.2xlarge          8      30    2x80 GB    30
Comments: Each Hadoop data node has 1500GB of EBS available for HDFS storage

CCS Sizing
Node Service              CCS Instance Type   vCPU   Mem   Storage    Number of Data Nodes
Data Nodes/Node Master    GP-2XLarge          8      32    50         35
Comments: Each Hadoop data node has 1500GB of EBS available for HDFS storage. Less than AWS sizing (Storage)
Pilot Test Data
• Test performed on one day’s production data
• Total no. of records processed – 110,852,667
• Total data size – 32GB
• Total no. of M/R jobs in the data pipeline – 17
• Two test cycles
• Cycle 1: Heterogeneous CCS nodes (vCPUs, storage, memory)
• Cycle 2: Homogeneous CCS nodes
CIS Performance of Batch Analytics – Limited Test
Test Details by M/R job
Job Name                     Cycle 1 (CCS nodes)        Cycle 2 (CCS nodes)
                             12    18    24    30       18    24    30    35
New_cleanse                  249   176   143   117      82    67    55    51
Process_private_ip           27    14    11    10       7     5     6     6
join_web_and_ip_data         142   95    76    61       49    40    34    29
combine_ip_decorated_files   26    14    11    10       9     7     8     7
filterBotEntries             34    19    15    13       10    8     7     7
sessionize                   71    64    69    62       60    63    15    13
firstActivitiesFilter        26    15    13    10       9     8     6     6
allOtherActivitiesFilter     29    18    13    13       11    9     7     6
matchFirstActivities         21    13    11    13       13    11    8     8
buildActivities              27    15    12    10       7     6     9     9
filterBUG                    8     5     3     2        3     3     4     4
filterSEA                    8     5     3     2        3     3     4     4
filterTCO                    8     5     3     2        3     3     4     4
filterTDV                    8     5     3     2        3     3     4     4
filterWDV                    8     5     3     2        3     3     4     4
filterMOD                    8     5     3     2        3     3     4     4
filterTOOL                   8     5     3     2        3     3     4     4
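To compare the cluster configurations at a glance, the per-job timings in the table can be summed per column. The following is a minimal sketch, not part of the original deck. It assumes the 17 jobs run one after another (the slide does not say whether any run in parallel) and leaves the unit unnamed because the table does not state one. The column labels such as `c1-12` are made up here for illustration.

```python
# Timings copied from the "Test Details by M/R job" table.
# The slide gives no unit, so totals are reported as plain numbers.
COLUMNS = ["c1-12", "c1-18", "c1-24", "c1-30", "c2-18", "c2-24", "c2-30", "c2-35"]

TABLE = """
New_cleanse 249 176 143 117 82 67 55 51
Process_private_ip 27 14 11 10 7 5 6 6
join_web_and_ip_data 142 95 76 61 49 40 34 29
combine_ip_decorated_files 26 14 11 10 9 7 8 7
filterBotEntries 34 19 15 13 10 8 7 7
sessionize 71 64 69 62 60 63 15 13
firstActivitiesFilter 26 15 13 10 9 8 6 6
allOtherActivitiesFilter 29 18 13 13 11 9 7 6
matchFirstActivities 21 13 11 13 13 11 8 8
buildActivities 27 15 12 10 7 6 9 9
filterBUG 8 5 3 2 3 3 4 4
filterSEA 8 5 3 2 3 3 4 4
filterTCO 8 5 3 2 3 3 4 4
filterTDV 8 5 3 2 3 3 4 4
filterWDV 8 5 3 2 3 3 4 4
filterMOD 8 5 3 2 3 3 4 4
filterTOOL 8 5 3 2 3 3 4 4
"""


def parse(table):
    """Turn the whitespace-separated table into {job_name: [timings]}."""
    rows = {}
    for line in table.strip().splitlines():
        name, *values = line.split()
        rows[name] = [int(v) for v in values]
    return rows


def totals(rows):
    """Sum each column, assuming the jobs run sequentially (an assumption)."""
    return [sum(column) for column in zip(*rows.values())]


rows = parse(TABLE)
pipeline_totals = dict(zip(COLUMNS, totals(rows)))
for label, total in pipeline_totals.items():
    print(f"{label}: {total}")
```

Reading the totals side by side shows which jobs dominate: New_cleanse and join_web_and_ip_data account for most of the time in every configuration, while sessionize only drops sharply in the 30- and 35-node cycle 2 runs.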
PoC: Analytics with Spark on CIS
Existing code
▪ Written in Ruby with Wukong to run on Hadoop
▪ A history of changes and modifications
▪ Script-based, steps communicate via intermediary files
Goal
▪ Revise, rethink and reimplement with Spark on CIS
▪ Open for advanced cloud analytics
▪ Improve maintainability by moving away from aging Ruby on Hadoop
Ruby Flow
[Figure: pipeline diagram. Group labels: Cleanse, Sessionize. Stage labels: cleanse, decorate, sessionize (cookie, time), match 1st (IP, UA, time), build actions, merge, add to hive. Data labels: logs, cleansed, private, web, sessioned, first, others, bots, onlyBots, 1..7, bug, tool, session PSV. Annotations: "Main computation happens here"; global sort in time; global group by IP]
▪ Pre-process log records (‘cleanse’)
▪ Extract HTTP sessions (‘sessionize’)
▪ Extract user actions, such as ‘search’, ‘download patch’, ‘open manual’, ‘open a bug’
Ruby: Scripts with temp files
▪ Each box on the figure is a script in a separate file
▪ They pipe gigabytes of data as input and output
▪ Random matching of nodes to data for sessionizing
▪ Lots of redundant shuffling
Spark Flow
[Figure: the same pipeline diagram as on the Ruby Flow slide (cleanse, decorate, sessionize, match 1st, build actions, merge), annotated with: global partition by IP; local sort in time]
▪ Same flow, but each box is a Java or Scala function
No intermediate temp files
▪ Steps are chained by Spark, often without any need for intermediate data
▪ If still needed, the data is stored in memory and on local disk as much as possible
Local computation
▪ Cleansing is computed on nodes local to data blocks (same as Ruby)
▪ Sessions are built per IP
▪ On separate nodes, each handling a single IP range
▪ Once copied to the node on partitioning, the data remains local
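The "global partition by IP, local sort in time" idea can be sketched without a cluster. The code below is a plain-Python illustration, not the authors' Spark code: each "partition" is just a list, the record layout `(ip, timestamp, url)` is hypothetical, and the 30-minute inactivity timeout is an assumption since the slides do not state the session rule. In Spark itself, a comparable effect is usually obtained with a partitioner keyed on IP followed by a per-partition sort (for example the RDD method `repartitionAndSortWithinPartitions`), but whether the deck's implementation used that call is not stated.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical inactivity timeout; the slides do not give the real session rule.
SESSION_GAP_SECONDS = 30 * 60


def partition_by_ip(records, num_partitions):
    """Route every record of one IP to the same partition (global partition by IP)."""
    partitions = defaultdict(list)
    for record in records:
        ip = record[0]
        partitions[crc32(ip.encode()) % num_partitions].append(record)
    return partitions


def sessionize_partition(records):
    """Sort one partition locally by (ip, time), then cut sessions on long gaps."""
    sessions = []
    current = []
    for ip, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        new_ip = current and current[-1][0] != ip
        long_gap = current and ts - current[-1][1] > SESSION_GAP_SECONDS
        if new_ip or long_gap:
            sessions.append(current)
            current = []
        current.append((ip, ts, url))
    if current:
        sessions.append(current)
    return sessions


def sessionize(records, num_partitions=4):
    """Each partition is processed independently; no global sort is needed."""
    result = []
    for part in partition_by_ip(records, num_partitions).values():
        result.extend(sessionize_partition(part))
    return result


# Illustrative records: (ip, unix_seconds, url)
hits = [
    ("10.0.0.1", 100, "/search"),
    ("10.0.0.2", 50, "/bug/1"),
    ("10.0.0.1", 5000, "/download"),
    ("10.0.0.1", 160, "/manual"),
]
all_sessions = sessionize(hits)
for session in all_sessions:
    print(session[0][0], [url for _, _, url in session])
```

Because all hits of one IP land in the same partition, the sort only ever touches that partition's data, which is the property the slide credits for the reduced shuffling.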
Runtime comparison
▪ Volumes
  ▪ Logs of a single day: 52 GB
  ▪ Total of 110 million records
  ▪ Of which 53 million records are kept after pre-filtering
  ▪ Producing over 1 million user actions
▪ Cluster of 30 nodes
▪ Ruby
  ▪ Runtime 140 min
▪ Spark
  ▪ Runtime 7 min (20 times faster)
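The "20 times faster" figure follows directly from the two runtimes. As a quick check, the sketch below recomputes the speedup and derives an approximate throughput from the rounded record count on the slide; the throughput numbers are derived here and do not appear in the deck.

```python
# Figures taken from the slide; the record count is the rounded "110 mil".
ruby_runtime_min = 140
spark_runtime_min = 7
records = 110_000_000

# Speedup is the ratio of the two wall-clock runtimes.
speedup = ruby_runtime_min / spark_runtime_min

# Approximate end-to-end throughput in input records per second.
ruby_rate = records / (ruby_runtime_min * 60)
spark_rate = records / (spark_runtime_min * 60)

print(f"speedup: {speedup:.0f}x")
print(f"Ruby:  {ruby_rate:,.0f} records/s")
print(f"Spark: {spark_rate:,.0f} records/s")
```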
Why?
▪ Extracting sessions means sorting in time and grouping by IP
▪ Ruby:
  ▪ Sorting in time and per-IP grouping are performed across the whole cluster (very bad, lots of IO)
▪ Spark is good at dealing with partitions:
  ▪ Per-IP groups are placed on different machines (partitions)
  ▪ The global sort in time is replaced by many local per-IP sorts done on the machines responsible for extracting sessions for specific groups of IP addresses
▪ Other improvements
  ▪ Avoid redundant temp files and redundant (de)serialization of objects (comes with Java/Scala); stages keep data in memory when possible (comes with Spark)
  ▪ Cache results of user agent resolution that are heavy on regular expressions
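The last improvement, caching user agent resolution, can be illustrated with a memoized function. This is a sketch under assumptions: the deck's code is Java/Scala and its actual patterns are not shown, so the regular expressions, the category names, and the function `resolve_user_agent` below are all hypothetical. The point is only that web logs repeat the same user agent strings, so each distinct string needs to go through the regex chain once.

```python
import re
from functools import lru_cache

# Hypothetical patterns for illustration; the real rules are not in the slides.
_PATTERNS = [
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("bot", re.compile(r"bot|crawler|spider", re.IGNORECASE)),
]


@lru_cache(maxsize=100_000)
def resolve_user_agent(user_agent):
    """Classify a user agent string; repeated strings are served from the cache."""
    for name, pattern in _PATTERNS:
        if pattern.search(user_agent):
            return name
    return "other"


# Illustrative log sample with repeated user agents.
ua_chrome = "Mozilla/5.0 (X11; Linux x86_64) Chrome/43.0 Safari/537.36"
ua_bot = "Googlebot/2.1"
log_user_agents = [ua_chrome, ua_bot, ua_chrome, ua_bot]

kinds = [resolve_user_agent(ua) for ua in log_user_agents]
print(kinds)
print(resolve_user_agent.cache_info())
```

Only the two distinct strings run through the patterns; the repeats are cache hits, which `cache_info()` makes visible.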
CiscoDV on CIS
Data Virtualization for Intercloud Analytics
Customer Benefits
▪ Discover data beyond the enterprise: Virtual integration that combines traditional enterprise data, Big Data stores on CIS and AWS, and cloud data from SaaS providers and Cisco customers and partners
▪ Seamless interoperability offers easy access to data across distributed data sources in the intercloud analytics platform
▪ Universal data governance maximizes enforcement of data security rules
▪ Analytics Data Hubs: Deployment flexibility to build hybrid/virtual sandboxes that enable nimble data discovery and rapid data analytics to support multiple LOBs
▪ Deliver data to any number of analytics tools
Use Case 1: Get Case Interactions
Use Case Description: Number of cases opened by company X that are currently open (other variations would include cases by company, trends, etc.).
CiscoDV Value: CiscoDV enforces data security rules to restrict access on the intercloud platform to customer-sensitive data.
Data Sources: SalesForce
Intercloud Solution: CIS CiscoDV service can access the “sanitized” version of CSOne data through JDBC from RIDES (SWTG CiscoDV) API.
Connection Type: DV on hybrid cloud → Enterprise data store
Use Case 2: Get Customer Journey
Use Case Description: Customer interactions on the web pertaining to the bug search and case submission process. Foundational data can be used to explore trends and feed into content recommendation models.
CiscoDV Value: Direct access to data on the CIS Intercloud Analytics Platform
Data Sources: SAS Analytics
Intercloud Solution: By direct network access to the Impala Server, the CIS CiscoDV server connects to the Impala Service in Hadoop, also on CIS, as a Data Source. SQL queries configured in CiscoDV execute Impala queries.
Connection Type: DV on hybrid cloud → VPC Big Data platform
Use Case 3: Get Bug Interactions
Use Case Description: Another foundational data service that provides a breakdown of customer exposure or interest in bugs. The service can be refined further to look at trends specific to a company or a product for further analytics.
CiscoDV Value: Real-time data federation that accesses extremely large data in the CIS Intercloud Analytics platform and joins it with Bug Data accessed via a departmental CiscoDV instance (RIDES)
Data Sources: SASA Analytics and QDDTS via RIDES
Intercloud Solution: By building on the access to the Impala Server, the DV server can join the Bug Data from the Enterprise Data Stores with the HDFS data to provide a federated view.
Connection Type: DV on hybrid cloud → VPC Big Data platform and Enterprise data store
CiscoDV on Intercloud Analytics Platform (CIS)
Scenario 1: CIS CiscoDV to Cisco Enterprise Data Store
Scenario 2: CIS CiscoDV to Impala and Hive on CIS Intercloud Analytics Platform
Scenario 3: CIS CiscoDV to Hive on AWS Big Data Cluster
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015

Más contenido relacionado

La actualidad más candente

Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksHortonworks
 
Sahara presentation latest - Codemotion Rome 2015
Sahara presentation latest - Codemotion Rome 2015Sahara presentation latest - Codemotion Rome 2015
Sahara presentation latest - Codemotion Rome 2015Codemotion
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...DataWorks Summit
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIsCisco DevNet
 

La actualidad más candente (20)

Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
Apache Deep Learning 201
Apache Deep Learning 201Apache Deep Learning 201
Apache Deep Learning 201
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final2015 nov 27_thug_paytm_rt_ingest_brief_final
2015 nov 27_thug_paytm_rt_ingest_brief_final
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and HortonworksPowering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
Powering Fast Data and the Hadoop Ecosystem with VoltDB and Hortonworks
 
Sahara presentation latest - Codemotion Rome 2015
Sahara presentation latest - Codemotion Rome 2015Sahara presentation latest - Codemotion Rome 2015
Sahara presentation latest - Codemotion Rome 2015
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it's Cool, but are You Doing it...
 
Novinky v Oracle Database 18c
Novinky v Oracle Database 18cNovinky v Oracle Database 18c
Novinky v Oracle Database 18c
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIs
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 

Destacado

Data Visualization on the Tech Side
Data Visualization on the Tech SideData Visualization on the Tech Side
Data Visualization on the Tech SideMathieu Elie
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous PersistenceJervin Real
 
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal HabitatsUSGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal HabitatsMarcellus Drilling News
 
Bsides Delhi Security Automation for Red and Blue Teams
Bsides Delhi Security Automation for Red and Blue TeamsBsides Delhi Security Automation for Red and Blue Teams
Bsides Delhi Security Automation for Red and Blue TeamsSuraj Pratap
 
Demystifying Security Analytics: Data, Methods, Use Cases
Demystifying Security Analytics: Data, Methods, Use CasesDemystifying Security Analytics: Data, Methods, Use Cases
Demystifying Security Analytics: Data, Methods, Use CasesPriyanka Aash
 
Java management extensions (jmx)
Java management extensions (jmx)Java management extensions (jmx)
Java management extensions (jmx)Tarun Telang
 
Mindmappen
MindmappenMindmappen
Mindmappenyperlaan
 
Accelerated Leadership
Accelerated LeadershipAccelerated Leadership
Accelerated Leadershipkktv
 
Opensource approach to design and deployment of Microservices based VNF
Opensource approach to design and deployment of Microservices based VNFOpensource approach to design and deployment of Microservices based VNF
Opensource approach to design and deployment of Microservices based VNFMichelle Holley
 
Performance testing for web-scale
Performance testing for web-scalePerformance testing for web-scale
Performance testing for web-scaleIzzet Mustafaiev
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBigData_Europe
 
Docker experience @inbotapp
Docker experience @inbotappDocker experience @inbotapp
Docker experience @inbotappJilles van Gurp
 
AWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics
AWS re:Invent 2014 | (ARC202) Real-World Real-Time AnalyticsAWS re:Invent 2014 | (ARC202) Real-World Real-Time Analytics
AWS re:Invent 2014 | (ARC202) Real-World Real-Time AnalyticsSocialmetrix
 
How Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App ModernizationHow Docker EE is Finnish Railway’s Ticket to App Modernization
How Docker EE is Finnish Railway’s Ticket to App ModernizationDocker, Inc.
 
SocCnx11 - All you need to know about orient me
SocCnx11 - All you need to know about orient meSocCnx11 - All you need to know about orient me
SocCnx11 - All you need to know about orient mepanagenda
 

Destacado (20)

IOT Exploitation
IOT Exploitation	IOT Exploitation
IOT Exploitation
 
114 Numalliance
114 Numalliance114 Numalliance
114 Numalliance
 
Data Visualization on the Tech Side
Data Visualization on the Tech SideData Visualization on the Tech Side
Data Visualization on the Tech Side
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous Persistence
 
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal HabitatsUSGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
USGS Report on the Impact of Marcellus Shale Drilling on Forest Animal Habitats
 
Bsides Delhi Security Automation for Red and Blue Teams
Bsides Delhi Security Automation for Red and Blue TeamsBsides Delhi Security Automation for Red and Blue Teams
Bsides Delhi Security Automation for Red and Blue Teams
 
Demystifying Security Analytics: Data, Methods, Use Cases
Demystifying Security Analytics: Data, Methods, Use CasesDemystifying Security Analytics: Data, Methods, Use Cases
Demystifying Security Analytics: Data, Methods, Use Cases
 
Java management extensions (jmx)
Java management extensions (jmx)Java management extensions (jmx)
Java management extensions (jmx)
 
Mindmappen
MindmappenMindmappen
Mindmappen
 
EVOLVE'16 | Enhance | Anil Kalbag & Anshul Chhabra | Comparative Architecture...
How Cisco Migrated from MapReduce Jobs to Spark Jobs - StampedeCon 2015

  • 1. How Cisco Migrated from MapReduce Jobs to Spark Jobs. Ken Owens, CTO, Cisco Intercloud Services. 07/15/15
  • 2. Introduction (slide footer: Cisco and/or its affiliates. All rights reserved. Presentation_ID. Cisco Public)
  • 3. Introduction
  • 4. Introduction
  • 5. Introduction
  • 6. Introduction
  • 7. The Uber Trend: Exponential Rise in Connectivity. 30M new devices connected every week; 78% of workloads processed in Cloud DCs by 2018; 5TB+ of data per person by 2020; 180B mobile apps downloaded in 2015; 277X data created by IoE devices v. end-user. Source: IDC
  • 8. Exponential Growth Drives Opportunities. Chart labels: Exponential Trend; Linear Trend; Knee of Curve; Disruptive Stress/Opportunity. Peter Diamandis: BOLD
  • 9. When Products Become Cloud-enabled, They Become 10X More Valuable. Prices shown: $23.19 and $249.00; $18.01 and $199.00; $5.99 and $59.99
  • 10. A Broader Perspective than Hybrid Cloud Is Required… [Figure labels: SaaS, PaaS, IaaS; Data Center, Cloud, Edge / IoT]
  • 11. CIOs Need to Embrace Both Traditional and Hyperscale Application Deployment
      • Hyperscale applications serving several thousands of users very quickly
          • IoE and increasing connectivity driving the need for such workloads
          • Hadoop, mobile back-ends, gaming, social
          • Small (~10%), yet rapidly growing percentage of applications in the cloud
      • Traditional enterprise applications
          • ERP, CRM, applications that leverage traditional databases
          • Majority of applications being run for/by enterprises today
  • 12. Application Portability and Interoperability Is the Key
      • Traditional applications: ERP, Financial, Client/Server, CRM, email, …
      • Cloud native applications: IoT, Big Data, Analytics, Gaming, ...
      • [Figure labels: SaaS, PaaS, IaaS; Data Center, Cloud, Edge / IoT]
  • 13. Bimodal IT Is the New Normal
      • 45% of CIOs currently have a second fast/agile mode of operation
      • Traditional mode: requires reliability (ITIL, CMMI, COBIT)
      • Nonlinear mode: accept instability (DevOps, automation, reusable)
      • [Figure labels: Systems of Record, Systems of Differentiation, Systems of Innovation; Change; Governance]
      • Source: Gartner, Lydia Leong
  • 14. We Must Connect the Clouds
      • Islands of isolated PC LAN networks (1990s): multiple LANs using a multitude of protocols
      • The Internet: connected using industry-standard IP protocol; IP based, open standards
      • World of isolated clouds (2000s): individual custom-built clouds without consistent APIs
      • The Intercloud: connected for application acceleration with open APIs; web-scale architecture; API-driven automation; open, secure, compliant, hybrid IT
  • 16. Omni-Channel Customer Journeys: the customers’ interaction with Cisco across multiple touch points to get the desired business outcome.
      • Channels: Server Logs, Social & Chat, Mobile, Event Streams, Call Center
      • Interaction touch points: S/W Download, Open Trouble Ticket, Assign Engineer, Update Trouble Ticket, Close Trouble Ticket, Resolve Trouble Ticket, Read Support Documents, View Design Documents, View Tech Documents, New Registration, Bug Search, FAQs, Contract Details, Product Details, Device Coverage
      • Journeys: Case Resolution, Software Upgrade
  • 17. Customer Interaction Analytics: From Journey to Outcome…
      • Customer Journeys: Software Upgrades, Bug Inquiry, Software Inquiry, Trouble Ticket Lifecycle, Device Troubleshooting, New Registration, Contract Renewal
      • Behavioral Insights: Customer Interest Analytics, Customer Experience Analytics, Resource Forecasting, Security and Compliance
      • Impact: Boost Self Service, Real-time Content Optimization & Recommendation, Context Based Predictive Alerts, Implicit Personalization
  • 18. Customer Interaction Analytics Big Data Platform: synthesize customer journey maps into behavioral insights.
      • Data Sources: Server Logs, Call Center, Mobility, Social, Event Streams
      • [Other architecture figure labels: Data Ingestion, CiscoDV, Kafka, Redis, ETL, Analytics, Model Build, Model, Activity Refinement, Activity Synthesis, Synthesized Insights, Real-time Processing, Batch Analytics, Insight Services, CiscoDV, Interact, Impala, Hive, Pig, ES, Zoomdata, Platfora]
  • 19. AWS and CIS Intercloud Solution
  • 20. AWS Platform

        Component            Cloud::Hadoop        Cloud::Queries          Cloud::Streams
                             (Batch Analytics)    (Interactive Queries)   (Near Real-time Analytics)
        Virtual Machines     30                   6                       5
        AWS Instance Sizing  m3.2xlarge           c3.xlarge               m3.xlarge
        Virtual Cores        8/VM                 4/VM                    4/VM
        RAM                  30GB/VM              7.5GB/VM                15GB/VM
        Disk                 1.5 TB/VM            1.5 TB/VM               1.5 TB/VM
  • 21. Case for Cisco Intercloud Services for Analytics…
      • Cisco security and compliance requirements
          • Workloads that deal with personally identifiable data and Cisco confidential content cannot be uploaded to AWS. Cisco internal cloud solution is a better fit.
      • Customer journey beyond the enterprise
          • Applications are hosted on AWS
          • Partner systems are hosted on AWS and other cloud providers
          • Presence in AWS and other cloud services is required to support these scenarios for end-to-end customer journey insights.
      • Data virtualization integrated in the CIS Analytics Stack
          • Connect data from multiple clouds and multiple big data platforms
      • Integrated visualization toolset
  • 22. CIS Analytics Platform
  • 23. CIS Analytics Platform Requirements
      • Infra Provisioning: Deploy a virtual private cloud (VPC) on CIS with compute, storage and memory requirements comparable to the current production system.
      • OpenStack: Icehouse OpenStack with Neutron, Nova, and Swift installed.
      • Big Data Ecosystem: Cloudera’s Hadoop distribution version CDH 5.1.3, ELK Stack, Apache Kafka and Apache Storm.
      • Data Virtualization & Cloud Integration: Access to data services and data stores via Cisco Data Virtualization.
      • Runtime Services: Foundational PaaS capabilities including SLAs for uptime, performance, latency, data retention, issue escalation and support priorities, issue resolution, problem management, deployment process, patch management.
      • API Services: Provide both fine-grained and coarse-grained access to all service layers of the CIS Analytics Platform. In the hybrid cloud model it must support interoperability across platform service providers and promote the cloud concepts of extensibility and flexibility.
  • 24. AWS to CIS Migration: Success Criteria
      • Successful synthesis of customer interaction data
      • Successful automation of the end-to-end data process pipeline
      • Build behavioral insight services
      • Access to data and services via data discovery and visualization tools
      • Meet the performance, scale and platform stability requirements
      • Successful deployment of CiscoDV on CIS
      • Connect HDFS and Hive DS with CiscoDV via Hive and Impala
      • Build and expose insight services for consumption by limited users
  • 25. AWS and CIS Data Node Sizing Comparison: Hadoop Cluster for Batch and Query Analytics

        Sizing  Node Service             Instance Type  vCPU  Mem  Storage  Number of Data Nodes
        AWS     Data Nodes/Node Master   m3.2xlarge     8     30   2x80 GB  30
        CCS     Data Nodes/Node Master   GP-2XLarge     8     32   50       35

        Comments (both rows): Each Hadoop data node has 1500GB of EBS available for HDFS storage.
        Comments (CCS row): Less than AWS sizing (Storage).
  • 26. Pilot Test Data
      • Test performed on one day’s production data
      • Total no. of records processed: 110,852,667
      • Total data size: 32GB
      • Total no. of M/R jobs in the data pipeline: 17
      • Two test cycles
          • Cycle 1: Heterogeneous CCS nodes (vCPUs, storage, memory)
          • Cycle 2: Homogeneous CCS nodes
  • 27. CIS Performance of Batch Analytics – Limited Test
  • 28. Test Details by M/R job
        Columns give the CCS node count and test cycle (for example, 12n/c1 = CCS 12 nodes, cycle 1).

        Job Name                      12n/c1  18n/c1  24n/c1  30n/c1  18n/c2  24n/c2  30n/c2  35n/c2
        New_cleanse                      249     176     143     117      82      67      55      51
        Process_private_ip                27      14      11      10       7       5       6       6
        join_web_and_ip_data             142      95      76      61      49      40      34      29
        combine_ip_decorated_files        26      14      11      10       9       7       8       7
        filterBotEntries                  34      19      15      13      10       8       7       7
        sessionize                        71      64      69      62      60      63      15      13
        firstActivitiesFilter             26      15      13      10       9       8       6       6
        allOtherActivitiesFilter          29      18      13      13      11       9       7       6
        matchFirstActivities              21      13      11      13      13      11       8       8
        buildActivities                   27      15      12      10       7       6       9       9
        filterBUG                          8       5       3       2       3       3       4       4
        filterSEA                          8       5       3       2       3       3       4       4
        filterTCO                          8       5       3       2       3       3       4       4
        filterTDV                          8       5       3       2       3       3       4       4
        filterWDV                          8       5       3       2       3       3       4       4
        filterMOD                          8       5       3       2       3       3       4       4
        filterTOOL                         8       5       3       2       3       3       4       4
  • 29. PoC: Analytics with Spark on CIS
      • Existing code
          • Made in Ruby with Wukong to run on Hadoop
          • A history of changes and modifications
          • Script-based, steps communicate via intermediary files
      • Goal
          • Revise, rethink and reimplement with Spark on CIS
          • Open for advanced cloud analytics
          • Improve maintainability by moving away from aging Ruby on Hadoop
  • 30. Ruby Flow
      • [Figure: Sessionize / Cleanse pipeline diagram. Box labels: logs, cleanse, cleansed, private, web, decorate, sessionize (cookie, time), sessioned, match 1st (IP, UA, time), build actions, merge, session PSV, add to hive, bug, tool, first, others, bots, 1..7, onlyBots. Annotations: "Main computation happens here", "global sort in time", "global group by IP".]
      • Pre-process log records (‘cleanse’)
      • Extract HTTP sessions (‘sessionize’)
      • Extract user actions, such as ‘search’, ‘download patch’, ‘open manual’, ‘open a bug’
      • Ruby: scripts with temp files
          • Each box on the figure is a script in a separate file
          • They pipe gigabytes of data as input and output
          • Random matching of nodes to data for sessionizing
          • Lots of redundant shuffling
  • 31. Spark Flow
      • [Figure: same Sessionize / Cleanse pipeline diagram as slide 30. Annotations: "Main computation happens here", "global partition by IP", "local sort in time".]
      • Same flow, but each box is a Java or Scala function
      • No intermediate temp files
          • Steps are chained by Spark, often without any need for intermediate data
          • If still needed, the data is stored in memory and on local disk as much as possible
      • Local computation
          • Cleansing is computed on nodes local to data blocks (same as Ruby)
          • Sessions are built per IP
          • On separate nodes, each handling a single IP range
          • Once copied to the node for its partition, the data remains local
  • 32. Runtime comparison
      • Volumes
          • Logs of a single day: 52 GB
          • Total of 110 million records
          • Of which 53 million records are kept after pre-filtering
          • Producing over 1 million user actions
      • Cluster of 30 nodes
      • Ruby
          • Runtime 140 min
      • Spark
          • Runtime 7 min (20 times faster)
  • 33. Why?
      • Extracting sessions means sort in time and group by IP
      • Ruby:
          • Sorting in time and per-IP grouping are performed across the whole cluster (very bad, lots of IO)
      • Spark is good at dealing with partitions:
          • Per-IP groups are placed on different machines (partitions)
          • The global sort in time is replaced by many local per-IP sorts, done on the machines responsible for extracting sessions for specific groups of IP addresses
      • Other improvements
          • Avoid redundant temp files and redundant (de)serialization of objects (comes with Java/Scala); stages keep data in memory when possible (comes with Spark)
          • Cache results of user agent resolution, which is heavy on regular expressions
  • 35. Data Virtualization for Intercloud Analytics: Customer Benefits
      • Discover data beyond the enterprise: virtual integration that combines traditional enterprise data, Big Data stores on CIS and AWS, cloud data from SaaS providers, and Cisco customers and partners
      • Seamless interoperability offers easy access to data across distributed data sources in the intercloud analytics platform
      • Universal data governance maximizes enforcement of data security rules
      • Analytics Data Hubs: deployment flexibility to build hybrid/virtual sandboxes that enable nimble data discovery and rapid data analytics to support multiple LOBs
      • Deliver data to any number of analytics tools
  • 36. Use Case 1: Get Case Interactions
      • Use Case Description: # of cases opened by company X that are currently open (other variations would include cases by company, trends, etc.)
      • CiscoDV Value: CiscoDV enforces data security rules to restrict access on the intercloud platform to customer sensitive data.
      • Data Sources: SalesForce
      • Intercloud Solution: CIS CiscoDV service can access the “sanitized” version of CSOne data through JDBC from RIDES (SWTG CiscoDV) API.
      • Connection Type: DV on hybrid cloud → Enterprise data store
  • 37. Use Case 2: Get Customer Journey
      • Use Case Description: Customer interactions on the web pertaining to bug search and case submission process. Foundational data can be used to explore trends and feed into content recommendation models.
      • CiscoDV Value: Direct access to data on CIS Intercloud Analytics Platform
      • Data Sources: SAS Analytics
      • Intercloud Solution: By direct network access to the Impala Server, the CIS CiscoDV server connects to the Impala Service in Hadoop, also on CIS, as a Data Source. SQL queries configured in CiscoDV execute Impala queries.
      • Connection Type: DV on hybrid cloud → VPC Big Data platform
  • 38. Use Case 3: Get Bug Interactions
      • Use Case Description: Another foundational data service that provides a breakdown of customer exposure or interest in bugs. The service can be refined further to look at trends specific to a company or a product for further analytics.
      • CiscoDV Value: Real-time data federation that accesses extremely large data in CIS Intercloud Analytics platform and joins that with Bug Data accessed via departmental CiscoDV instance (RIDES)
      • Data Sources: SASA Analytics and QDDTS via RIDES
      • Intercloud Solution: By building on the access to the Impala Server, the DV server can join the Bug Data from the Enterprise Data Stores with the HDFS data to provide a federated view.
      • Connection Type: DV on hybrid cloud → VPC Big Data platform and Enterprise data store
  • 39. CiscoDV on Intercloud Analytics Platform (CIS)
      • Scenario 1: CIS CiscoDV to Cisco Enterprise Data Store
      • Scenario 2: CIS CiscoDV to Impala and Hive on CIS Intercloud Analytics Platform
      • Scenario 3: CIS CiscoDV to Hive on AWS Big Data Cluster
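
Slides 31 and 33 describe the core change in the Spark flow: a global partition by IP followed by a local sort in time on each partition, instead of a cluster-wide sort and group. The sketch below illustrates that idea in plain Python, not in Spark and not with the talk's actual code. The record layout `(ip, timestamp, url)`, the partition count, the 30-minute inactivity gap, and all function names are assumptions made for illustration; the slides do not state them. In Spark itself, an RDD operation such as `repartitionAndSortWithinPartitions` plays a comparable role, though the slides do not say which operations the Cisco job used.

```python
from collections import defaultdict
from zlib import crc32

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout; not stated in the slides
NUM_PARTITIONS = 4             # assumed; stands in for the nodes of the cluster

# Hypothetical sample records: (ip, timestamp in seconds, url), deliberately unsorted.
EXAMPLE_RECORDS = [
    ("10.0.0.1", 60, "/bug"),
    ("10.0.0.2", 10, "/download"),
    ("10.0.0.1", 0, "/search"),
    ("10.0.0.1", 4000, "/manual"),
]


def partition_by_ip(records, num_partitions=NUM_PARTITIONS):
    """Global partition by IP: every record of one IP lands in the same partition."""
    partitions = defaultdict(list)
    for record in records:
        ip = record[0]
        partitions[crc32(ip.encode()) % num_partitions].append(record)
    return partitions


def sessionize_partition(partition_records):
    """Local sort in time, then split each IP's hits wherever the gap is too long."""
    sessions = []
    current = []
    for record in sorted(partition_records, key=lambda r: (r[0], r[1])):
        ip, timestamp = record[0], record[1]
        if current and (current[-1][0] != ip
                        or timestamp - current[-1][1] > SESSION_GAP_SECONDS):
            sessions.append(current)
            current = []
        current.append(record)
    if current:
        sessions.append(current)
    return sessions


def sessionize(records):
    """Run the per-partition step on every partition; no partition needs another's data."""
    sessions = []
    for partition_records in partition_by_ip(records).values():
        sessions.extend(sessionize_partition(partition_records))
    return sessions
```

Calling `sessionize(EXAMPLE_RECORDS)` groups the hits of each IP into sessions. Because each partition is sorted and scanned independently, the per-partition step could run on separate machines without a cluster-wide sort, which is the property slide 33 credits for most of the improvement.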

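Slide 33 also lists caching the results of user agent resolution, which is heavy on regular expressions. A minimal sketch of that idea in plain Python follows. The patterns, labels, cache size, and the function name `resolve_user_agent` are hypothetical; the slides do not show the real resolver.

```python
import re
from functools import lru_cache

# Hypothetical patterns; a real resolver would carry a much longer list.
_UA_PATTERNS = [
    ("bot", re.compile(r"bot|crawler|spider", re.IGNORECASE)),
    ("firefox", re.compile(r"Firefox/\d+")),
    ("chrome", re.compile(r"Chrome/\d+")),
]


@lru_cache(maxsize=100_000)  # assumed size; repeated user-agent strings skip the regex scan
def resolve_user_agent(user_agent):
    """Return a coarse label for a user-agent string."""
    for label, pattern in _UA_PATTERNS:
        if pattern.search(user_agent):
            return label
    return "other"
```

Log data tends to repeat a small set of user-agent strings many times, so the regex scan runs once per distinct string and later lookups are served from the cache. `resolve_user_agent.cache_info()` reports the hit and miss counts.
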
Editor's notes

  1. FABIO: a few items from Pankaj and Liz on Monday. Per the John Chambers slides I sent you Monday night, please be sure to fully address digitization in the opener, so Pankaj can connect to John’s opening remarks. Set the stage here for what the digital transformation is and why it drives IoE and cloud. Explain where we came from and where we are today: exponential growth and a magnitude of changes still to come. Please see the new VNI, to see if there are any newer or better stats on the data center. Pankaj feels the top 3 data points are OK in this slide, but perhaps we could find better ones for the bottom 2 data points? Maybe uplevel them a bit?
     -------------------------------------------------------
     The world is changing. The digital transformation is turning traditional business models on their heads. We are seeing unprecedented growth in devices, mobile apps, and data utilization.
     IoE: IoE devices create 277 times the data that the end user is creating, but only a fraction of it ever reaches the data center. A Boeing 787, for example, generates 40 TB of data per hour of flight time, but only 0.5 TB is ultimately transmitted to the data center.
     Mobility: In 2014, global mobile data traffic grew 1.7x, or 69%. In 2014 alone, 77B+ mobile apps were downloaded; by 2015, 180B apps (about 2.3x the 2014 figure).
     Internet: IDC predicts that by 2017 there will be 3.6 billion global Internet users, more than half the world population.
     Big Data: By 2020 there will be more than 5,000 GB of data for every person on Earth.
     These massive changes are putting tremendous stress on the data center. The traditional data center model has to evolve in order to meet demand today and into the future.
  2. We know how to fix this. We’re going to do for cloud what we did for data. You couldn’t move data between the networks; they weren’t connected. Cisco unified those worlds. The world of cloud today is a world of isolated clouds. There’s no workload or data portability. “Amazon is Hotel California: you can never leave, and that data is staying there.” Our vision is to connect all these clouds together into the Intercloud, whether private, public, or hybrid, through technology and innovation. Intercloud is going to connect these clouds together in the same way we connected data together. No one cloud model or single cloud approach, such as the massively scalable clouds from Amazon, Google or Microsoft, will win alone in this space.