SlideShare una empresa de Scribd logo
1 de 48
Descargar para leer sin conexión
Spark Use Case at 
Telefónica CyberSecurity (CBS) 
Antonio Alcocer 
antonio@stratio.com 
Oscar Mendez 
oscar@stratio.com 
@omendezsoto 
#CassandraSummit 2014 1
Who are we? 
STRATIO 
• Stratio is a Big Data Company 
• Founded in 2013 
• Commercially launched in 2014 
• 50+ employees in Madrid 
• Office in San Francisco 
• Certified Spark distribution 
#CassandraSummit 2014 2
3
General info 
o 1924- 2014: 317+ customer with 130.000+ employees 
o 2nd European operator by revenues 
o 4th global integrated operator by accesses 
o 9th Telco in the Global ranking by market capitalization 
o 2nd global operator for investment in R+D 
#CassandraSummit 2014 4
Present in 24 countries 
#CassandraSummit 2014 5
Their main brands 
#CassandraSummit 2014 6
Other brands 
#CassandraSummit 2014 7
Telefónica Global Solutions 
Global Security Services 
A global infrastructure to 
safeguard your business_ 
#CassandraSummit 2014 8
Managed Security 
#CassandraSummit 2014 9
CyberSecurity?? 
10
Why???? 
A picture is worth a thousand words - but a film clip, a million! 
#CassandraSummit 2014 11
Why???? 
A picture is worth a thousand words - but a film clip, a million! 
#CassandraSummit 2014 12
Why???? 
A picture is worth a thousand words - but a film clip, a million! 
#CassandraSummit 2014 13
Don’t worry… 
#CassandraSummit 2014 14
What is Cybersecurity? 
What does it mean for us? 
“Cybersecurity is the collection of tools, policies… capabilities to protect 
the cyber environment and organization and user’s assets. Cybersecurity 
strives to ensure unauthorized access to, manipulation of the integrity, 
confidentiality, or availability of an information, or unauthorized 
exfiltration of information.” 
No rules, just guidelines. 
#CassandraSummit 2014 15
An example of threats 
Cassandra OpsCenter 
World map 
Wordpress 
#CassandraSummit 2014 16
C* OpsCenter + Shodan 
#CassandraSummit 2014 17
C* OpsCenter + Shodan 
#CassandraSummit 2014 18
Another threats 
#CassandraSummit 2014 19
CyberSecurity in 
numbers 
20
Numbers 
Threats 
• DDoS (23%) 
• SQLi (19%) 
• Defacement (14%) 
• Account Hijacking (9%) 
• Unknown (18%) 
#CassandraSummit 2014 21
Looking for unknown threats 
#CassandraSummit 2014 22
What did Telefonica need? 
#CassandraSummit 2014 23
Joining efforts 
24
Joinnig efforts
Required skills 
#CassandraSummit 2014 26
in 
27 
Using
Use Case Architecture 
We have three phases: 
• Ingestion: based on Apache Kafka 
• Data fusion: based on Apache Storm. 
• Batch & Analytics: Based on Cassandra 
and Spark 
#CassandraSummit 2014 28
Data Adquisition 
• Data are in several sources: 
• DNS traffic 
• IP 
• Social media 
• Underground sources 
• Government sources 
• … 
Data sources 
Sources 
Sources 
Sources 
Sources 
Sources 
Sources 
KAFKA 
API 
• There are several sources consumers pulling the info and 
pushing it into a Kafka Cluster 
• Sources are heterogeneous and their speed is variable. 
Sources 
Sources 
#CassandraSummit 2014 29
Data fusion 
• We use Storm to process and 
normalize the information. 
• The system must fire alerts 
to the analysts. 
• This use case required a Big 
Data component capable of 
processing the data and 
extract its information in real-time. 
• Warnings and alerts are time-sensitive in order to deal efficiently with security attacks. 
#CassandraSummit 2014 30
Batch 
•The data are saved in 
Cassandra. 
•We use Cassandra directly for 
the easy queries. 
•And we used Spark to extract 
the information not accessible 
to cassandra directly. 
Data process 
INTEGRATION INTEGRATION INTEGRATION 
#CassandraSummit 2014 31
Why did we use C*? 
Because we need their features: 
• P2P architecture 
• Read/write performance 
• Fault tolerance 
• Easy to deploy 
• CQL 
#CassandraSummit 2014 32
Why did we use C*? 
•And we needed data modeler: 
•The data in Storm is normalize by source. 
• The primary key is the source key (f.e. IP) and a 
time stamp to split the cluster key. 
• All the data row have view tables with relationship 
between entities: IP, DNS, Domain… 
IP timestamp Timesplit … Domain … Table name: IP 
Primary Key ((IP, timestamp)timesplit) 
Domain timestamp timesplit IP1 … IPn Table name: IP_Domain 
Primary Key ((Domain, timestamp)timesplit) 
#CassandraSummit 2014 33
Why did we use C*? 
IP main table 
IP timestamp Timesplit … Domain … Table name: IP 
Primary Key ((IP, timestamp)timesplit) 
IP view for domain 
Domain timestamp timesplit IP1 … IPn Table name: IP_Domain 
Primary Key ((Domain, timestamp)timesplit) 
Domain main table 
Domain timestamp Timesplit … IP … Table name: domain 
Primary Key ((domain, timestamp)timesplit) 
IP view for domain 
IP timestamp timesplit domain1 … Domainn Table name: Domain_IP 
Primary Key ((IP, timestamp)timesplit) 
#CassandraSummit 2014 34
35 
What have we 
learned?
THE BEST OF BOTH 
WORLDS COMBINED 
“Two plus two is four? Sometimes… Sometimes it is five.” 
G. Orwell 
Combination wins
RISK 
Combination = add more and more products to the Stack
Complexity 
Platforms hybrid Hadoop + spark 
Hybrid = complexity
1 One stack to rule them all 
RDD-Based Matrices 
Interactive 
Batch 
processin 
g 
Stream 
processing 
Why Spark 
Batch 
Interactive [SQL] 
Streaming 
Machine Learning 
Learn just one system 
Develop within one framework 
Deploy/Manage just one system 
Databricks co-founder & CTO Matei Zaharia 
(source)
Be rational not only emotional
The only Pure Spark processing 
No Hadoop elements 
+10 
year old constraints
Lean simplicity 
Pure Spark Platform 
Former Hadoop or 
Hybrid Hadoop-Spark Platforms 
Lean = Easier deployment, management, and use 
of the system
Not to make a POC, but a real project for a Big Company is 
STRATIO 
ADMIN 
STRATIO 
DATAVIS 
STRATIO 
INGESTION 
STRATIO 
CROSSDATA 
(SPARK) 
CASSANDRA 
MONGO DB 
ELASTICSEARCH 
HDFS 
STRATIO 
STREAMING 
(SPARK STREAMING, 
SIDDHI) 
very demanding 
SPARK 
CERTIFIED
Multiple Combination 
API 
Elastic S 
https://github.com/Stratio/stratio-meta
Full text search + queries 
C* 
node 
C* 
node 
Lucene 
index 
C* 
node 
Lucene 
index 
C* 
node 
Lucene 
index 
C* 
node 
Lucene 
index 
Lucene 
index 
SELECT * FROM logs 
WHERE description 
MATCH ‘*Exception’;
Stratio Streaming 
•Start using Spark Streaming for 
doing some Complex Event 
Processing operations. 
https://github.com/Stratio/stratio-streaming
DATA JOURNEY THROUGH TIME 
PAS 
T 
PRESENT FUTURE 
Stored 
data 
Real Time 
Data 
Streaming 
ML 
Algorithms 
Ephemeral 
Tables 
Stored 
Tables 
SQL combination: Done 
SQL combination: In progress 
Quantum 
Tables
aTdhvaannkcse in 
#CassandraSummit 2014 48

Más contenido relacionado

La actualidad más candente

[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming OverviewStratio
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraStratio
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteSpark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteDatabricks
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupModern Data Stack France
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Khai Tran
 
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaTrends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaSpark Summit
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark Summit
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Databricks
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Databricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data PipelinesChristian Gügi
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...DataWorks Summit
 

La actualidad más candente (20)

[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Advanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in CassandraAdvanced search and Top-K queries in Cassandra
Advanced search and Top-K queries in Cassandra
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteSpark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
Conquering the Lambda architecture in LinkedIn metrics platform with Apache C...
 
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei ZahariaTrends for Big Data and Apache Spark in 2017 by Matei Zaharia
Trends for Big Data and Apache Spark in 2017 by Matei Zaharia
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Databricks with R: Deep Dive
Databricks with R: Deep DiveDatabricks with R: Deep Dive
Databricks with R: Deep Dive
 
Building Scalable Big Data Pipelines
Building Scalable Big Data PipelinesBuilding Scalable Big Data Pipelines
Building Scalable Big Data Pipelines
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 

Destacado

Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosStratio
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + KerberosStratio
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model TreesStratio
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Stratio
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scalaStratio
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1Stratio
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ ZooskCloudera, Inc.
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaStratio
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?Paco Nathan
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsPaco Nathan
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learningPaco Nathan
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonPaco Nathan
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 

Destacado (19)

Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
 
ImpalaToGo use case
ImpalaToGo use caseImpalaToGo use case
ImpalaToGo use case
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1
 
Impala use case @ Zoosk
Impala use case @ ZooskImpala use case @ Zoosk
Impala use case @ Zoosk
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Microservices, containers, and machine learning
Microservices, containers, and machine learningMicroservices, containers, and machine learning
Microservices, containers, and machine learning
 
SF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in PythonSF Python Meetup: TextRank in Python
SF Python Meetup: TextRank in Python
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 

Similar a Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with SparkDataStax Academy
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
Accelerate to Cloud
Accelerate to CloudAccelerate to Cloud
Accelerate to CloudRightScale
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionCloudera, Inc.
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionCloudera, Inc.
 
Detecting Mobile Malware with Apache Spark with David Pryce
 Detecting Mobile Malware with Apache Spark with David Pryce Detecting Mobile Malware with Apache Spark with David Pryce
Detecting Mobile Malware with Apache Spark with David PryceDatabricks
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Data-Centric Infrastructure for Agile Development
Data-Centric Infrastructure for Agile DevelopmentData-Centric Infrastructure for Agile Development
Data-Centric Infrastructure for Agile DevelopmentDATAVERSITY
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Keith Kraus
 
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud SecurityGet Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud SecuritySymantec
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataSenturus
 
Thin Air or Solid Ground? Practical Cloud Security
Thin Air or Solid Ground? Practical Cloud SecurityThin Air or Solid Ground? Practical Cloud Security
Thin Air or Solid Ground? Practical Cloud SecurityDan Fitzgerald, CISSP, CIPM
 
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...Databricks
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 

Similar a Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer (20)

Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
Big Data Analytics with Spark
Big Data Analytics with SparkBig Data Analytics with Spark
Big Data Analytics with Spark
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
Accelerate to Cloud
Accelerate to CloudAccelerate to Cloud
Accelerate to Cloud
 
Get started with Cloudera's cyber solution
Get started with Cloudera's cyber solutionGet started with Cloudera's cyber solution
Get started with Cloudera's cyber solution
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
Detecting Mobile Malware with Apache Spark with David Pryce
 Detecting Mobile Malware with Apache Spark with David Pryce Detecting Mobile Malware with Apache Spark with David Pryce
Detecting Mobile Malware with Apache Spark with David Pryce
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Data-Centric Infrastructure for Agile Development
Data-Centric Infrastructure for Agile DevelopmentData-Centric Infrastructure for Agile Development
Data-Centric Infrastructure for Agile Development
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to Cloud
 
Cloud transition - The Trivadis approach
Cloud transition - The Trivadis approachCloud transition - The Trivadis approach
Cloud transition - The Trivadis approach
 
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
Streaming Cyber Security into Graph: Accelerating Data into DataStax Graph an...
 
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud SecurityGet Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
Get Your Head in the Cloud: A Practical Model for Enterprise Cloud Security
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
 
Thin Air or Solid Ground? Practical Cloud Security
Thin Air or Solid Ground? Practical Cloud SecurityThin Air or Solid Ground? Practical Cloud Security
Thin Air or Solid Ground? Practical Cloud Security
 
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
An Update on Scaling Data Science Applications with SparkR in 2018 with Heiko...
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 

Más de Stratio

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Stratio
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Stratio
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupStratio
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science MeetupStratio
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupStratio
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Stratio
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0Stratio
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challengeStratio
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big DataStratio
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformStratio
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksStratio
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” Stratio
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Stratio
 

Más de Stratio (13)

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challenge
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big Data
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric Platform
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
 

Último

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 

Último (20)

Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 

Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer

  • 1. Spark Use Case at Telefónica CyberSecurity (CBS) Antonio Alcocer antonio@stratio.com Oscar Mendez oscar@stratio.com @omendezsoto #CassandraSummit 2014 1
  • 2. Who are we? STRATIO • Stratio is a Big Data Company • Founded in 2013 • Commercially launched in 2014 • 50+ employees in Madrid • Office in San Francisco • Certified Spark distribution #CassandraSummit 2014 2
  • 3. 3
  • 4. General info o 1924- 2014: 317+ customer with 130.000+ employees o 2nd European operator by revenues o 4th global integrated operator by accesses o 9th Telco in the Global ranking by market capitalization o 2nd global operator for investment in R+D #CassandraSummit 2014 4
  • 5. Present in 24 countries #CassandraSummit 2014 5
  • 6. Their main brands #CassandraSummit 2014 6
  • 8. Telefónica Global Solutions Global Security Services A global infrastructure to safeguard your business_ #CassandraSummit 2014 8
  • 11. Why???? A picture is worth a thousand words - but a film clip, a million! #CassandraSummit 2014 11
  • 12. Why???? A picture is worth a thousand words - but a film clip, a million! #CassandraSummit 2014 12
  • 13. Why???? A picture is worth a thousand words - but a film clip, a million! #CassandraSummit 2014 13
  • 15. What is Cybersecurity? What does it mean for us? “Cybersecurity is the collection of tools, policies… capabilities to protect the cyber environment and organization and user’s assets. Cybersecurity strives to ensure unauthorized access to, manipulation of the integrity, confidentiality, or availability of an information, or unauthorized exfiltration of information.” No rules, just guidelines. #CassandraSummit 2014 15
  • 16. An example of threats Cassandra OpsCenter World map Wordpress #CassandraSummit 2014 16
  • 17. C* OpsCenter + Shodan #CassandraSummit 2014 17
  • 18. C* OpsCenter + Shodan #CassandraSummit 2014 18
  • 21. Numbers Threats • DDoS (23%) • SQLi (19%) • Defacement (14%) • Account Hijacking (9%) • Unknown (18%) #CassandraSummit 2014 21
  • 22. Looking for unknown threats #CassandraSummit 2014 22
  • 23. What did Telefonica need? #CassandraSummit 2014 23
  • 28. Use Case Architecture We have three phases: • Ingestion: based on Apache Kafka • Data fusion: based on Apache Storm. • Batch & Analytics: Based on Cassandra and Spark #CassandraSummit 2014 28
  • 29. Data Adquisition • Data are in several sources: • DNS traffic • IP • Social media • Underground sources • Government sources • … Data sources Sources Sources Sources Sources Sources Sources KAFKA API • There are several sources consumers pulling the info and pushing it into a Kafka Cluster • Sources are heterogeneous and their speed is variable. Sources Sources #CassandraSummit 2014 29
  • 30. Data fusion • We use Storm to process and normalize the information. • The system must fire alerts to the analysts. • This use case required a Big Data component capable of processing the data and extract its information in real-time. • Warnings and alerts are time-sensitive in order to deal efficiently with security attacks. #CassandraSummit 2014 30
  • 31. Batch •The data are saved in Cassandra. •We use Cassandra directly for the easy queries. •And we used Spark to extract the information not accessible to cassandra directly. Data process INTEGRATION INTEGRATION INTEGRATION #CassandraSummit 2014 31
  • 32. Why did we use C*? Because we need their features: • P2P architecture • Read/write performance • Fault tolerance • Easy to deploy • CQL #CassandraSummit 2014 32
  • 33. Why did we use C*? •And we needed data modeler: •The data in Storm is normalize by source. • The primary key is the source key (f.e. IP) and a time stamp to split the cluster key. • All the data row have view tables with relationship between entities: IP, DNS, Domain… IP timestamp Timesplit … Domain … Table name: IP Primary Key ((IP, timestamp)timesplit) Domain timestamp timesplit IP1 … IPn Table name: IP_Domain Primary Key ((Domain, timestamp)timesplit) #CassandraSummit 2014 33
  • 34. Why did we use C*? IP main table IP timestamp Timesplit … Domain … Table name: IP Primary Key ((IP, timestamp)timesplit) IP view for domain Domain timestamp timesplit IP1 … IPn Table name: IP_Domain Primary Key ((Domain, timestamp)timesplit) Domain main table Domain timestamp Timesplit … IP … Table name: domain Primary Key ((domain, timestamp)timesplit) IP view for domain IP timestamp timesplit domain1 … Domainn Table name: Domain_IP Primary Key ((IP, timestamp)timesplit) #CassandraSummit 2014 34
  • 35. 35 What have we learned?
  • 36. THE BEST OF BOTH WORLDS COMBINED “Two plus two is four? Sometimes… Sometimes it is five.” G. Orwell Combination wins
  • 37. RISK Combination = add more and more products to the Stack
  • 38. Complexity Platforms hybrid Hadoop + spark Hybrid = complexity
  • 39. 1 One stack to rule them all RDD-Based Matrices Interactive Batch processin g Stream processing Why Spark Batch Interactive [SQL] Streaming Machine Learning Learn just one system Develop within one framework Deploy/Manage just one system Databricks co-founder & CTO Matei Zaharia (source)
  • 40. Be rational not only emotional
  • 41. The only Pure Spark processing No Hadoop elements +10 year old constraints
  • 42. Lean simplicity Pure Spark Platform Former Hadoop or Hybrid Hadoop-Spark Platforms Lean = Easier deployment, management, and use of the system
  • 43. Not to make a POC, but a real project for a Big Company is STRATIO ADMIN STRATIO DATAVIS STRATIO INGESTION STRATIO CROSSDATA (SPARK) CASSANDRA MONGO DB ELASTICSEARCH HDFS STRATIO STREAMING (SPARK STREAMING, SIDDHI) very demanding SPARK CERTIFIED
  • 44. Multiple Combination API Elastic S https://github.com/Stratio/stratio-meta
  • 45. Full text search + queries C* node C* node Lucene index C* node Lucene index C* node Lucene index C* node Lucene index Lucene index SELECT * FROM logs WHERE description MATCH ‘*Exception’;
  • 46. Stratio Streaming •Start using Spark Streaming for doing some Complex Event Processing operations. https://github.com/Stratio/stratio-streaming
  • 47. DATA JOURNEY THROUGH TIME PAS T PRESENT FUTURE Stored data Real Time Data Streaming ML Algorithms Ephemeral Tables Stored Tables SQL combination: Done SQL combination: In progress Quantum Tables