Stratio Meta
An efficient distributed datahub with batch and streaming query capabilities
Daniel Higuero (dhiguero@stratio.com)
Alvaro Agea (alvaro@stratio.com)
#CassandraSummit 2014
Stratio Crossdata
An efficient distributed datahub with batch and streaming query capabilities
Daniel Higuero (dhiguero@stratio.com)
Alvaro Agea (alvaro@stratio.com)
Who are we?
STRATIO
• Stratio is a Big Data company
• Founded in 2013
• Commercially launched in 2014
• 50+ employees in Madrid
• Office in San Francisco
• Certified Spark distribution
We love…
Cassandra
• P2P architecture
• Read/write performance
• Fault tolerance
• Easy to deploy
• CQL
Agenda
• Introduction
• Crossdata architecture
• Metadata management
• Streaming sources
• Full text search
• Spark and Crossdata
• ODBC
• The future
Introduction
o Big Data analysis is commonly associated with batch processing
• Users aiming to combine batch and stream processing have to rely on tailor-made architectures
o Users buy Big Data platforms, but
• How do I start?
• What is my entry point to the platform?
What do our clients demand?
o Easy deployment
o Easy administration
o Read/write performance
o Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
What do our clients demand?
✓ Easy deployment
✓ Easy administration
✓ Read/write performance
✓ Easy-to-learn query language
✓ Integration with BI tools
✓ Join operations
✓ Support for streaming sources
✓ Integration with other data stores
✓ Ability to query data without thinking about the schema (non-indexed data)
Crossdata
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform non-natively supported operations
• Supports batch and streaming queries
• Supports multiple clusters and technologies
Our architecture 
Connecting to the outside world
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
• Different datastores
• Different processing engines
• Different versions
o Each connector defines its capabilities
Our planner will choose the best connector for each query
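The capability-based selection described on this slide can be pictured with a small sketch. It is not Crossdata's actual IConnector API or planner: the `Connector` class, the capability names, and the `priority` tie-breaker are all hypothetical, and only illustrate how a planner might filter connectors by the operations a query needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical stand-in for a connector; not the real IConnector interface."""
    name: str
    capabilities: frozenset  # operations this connector declares it supports
    priority: int = 0        # assumed tie-breaker: higher means preferred


def choose_connector(required, connectors):
    """Return the highest-priority connector whose capabilities cover `required`."""
    needed = frozenset(required)
    candidates = [c for c in connectors if needed <= c.capabilities]
    if not candidates:
        raise LookupError(f"no connector supports {sorted(needed)}")
    return max(candidates, key=lambda c: c.priority)


# Illustrative registry (assumed names): a native driver for simple queries,
# and a Spark-backed connector for operations the datastore lacks.
CONNECTORS = [
    Connector("native", frozenset({"select", "filter_pk"}), priority=10),
    Connector(
        "spark",
        frozenset({"select", "filter_pk", "filter_any", "join", "group_by"}),
        priority=1,
    ),
]
```

With this registry, a plain `select` would go to the native connector, while a query that also needs `join` would fall through to the Spark-backed one.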
Query execution
[Figure: query pipeline with the stages Parsing, Validation, Planning and Execution; the diagram also shows C*, Connector1, Connector2 and Connector3]
Our planner will choose the best connector for each query
Multi-cluster support
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
• Multiple clusters can coexist to optimize platform performance
- E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• A table is saved in a unique datastore
Logical and physical mapping
SELECT * FROM app.users;
[Figure: the App catalog contains the Users, Test and old_users tables; the physical datastores shown are C* production, C* development and other datastores]
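As a rough illustration of the logical-to-physical mapping, the sketch below resolves a `catalog.table` name to the single datastore that holds it. The dictionary layout, the datastore identifiers, and the assignment of each table to a cluster are assumptions made for the example; the slides do not specify how Crossdata stores this mapping.

```python
# Hypothetical catalog metadata: each logical table lives in exactly one datastore.
# The table-to-cluster assignment below is assumed for illustration only.
CATALOG = {
    ("app", "users"): "cassandra_production",
    ("app", "test"): "cassandra_development",
    ("app", "old_users"): "other_datastore",
}


def resolve(qualified_name, catalog=CATALOG):
    """Map a 'catalog.table' name to the single datastore that stores it."""
    catalog_name, _, table = qualified_name.partition(".")
    if not table:
        raise ValueError(f"expected 'catalog.table', got {qualified_name!r}")
    try:
        return catalog[(catalog_name, table)]
    except KeyError:
        raise LookupError(f"unknown table {qualified_name!r}") from None
```

A query such as `SELECT * FROM app.users;` would then only need `resolve("app.users")` to learn which cluster to contact, so the user never names the cluster directly.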
Metadata management
Metadata in the era of schemaless NoSQL datastores
o Some datastores are schemaless, but our applications are not!
• Flexible schemas vs schemaless
• Crossdata provides a Metadata Manager that stores schemas for any datasource
- Remember ODBC and those BI tools
Metadata management
[Figure: a Connector attached to C* production, and a Metadata Manager backed by a Metadata Store (Infinispan), with two numbered steps]
1. If the connector does not support metadata operations, those are skipped
2. Updated metadata information is maintained among Crossdata servers using Infinispan
Streaming sources
Managing streaming sources
o Nowadays, use cases expect some type of streaming datasource
• Streaming data has an ephemeral nature
• In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables
[Figure: a streaming source described by {schema:{col1:…},…} is exposed as a table with columns col1:text, col2:int, col3:int and col4:text, queried by Streaming_query0 … Streaming_queryn]
Streaming queries
o Streaming queries are infinite by definition
• A time window is defined to create a batch-like view of the rows ingested by the system in that period
• The user launches queries specifying a processing time window
- Crossdata provides methods to list and stop running streaming queries
Streaming queries: windows syntax 
SELECT fieldGroup,avg(Field2) 
FROM eph_table 
WITH WINDOW 5 minutes 
WHERE field1=100 AND field2>100 
GROUP BY fieldGroup;
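To make the window semantics concrete, here is a minimal in-memory sketch of what the query above computes. It assumes fixed, non-overlapping (tumbling) windows aligned to multiples of the window length, which the slides do not confirm; the function name and the `(timestamp, row)` event format are hypothetical and unrelated to Crossdata's streaming engine.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # WITH WINDOW 5 minutes


def windowed_avg(events, window_seconds=WINDOW_SECONDS):
    """Bucket events into fixed windows, then filter, group and average.

    `events` is an iterable of (timestamp_seconds, row) pairs, where each row
    is a dict with the keys field1, field2 and fieldGroup.
    Returns {window_start: {fieldGroup: average of field2}}.
    """
    # window_start -> fieldGroup -> [running sum, row count]
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for ts, row in events:
        # WHERE field1 = 100 AND field2 > 100
        if row["field1"] != 100 or row["field2"] <= 100:
            continue
        window_start = int(ts // window_seconds) * window_seconds
        acc = sums[window_start][row["fieldGroup"]]
        acc[0] += row["field2"]
        acc[1] += 1
    # GROUP BY fieldGroup, avg(field2), evaluated once per window
    return {
        w: {g: total / count for g, (total, count) in groups.items()}
        for w, groups in sums.items()
    }
```

Each window yields its own batch-like result, which is why a streaming query keeps producing output until it is stopped.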
Joining batch and streaming
SELECT * FROM demo.temporal
WITH WINDOW 10 secs
INNER JOIN demo.users
ON users.name = temporal.name;
[Figure: the query is split into a streaming part (SELECT * FROM demo.temporal WITH WINDOW 10 secs) and a batch part (SELECT * FROM demo.users), which are combined by INNER JOIN ON users.name = temporal.name]
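The decomposition above can be sketched as a join between one window of streaming rows and the rows of the batch table. This is only an in-memory illustration under assumed data shapes (lists of dicts); the function name and the column-prefixing scheme are hypothetical, and the real join presumably runs on Spark rather than in a single process.

```python
def join_window_with_batch(window_rows, batch_rows, key="name"):
    """Inner-join one window of streaming rows against a batch table on `key`."""
    # Index the batch side once so each streaming row is matched by lookup.
    index = {}
    for row in batch_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for s in window_rows:
        for b in index.get(s[key], []):
            # Prefix column names so columns from both sides stay distinct.
            merged = {f"temporal.{k}": v for k, v in s.items()}
            merged.update({f"users.{k}": v for k, v in b.items()})
            joined.append(merged)
    return joined
```

The function would be called once per 10-second window, with `batch_rows` being the result of the batch sub-query over `demo.users`.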
Full text search
Full text search with Lucene
o Clients request the ability to perform full text searches
o We have developed an integration between Lucene and Cassandra
o C* users can now enjoy all Lucene features:
• Full text searches, range queries, fuzzy queries…
https://github.com/Stratio/stratio-cassandra
Stratio Lucene 2i
[Figure: five C* nodes, each paired with its own Lucene index]
Full text search queries
o With Crossdata, we simplify:
• The creation syntax
• The query syntax using the match operator
CREATE FULLTEXT INDEX ON app.users(name,email);
SELECT * FROM app.users
WHERE email MATCH '*@stratio.com';
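For readers unfamiliar with wildcard matching, the intent of the MATCH query above can be approximated in plain Python. This sketch uses glob-style matching from the standard library as a stand-in; it does not use Lucene, and a real Lucene wildcard query works on indexed terms, so results may differ depending on how the column is analyzed. The `match` helper and the sample rows are hypothetical.

```python
from fnmatch import fnmatchcase


def match(rows, column, pattern):
    """Return the rows whose `column` value matches a glob-style pattern."""
    return [r for r in rows if fnmatchcase(str(r.get(column, "")), pattern)]


# Illustrative rows standing in for the app.users table.
USERS = [
    {"name": "Daniel", "email": "dhiguero@stratio.com"},
    {"name": "Alvaro", "email": "alvaro@stratio.com"},
    {"name": "Guest", "email": "guest@example.org"},
]
```

Calling `match(USERS, "email", "*@stratio.com")` keeps only the two rows whose address ends in `@stratio.com`. Unlike this linear scan, an index-backed search avoids reading every row.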
Spark & Stratio Crossdata
Why Spark?
o Stratio Crossdata uses Spark to perform non-natively supported operations
o Spark brings several benefits over Hadoop
• In-memory processing
• RDD abstraction
• Simpler API
• Increased flexibility (e.g., no need for identity mapping)
What about Spark SQL?
o Different approach to query execution
• We only use Spark when it speeds up queries
- Native drivers are faster for simple queries
- Spark SQL has limited RDD sources
• Avoid some Spark limitations
• Several batch and streaming contexts in a single JVM (SPARK-2243)
Query approach
[Figure: SparkSQL approach: SparkSQL, then Spark, then Cassandra. Crossdata approach: Stratio Crossdata, then either Spark or a native driver, then Cassandra]
Our Cassandra-Spark integration
o Project started in June 2013
• With the objective of providing a method to interact with Cassandra from Spark
• Initial approach based on the HadoopInputFormat interface
• Current version uses the native DataStax Java driver
https://github.com/Stratio/stratio-deep
Our Cassandra-Spark integration
o Benchmark in progress comparing our solution with the DataStax Spark driver
• Results highly influenced by the split size
• Initial results are promising for the Stratio Spark integration using DataStax default values
• Group by: up to 40% faster
• Join: up to 17% faster
• Stay tuned for the benchmark publication!
Spark vs Lucene 2i
[Figure: plot comparing Spark and Lucene 2i; axes: Time and Records returned]
ODBC
Stratio Crossdata ODBC
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it using Simba SDK
o ODBC opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC for all datastores!
The future
o Security
o Query optimizer and smart query planner
o Leverage system statistics
o Support for UDFs
o Become an Apache project
https://github.com/Stratio/stratio-meta
We are looking for an Apache Champion
Can you help us?
A wish list for Cassandra
o Ability to stop running queries
o Interactive users are unpredictable
o Some exception paths are not clear or defined (e.g., secondary indexes)
o Distribute some of the operations currently performed on the coordinator
• E.g., aggregations like count(*)

 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Stratio CrossData: an efficient distributed datahub with batch and streaming query capabilities

  • 1. Stratio Meta: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com). #CassandraSummit 2014
  • 2. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com)
  • 3. Who are we? STRATIO • Stratio is a Big Data company • Founded in 2013 • Commercially launched in 2014 • 50+ employees in Madrid • Office in San Francisco • Certified Spark distribution
  • 4. We love… Cassandra • P2P architecture • Read/write performance • Fault tolerance • Easy to deploy • CQL
  • 5. Agenda • Introduction • Crossdata architecture • Metadata management • Streaming sources • Full text search • Spark and Crossdata • ODBC • The future
  • 6. Introduction o Big Data analysis is commonly associated with batch processing • Users aiming to combine batch and stream processing have to rely on tailor-made architectures o Users buy Big Data platforms, but • How do I start? • What is my entry point to the platform?
  • 7. What do our clients demand? o Easy deployment o Easy administration o Read/write performance o Easy-to-learn query language o Integration with BI tools o Join operations o Support for streaming sources o Integration with other data stores o Ability to query data without thinking about the schema (non-indexed data)
  • 8. What do our clients demand? ✓ Easy deployment ✓ Easy administration ✓ Read/write performance ✓ Easy-to-learn query language o Integration with BI tools o Join operations o Support for streaming sources o Integration with other data stores o Ability to query data without thinking about the schema (non-indexed data)
  • 9. What do our clients demand? ✓ Easy deployment ✓ Easy administration ✓ Read/write performance ✓ Easy-to-learn query language ✓ Integration with BI tools ✓ Join operations ✓ Support for streaming sources ✓ Integration with other data stores ✓ Ability to query data without thinking about the schema (non-indexed data)
  • 10. Crossdata o A new technology that: • Is not limited by the underlying datastore capabilities • Leverages Spark to perform non-natively supported operations • Supports batch and streaming queries • Supports multiple clusters and technologies
  • 12. Connecting to the outside world o Crossdata defines an IConnector extension interface o Users can easily add new connectors to support • Different datastores • Different processing engines • Different versions o Each connector defines its capabilities. Our planner will choose the best connector for each query
  • 13. Query execution [diagram: Parsing, Validation, Planning, Execution; Connector1, Connector2, Connector3; C*] Our planner will choose the best connector for each query
  • 14. Multi-cluster support o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores. • Multiple clusters can coexist to optimize platform performance (e.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.) • A table is saved in a unique datastore
  • 15. Logical and physical mapping SELECT * FROM app.users; [diagram: App catalog; Users table, Test table, old_users table; C* production, C* development, Other datastores]
  • 17. Metadata in the era of schemaless NoSQL datastores o Some datastores are schemaless but our applications are not! • Flexible schemas vs schemaless • Crossdata provides a Metadata manager that stores schemas for any datasource ▪ Remember ODBC and those BI tools
  • 18. Metadata management [diagram: Connector, C* production, Metadata Store, Infinispan, Metadata Manager] Updated metadata information is maintained among Crossdata servers using Infinispan. If the connector does not support metadata operations, those are skipped
  • 20. Managing streaming sources o Nowadays, use cases expect some type of streaming datasource • Streaming data has an ephemeral nature • In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables [diagram: streaming source; {schema:{col1:…},…}; col1:text, col2:int, col3:int, col4:text; Streaming_query0 … Streaming_queryn]
  • 21. Streaming queries o Streaming queries are infinite by definition • A time window is defined to create a batch-like view of the rows ingested by the system in that period • The user launches queries specifying a processing time window ▪ Crossdata provides methods to list and stop running streaming queries
  • 22. Streaming queries: windows syntax SELECT fieldGroup,avg(Field2) FROM eph_table WITH WINDOW 5 minutes WHERE field1=100 AND field2>100 GROUP BY fieldGroup;
  • 23. Joining batch and streaming SELECT * FROM demo.temporal WITH WINDOW 10 secs INNER JOIN demo.users ON users.name = temporal.name; [diagram: SELECT * FROM demo.temporal WITH WINDOW 10 secs; SELECT * FROM demo.users; INNER JOIN ON users.name = temporal.name]
  • 25. Full text search with o Clients request the ability to perform full text searches o We have developed an integration between Lucene and Cassandra o C* users can now enjoy all Lucene features: • Full text searches, range queries, fuzzy queries… https://github.com/Stratio/stratio-cassandra
  • 26. Stratio Lucene 2i [diagram: five C* nodes, each paired with a Lucene index]
  • 27. Full text search queries o With Crossdata, we simplify: • The creation syntax • The query syntax using the match operator CREATE FULLTEXT INDEX ON app.users(name,email); SELECT * FROM app.users where email MATCH '*@stratio.com';
  • 29. Why Spark? o Stratio Crossdata uses Spark to perform non-natively supported operations o Spark brings several benefits over Hadoop o In-memory processing o RDD abstraction o Simpler API o Increased flexibility (e.g., no need for identity mapping)
  • 30. What about Spark SQL? o Different approach to query execution • We only use Spark when it speeds up queries ▪ Native drivers are faster for simple queries ▪ Spark SQL has limited RDD sources • Avoid some Spark limitations • Several batch and streaming contexts in a single JVM (SPARK-2243)
  • 31. Query approach [diagram: SparkSQL approach: SparkSQL, Spark, Cassandra; Crossdata approach: Stratio Crossdata, Spark, Native driver, Cassandra]
  • 32. Our Cassandra-Spark integration o Project started in June 2013 ▪ With the objective of providing a method to interact with Cassandra from Spark ▪ Initial approach based on the HadoopInputFormat interface ▪ Current version uses the native Datastax Java driver https://github.com/Stratio/stratio-deep
  • 33. Our Cassandra-Spark integration o Benchmark in process comparing our solution with the Datastax Spark driver • Results highly influenced by the split size • Initial results are promising for Stratio Spark integration using Datastax default values • Group by: up to 40% faster • Join: up to 17% faster • Stay tuned for the benchmark publication!
  • 34. Spark vs Lucene 2i [plot: axes Time and Records returned; series Spark and Lucene 2i]
  • 36. Stratio Crossdata ODBC o Well-known interface standard (for BI tools, external apps, …) o We have implemented it using Simba SDK o ODBC opens the full potential of Stratio Crossdata to the external world o Currently tested with Tableau, Qlikview and MS Excel. One ODBC for all datastores!
  • 38. The future o Security o Query optimizer and smart query planner o Leverage system statistics o Support for UDFs o Become an Apache project https://github.com/Stratio/stratio-meta
  • 39. We are looking for an Apache Champion. Can you help us?
  • 40. A wish list for Cassandra o Ability to stop running queries o Interactive users are unpredictable o Some exception paths are not clear or defined (e.g., secondary indexes) o Distribute some of the operations currently performed on the coordinator • E.g., aggregations like count(*)
  • 41. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com)
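
Slides 21 and 22 describe how a time window turns an infinite stream into a batch-like view that an ordinary SELECT can run over. The Python sketch below illustrates that idea only; it is not Crossdata code. It assumes tumbling (non-overlapping) windows, which the slides do not specify, and the `ts` timestamp field and the `windowed_avg` function are hypothetical names introduced for the example. The filter and grouping mirror the slide 22 query (field1 = 100 AND field2 > 100, GROUP BY fieldGroup, avg of field2).

```python
from collections import defaultdict
from statistics import mean


def windowed_avg(rows, window_seconds):
    """Batch-like evaluation of a windowed streaming query.

    Assumption: tumbling windows keyed by their start time in seconds.
    Each row is a dict with hypothetical keys ts, fieldGroup, field1, field2.
    """
    # window start -> fieldGroup -> list of field2 values seen in that window
    windows = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # WHERE field1 = 100 AND field2 > 100
        if row["field1"] == 100 and row["field2"] > 100:
            window_start = (row["ts"] // window_seconds) * window_seconds
            windows[window_start][row["fieldGroup"]].append(row["field2"])
    # GROUP BY fieldGroup with avg(field2), evaluated once per window
    return {
        start: {group: mean(values) for group, values in groups.items()}
        for start, groups in sorted(windows.items())
    }


rows = [
    {"ts": 10, "fieldGroup": "a", "field1": 100, "field2": 150},
    {"ts": 20, "fieldGroup": "a", "field1": 100, "field2": 250},
    {"ts": 30, "fieldGroup": "b", "field1": 100, "field2": 50},   # fails field2 > 100
    {"ts": 40, "fieldGroup": "b", "field1": 7, "field2": 500},    # fails field1 = 100
    {"ts": 310, "fieldGroup": "a", "field1": 100, "field2": 120},  # next 5-minute window
]

# WITH WINDOW 5 minutes
result = windowed_avg(rows, window_seconds=300)
```

Here `result` holds one entry per 5-minute window, each mapping a group to its average. A real engine would emit each window as it closes instead of collecting the whole stream first.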