SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
Handling realtime and analytic
workloads in a single cluster
with Hadoop and Cassandra
Handling realtime and analytic
workloads in a single cluster
with Hadoop and Cassandra
Piotr Kołaczkowski
pkolaczk@datastax.com
@pkolaczk
Piotr Kołaczkowski
pkolaczk@datastax.com
@pkolaczk
Basic Cassandra + Hadoop Integration
C*
C*
C*
C*
C*
C*
C*
C*
Cassandra
Cluster
Hadoop Cluster
NameNode & JobTracker
DataNode DataNode
DataNode DataNode
DataNode DataNode
CFIF
CFOF
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Key: ByteBuffer
Value: SortedMap<ByteBuffer, IColumn>
(column name, value, timestamp)
row key
column name
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Input Key:
jim
age: 36 car: camaro gender: M
Input Value:
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Input Key:
carol
age: 37 car: subaru
Input Value:
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Input Key:
johnny
age: 12 gender: M
Input Value:
ColumnFamilyInputFormat
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
Input Key:
suzy
age: 10 gender: F
Input Value:
CFIF – Wide Row Support
Input Key:
jim
age: 36
Input Value:
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
CFIF – Wide Row Support
Input Key:
jim
car: camaro
Input Value:
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
CFIF – Wide Row Support
Input Key:
jim
gender: M
Input Value:
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
CFIF – Wide Row Support
Input Key:
carol
age: 37
Input Value:
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
CFIF – Wide Row Support
Input Key:
carol
car: subaru
Input Value:
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
CFIF – Cassandra Secondary Index Support
IndexExpression expr =
new IndexExpression(
ByteBufferUtil.bytes("car"),
IndexOperator.EQ,
ByteBufferUitl.bytes("subaru")
);
ConfigHelper.setInputRange(
job.getConfiguration(),
Arrays.asList(expr)
);
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru
johnny age: 12 gender: M
suzy age: 10 gender: F
ColumnFamilyOutputFormat
● Key: ByteBuffer (row key)
● Value: List<Mutation>
– Mutation: insert or delete a column
C*
C*
C*
C*
C*
C*
C*
C*
Cassandra
Cluster
ColumnFamilyRecordWriter
write
queue
client
thrift
CFOF – Creating Mutations
ByteBuffer rowkey = ByteBufferUtil.bytes(“carol”);
Column column = new Column();
column.name = ByteBufferUtil.bytes(“age”);
column.value = ByteBufferUtil.bytes(37);
List<Mutation> mutations;
Mutation mutation = new Mutation();
mutation.column_or_supercolumn = new ColumnOrSuperColumn();
mutation.column_or_supercolumn.column = column;
mutations.add(mutation);
context.write(rowkey, mutationList);
BulkOutputFormat
Hadoop Temporary Dir
SSTable 1 SSTable 2 SSTable N...
flush
write
BulkRecordWriter
Memory Buffer
DataStax Enterprise:
Cassandra and Hadoop in a Single Cluster
Basic Features
● Single, simplified component
● Workload separation
● No SPOF
● Peer to peer
● JobTracker failover
● No additional Cassandra config
System Administrator's View
Address DC Rack Workload Status State Load Owns Token
148873535527910577765226390751398592512
101.202.204.101 Analytics rack1 Analytics(JT) Up Normal 78,96 GB 12,50% 0
101.202.204.102 Analytics rack1 Analytics(TT) Up Normal 82,65 GB 12,50% 21267647932558653966460912964485513216
101.202.204.103 Analytics rack1 Analytics(TT) Up Normal 74,96 GB 12,50% 42535295865117307932921825928971026432
101.202.204.104 Analytics rack1 Analytics(TT) Up Normal 78,79 GB 12,50% 63802943797675961899382738893456539648
101.202.204.105 Cassandra rack1 Cassandra Up Normal 67,42 GB 12,50% 85070591730234615865843651857942052864
101.202.204.106 Cassandra rack1 Cassandra Up Normal 60,86 GB 12,50% 106338239662793269832304564822427566080
101.202.204.107 Cassandra rack1 Cassandra Up Normal 81,27 GB 12,50% 127605887595351923798765477786913079296
101.202.204.108 Cassandra rack1 Cassandra Up Normal 77,17 GB 12,50% 148873535527910577765226390751398592512
Easy monitoring of
your nodes,
regardless of their
workload type
Wait, but where are my files?
Hadoop M/R
HDFS
Hadoop M/R
CFS
Cassandra Server
Cassandra File System Properties
● Decentralized
● Replicated
● HDFS compatible
– compatible with Hadoop filesystem utilities
– allows for running M/R programs on DSE without
any change
● Compressed
CFS Architecture
CFS Compaction
● Keeps track of deleted rows (blocks)
● When all blocks in SSTable removed,
deletes the whole SSTable
Cassandra Storage
block 1
block 2
block 3
block 4
block 5
block 6
ts 1
ts 2
block 6 block 6block 7
block 8
ts 3
ts 4
block 6block 9
block 10
X
Hive Integration
● CassandraHiveMetaStore
– stores Hive database metadata in Cassandra
– no need to run a separate RDBMS
● CassandraStorageHandler
– allows for direct access to C* tables with CFIF and
CFOF
Hive Integration – Example
CREATE EXTERNAL TABLE MyHiveTable(row_key string, col1 string, col2 string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
TBLPROPERTIES ("cassandra.ks.name" = "MyCassandraKS");
SELECT count(*) FROM MyHiveTable;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201306041030_0001, Tracking URL = http://192.168.123.10:50030/jobdetails.jsp?jobid=job_201306041030_0001
Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=192.168.123.10:8012 -kill job_201306041030_0001
Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1
2013-06-04 15:11:54,573 Stage-1 map = 0%, reduce = 0%
2013-06-04 15:11:58,622 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
2013-06-04 15:11:59,691 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
...
2013-06-04 15:12:28,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:29,304 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:30,330 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
2013-06-04 15:12:31,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
MapReduce Total cumulative CPU time: 31 seconds 910 msec
Ended Job = job_201306041030_0001
MapReduce Jobs Launched:
Job 0: Map: 9 Reduce: 1 Cumulative CPU: 31.91 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 31 seconds 910 msec
OK
1000000
Time taken: 46.246 seconds
Custom Column Mapping
CREATE EXTERNAL TABLE Users(
userid string, name string, email string, phone string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH
SERDEPROPERTIES (
"cassandra.columns.mapping" = ":key,user_name,primary_email,home_phone");
Cassandra: row key user_name primary_email home_phone
Hive: userid name email phone

Más contenido relacionado

La actualidad más candente

SevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittrSevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittrRomain Francois
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humansCraig Kerstiens
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOAltinity Ltd
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQLBrendan Tierney
 
Scaling the #2ndhalf
Scaling the #2ndhalfScaling the #2ndhalf
Scaling the #2ndhalfSalo Shp
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterJeffrey Breen
 

La actualidad más candente (6)

SevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittrSevillaR meetup: dplyr and magrittr
SevillaR meetup: dplyr and magrittr
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humans
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
Embedded R Execution using SQL
Embedded R Execution using SQLEmbedded R Execution using SQL
Embedded R Execution using SQL
 
Scaling the #2ndhalf
Scaling the #2ndhalfScaling the #2ndhalf
Scaling the #2ndhalf
 
R + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop clusterR + 15 minutes = Hadoop cluster
R + 15 minutes = Hadoop cluster
 

Destacado

Hadoop+Cassandra_Integration
Hadoop+Cassandra_IntegrationHadoop+Cassandra_Integration
Hadoop+Cassandra_IntegrationJoyabrata Das
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in CassandraJairam Chandar
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotModern Data Stack France
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Modern Data Stack France
 
Cassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy HannaCassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy HannaModern Data Stack France
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsAhmad Jawwad
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkAndy Petrella
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopHortonworks
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connectorDuyhai Doan
 
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés PeñaStratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés PeñaBig Data Spain
 

Destacado (20)

Hadoop+Cassandra_Integration
Hadoop+Cassandra_IntegrationHadoop+Cassandra_Integration
Hadoop+Cassandra_Integration
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
IBM Stream au Hadoop User Group
IBM Stream au Hadoop User GroupIBM Stream au Hadoop User Group
IBM Stream au Hadoop User Group
 
Hadoop on Azure
Hadoop on AzureHadoop on Azure
Hadoop on Azure
 
Analyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien CabotAnalyse prédictive en assurance santé par Julien Cabot
Analyse prédictive en assurance santé par Julien Cabot
 
Cascalog présenté par Bertrand Dechoux
Cascalog présenté par Bertrand DechouxCascalog présenté par Bertrand Dechoux
Cascalog présenté par Bertrand Dechoux
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)Talend Open Studio for Big Data (powered by Apache Hadoop)
Talend Open Studio for Big Data (powered by Apache Hadoop)
 
Cassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy HannaCassandra Hadoop Best Practices by Jeremy Hanna
Cassandra Hadoop Best Practices by Jeremy Hanna
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
 
What is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache SparkWhat is Distributed Computing, Why we use Apache Spark
What is Distributed Computing, Why we use Apache Spark
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
 
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés PeñaStratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
Stratio's Cassandra Lucene index: Geospatial use cases by Andrés Peña
 

Similar a Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski

Cassandra, web scale no sql data platform
Cassandra, web scale no sql data platformCassandra, web scale no sql data platform
Cassandra, web scale no sql data platformMarko Švaljek
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark Summit
 
Up and running with python
Up and running with pythonUp and running with python
Up and running with pythonBarry DeCicco
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介Masayuki Matsushita
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...it-people
 
MongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud ManagerMongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud ManagerJay Gordon
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiDatabricks
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowRafał Hryniewski
 
Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)GraceFalabi
 
Streaming Data from Scylla to Kafka
Streaming Data from Scylla to KafkaStreaming Data from Scylla to Kafka
Streaming Data from Scylla to KafkaScyllaDB
 
Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019Ibrar Ahmed
 
Redo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12cRedo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12cDebasish Nayak
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for EveryoneGiovanna Roda
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glassaaronmorton
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassDataStax
 
Megadata With Python and Hadoop
Megadata With Python and HadoopMegadata With Python and Hadoop
Megadata With Python and Hadoopryancox
 

Similar a Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski (20)

Cassandra, web scale no sql data platform
Cassandra, web scale no sql data platformCassandra, web scale no sql data platform
Cassandra, web scale no sql data platform
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
 
Cram
CramCram
Cram
 
Up and running with python
Up and running with pythonUp and running with python
Up and running with python
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
 
MongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud ManagerMongoDB and DigitalOcean Automation with Cloud Manager
MongoDB and DigitalOcean Automation with Cloud Manager
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
 
Quick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to knowQuick trip around the Cosmos - Things every astronaut supposed to know
Quick trip around the Cosmos - Things every astronaut supposed to know
 
Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)Loan-defaulters-predictions(Python codes)
Loan-defaulters-predictions(Python codes)
 
Streaming Data from Scylla to Kafka
Streaming Data from Scylla to KafkaStreaming Data from Scylla to Kafka
Streaming Data from Scylla to Kafka
 
Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019Postgres Conference (PgCon) New York 2019
Postgres Conference (PgCon) New York 2019
 
Redo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12cRedo logfile addition in oracle rac 12c
Redo logfile addition in oracle rac 12c
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
 
Big Data Analytics Lab File
Big Data Analytics Lab FileBig Data Analytics Lab File
Big Data Analytics Lab File
 
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break GlassCassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass
 
Cassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break GlassCassandra Community Webinar | In Case of Emergency Break Glass
Cassandra Community Webinar | In Case of Emergency Break Glass
 
Megadata With Python and Hadoop
Megadata With Python and HadoopMegadata With Python and Hadoop
Megadata With Python and Hadoop
 
Explain this!
Explain this!Explain this!
Explain this!
 

Más de Modern Data Stack France

Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupModern Data Stack France
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Modern Data Stack France
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Modern Data Stack France
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...Modern Data Stack France
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with sparkModern Data Stack France
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlusModern Data Stack France
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)Modern Data Stack France
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Modern Data Stack France
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Modern Data Stack France
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015Modern Data Stack France
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandationModern Data Stack France
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Modern Data Stack France
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielModern Data Stack France
 

Más de Modern Data Stack France (20)

Stash - Data FinOPS
Stash - Data FinOPSStash - Data FinOPS
Stash - Data FinOPS
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Talend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark MeetupTalend spark meetup 03042017 - Paris Spark Meetup
Talend spark meetup 03042017 - Paris Spark Meetup
 
Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017Paris Spark Meetup - Trifacta - 03_04_2017
Paris Spark Meetup - Trifacta - 03_04_2017
 
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
Hadoop meetup : HUGFR Construire le cluster le plus rapide pour l'analyse des...
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Hug janvier 2016 -EDF
Hug   janvier 2016 -EDFHug   janvier 2016 -EDF
Hug janvier 2016 -EDF
 
HUG France - 20160114 industrialisation_process_big_data CanalPlus
HUG France -  20160114 industrialisation_process_big_data CanalPlusHUG France -  20160114 industrialisation_process_big_data CanalPlus
HUG France - 20160114 industrialisation_process_big_data CanalPlus
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
HUG France : HBase in Financial Industry par Pierre Bittner (Scaled Risk CTO)
 
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
Apache Flink par Bilal Baltagi Paris Spark Meetup Dec 2015
 
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
Datalab 101 (Hadoop, Spark, ElasticSearch) par Jonathan Winandy - Paris Spark...
 
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015Record linkage, a real use case with spark ml  - Paris Spark meetup Dec 2015
Record linkage, a real use case with spark ml - Paris Spark meetup Dec 2015
 
Spark dataframe
Spark dataframeSpark dataframe
Spark dataframe
 
June Spark meetup : search as recommandation
June Spark meetup : search as recommandationJune Spark meetup : search as recommandation
June Spark meetup : search as recommandation
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
 
Spark meetup at viadeo
Spark meetup at viadeoSpark meetup at viadeo
Spark meetup at viadeo
 
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamielParis Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
Paris Spark meetup : Extension de Spark (Tachyon / Spark JobServer) par jlamiel
 

Último

Technical Data | ThermTec Wild 335 | Optics Trade
Technical Data | ThermTec Wild 335 | Optics TradeTechnical Data | ThermTec Wild 335 | Optics Trade
Technical Data | ThermTec Wild 335 | Optics TradeOptics-Trade
 
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited Money
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited MoneyReal Moto 2 MOD APK v1.1.721 All Bikes, Unlimited Money
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited MoneyApk Toly
 
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics Trade
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics TradeInstruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics Trade
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics TradeOptics-Trade
 
IPL Quiz ( weekly quiz) by SJU quizzers.
IPL Quiz ( weekly quiz) by SJU quizzers.IPL Quiz ( weekly quiz) by SJU quizzers.
IPL Quiz ( weekly quiz) by SJU quizzers.SJU Quizzers
 
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdf
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdfTurkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdf
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdfEticketing.co
 
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxItaly Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxWorld Wide Tickets And Hospitality
 
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docx
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docxFrance's UEFA Euro 2024 Ambitions Amid Coman's Injury.docx
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docxEuro Cup 2024 Tickets
 
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best ServicesMysore Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best Servicesnajka9823
 
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docx
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docxAustria VS France Injury Woes a Look at Euro 2024 Qualifiers.docx
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docxWorld Wide Tickets And Hospitality
 
Technical Data | ThermTec Wild 650L | Optics Trade
Technical Data | ThermTec Wild 650L | Optics TradeTechnical Data | ThermTec Wild 650L | Optics Trade
Technical Data | ThermTec Wild 650L | Optics TradeOptics-Trade
 
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...Austria vs France David Alaba Switches Position to Defender in Austria's Euro...
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...Eticketing.co
 
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics TradeInstruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics TradeOptics-Trade
 
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/78377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7dollysharma2066
 
Expert Pool Table Refelting in Lee & Collier County, FL
Expert Pool Table Refelting in Lee & Collier County, FLExpert Pool Table Refelting in Lee & Collier County, FL
Expert Pool Table Refelting in Lee & Collier County, FLAll American Billiards
 
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfJORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfArturo Pacheco Alvarez
 

Último (16)

Technical Data | ThermTec Wild 335 | Optics Trade
Technical Data | ThermTec Wild 335 | Optics TradeTechnical Data | ThermTec Wild 335 | Optics Trade
Technical Data | ThermTec Wild 335 | Optics Trade
 
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited Money
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited MoneyReal Moto 2 MOD APK v1.1.721 All Bikes, Unlimited Money
Real Moto 2 MOD APK v1.1.721 All Bikes, Unlimited Money
 
Denmark Vs Serbia Haaland Euro Cup CPR Drive Incident.docx
Denmark Vs Serbia Haaland Euro Cup CPR Drive Incident.docxDenmark Vs Serbia Haaland Euro Cup CPR Drive Incident.docx
Denmark Vs Serbia Haaland Euro Cup CPR Drive Incident.docx
 
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics Trade
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics TradeInstruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics Trade
Instruction Manual | ThermTec Hunt Thermal Clip-On Series | Optics Trade
 
IPL Quiz ( weekly quiz) by SJU quizzers.
IPL Quiz ( weekly quiz) by SJU quizzers.IPL Quiz ( weekly quiz) by SJU quizzers.
IPL Quiz ( weekly quiz) by SJU quizzers.
 
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdf
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdfTurkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdf
Turkiye Vs Georgia Turkey's UEFA Euro 2024 Journey with High Hopes.pdf
 
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docxItaly Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
Italy Vs Albania Euro Cup 2024 Italy's Strategy for Success.docx
 
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docx
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docxFrance's UEFA Euro 2024 Ambitions Amid Coman's Injury.docx
France's UEFA Euro 2024 Ambitions Amid Coman's Injury.docx
 
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best ServicesMysore Call Girls 7001305949 WhatsApp Number 24x7 Best Services
Mysore Call Girls 7001305949 WhatsApp Number 24x7 Best Services
 
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docx
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docxAustria VS France Injury Woes a Look at Euro 2024 Qualifiers.docx
Austria VS France Injury Woes a Look at Euro 2024 Qualifiers.docx
 
Technical Data | ThermTec Wild 650L | Optics Trade
Technical Data | ThermTec Wild 650L | Optics TradeTechnical Data | ThermTec Wild 650L | Optics Trade
Technical Data | ThermTec Wild 650L | Optics Trade
 
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...Austria vs France David Alaba Switches Position to Defender in Austria's Euro...
Austria vs France David Alaba Switches Position to Defender in Austria's Euro...
 
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics TradeInstruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
Instruction Manual | ThermTec Wild Thermal Monoculars | Optics Trade
 
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/78377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7
8377087607 ☎, Cash On Delivery Call Girls Service In Hauz Khas Delhi Enjoy 24/7
 
Expert Pool Table Refelting in Lee & Collier County, FL
Expert Pool Table Refelting in Lee & Collier County, FLExpert Pool Table Refelting in Lee & Collier County, FL
Expert Pool Table Refelting in Lee & Collier County, FL
 
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdfJORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
JORNADA 3 LIGA MURO 2024GHGHGHGHGHGH.pdf
 

Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski

  • 1. Handling realtime and analytic workloads in a single cluster with Hadoop and Cassandra Handling realtime and analytic workloads in a single cluster with Hadoop and Cassandra Piotr Kołaczkowski pkolaczk@datastax.com @pkolaczk Piotr Kołaczkowski pkolaczk@datastax.com @pkolaczk
  • 2. Basic Cassandra + Hadoop Integration C* C* C* C* C* C* C* C* Cassandra Cluster Hadoop Cluster NameNode & JobTracker DataNode DataNode DataNode DataNode DataNode DataNode CFIF CFOF
  • 3. ColumnFamilyInputFormat jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F Key: ByteBuffer Value: SortedMap<ByteBuffer, IColumn> (column name, value, timestamp) row key column name
  • 4. ColumnFamilyInputFormat jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F Input Key: jim age: 36 car: camaro gender: M Input Value:
  • 5. ColumnFamilyInputFormat jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F Input Key: carol age: 37 car: subaru Input Value:
  • 6. ColumnFamilyInputFormat jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F Input Key: johnny age: 12 gender: M Input Value:
  • 7. ColumnFamilyInputFormat jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F Input Key: suzy age: 10 gender: F Input Value:
  • 8. CFIF – Wide Row Support Input Key: jim age: 36 Input Value: jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 9. CFIF – Wide Row Support Input Key: jim car: camaro Input Value: jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 10. CFIF – Wide Row Support Input Key: jim gender: M Input Value: jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 11. CFIF – Wide Row Support Input Key: carol age: 37 Input Value: jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 12. CFIF – Wide Row Support Input Key: carol car: subaru Input Value: jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 13. CFIF – Cassandra Secondary Index Support IndexExpression expr = new IndexExpression( ByteBufferUtil.bytes("car"), IndexOperator.EQ, ByteBufferUitl.bytes("subaru") ); ConfigHelper.setInputRange( job.getConfiguration(), Arrays.asList(expr) ); jim age: 36 car: camaro gender: M carol age: 37 car: subaru johnny age: 12 gender: M suzy age: 10 gender: F
  • 14. ColumnFamilyOutputFormat ● Key: ByteBuffer (row key) ● Value: List<Mutation> – Mutation: insert or delete a column C* C* C* C* C* C* C* C* Cassandra Cluster ColumnFamilyRecordWriter write queue client thrift
  • 15. CFOF – Creating Mutations ByteBuffer rowkey = ByteBufferUtil.bytes(“carol”); Column column = new Column(); column.name = ByteBufferUtil.bytes(“age”); column.value = ByteBufferUtil.bytes(37); List<Mutation> mutations; Mutation mutation = new Mutation(); mutation.column_or_supercolumn = new ColumnOrSuperColumn(); mutation.column_or_supercolumn.column = column; mutations.add(mutation); context.write(rowkey, mutationList);
  • 16. BulkOutputFormat Hadoop Temporary Dir SSTable 1 SSTable 2 SSTable N... flush write BulkRecordWriter Memory Buffer
  • 17. DataStax Enterprise: Cassandra and Hadoop in a Single Cluster
  • 18. Basic Features ● Single, simplified component ● Workload separation ● No SPOF ● Peer to peer ● JobTracker failover ● No additional Cassandra config
  • 19. System Administrator's View Address DC Rack Workload Status State Load Owns Token 148873535527910577765226390751398592512 101.202.204.101 Analytics rack1 Analytics(JT) Up Normal 78,96 GB 12,50% 0 101.202.204.102 Analytics rack1 Analytics(TT) Up Normal 82,65 GB 12,50% 21267647932558653966460912964485513216 101.202.204.103 Analytics rack1 Analytics(TT) Up Normal 74,96 GB 12,50% 42535295865117307932921825928971026432 101.202.204.104 Analytics rack1 Analytics(TT) Up Normal 78,79 GB 12,50% 63802943797675961899382738893456539648 101.202.204.105 Cassandra rack1 Cassandra Up Normal 67,42 GB 12,50% 85070591730234615865843651857942052864 101.202.204.106 Cassandra rack1 Cassandra Up Normal 60,86 GB 12,50% 106338239662793269832304564822427566080 101.202.204.107 Cassandra rack1 Cassandra Up Normal 81,27 GB 12,50% 127605887595351923798765477786913079296 101.202.204.108 Cassandra rack1 Cassandra Up Normal 77,17 GB 12,50% 148873535527910577765226390751398592512 Easy monitoring of your nodes, regardless of their workload type
  • 20. Wait, but where are my files? Hadoop M/R HDFS Hadoop M/R CFS Cassandra Server
  • 21. Cassandra File System Properties ● Decentralized ● Replicated ● HDFS compatible – compatible with Hadoop filesystem utilities – allows for running M/R programs on DSE without any change ● Compressed
  • 23. CFS Compaction ● Keeps track of deleted rows (blocks) ● When all blocks in SSTable removed, deletes the whole SSTable Cassandra Storage block 1 block 2 block 3 block 4 block 5 block 6 ts 1 ts 2 block 6 block 6block 7 block 8 ts 3 ts 4 block 6block 9 block 10 X
  • 24. Hive Integration ● CassandraHiveMetaStore – stores Hive database metadata in Cassandra – no need to run a separate RDBMS ● CassandraStorageHandler – allows for direct access to C* tables with CFIF and CFOF
  • 25. Hive Integration – Example CREATE EXTERNAL TABLE MyHiveTable(row_key string, col1 string, col2 string) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' TBLPROPERTIES ("cassandra.ks.name" = "MyCassandraKS"); SELECT count(*) FROM MyHiveTable; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201306041030_0001, Tracking URL = http://192.168.123.10:50030/jobdetails.jsp?jobid=job_201306041030_0001 Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=192.168.123.10:8012 -kill job_201306041030_0001 Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1 2013-06-04 15:11:54,573 Stage-1 map = 0%, reduce = 0% 2013-06-04 15:11:58,622 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec 2013-06-04 15:11:59,691 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec ... 2013-06-04 15:12:28,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec 2013-06-04 15:12:29,304 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec 2013-06-04 15:12:30,330 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec 2013-06-04 15:12:31,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec MapReduce Total cumulative CPU time: 31 seconds 910 msec Ended Job = job_201306041030_0001 MapReduce Jobs Launched: Job 0: Map: 9 Reduce: 1 Cumulative CPU: 31.91 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 31 seconds 910 msec OK 1000000 Time taken: 46.246 seconds
  • 26. Custom Column Mapping CREATE EXTERNAL TABLE Users( userid string, name string, email string, phone string) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,user_name,primary_email,home_phone"); Cassandra: row key user_name primary_email home_phone Hive: userid name email phone