This document discusses using MapReduce with Cassandra. It describes how writing to Cassandra from MapReduce has always been possible, while reading was enabled starting with Cassandra 0.6.x. Using MapReduce with Cassandra provides analytics capabilities and avoids single points of failure compared to MapReduce with HBase. The document covers setup and configuration considerations like locality, and provides examples of a separate cluster approach and hybrid cluster approach. It also outlines future work like improving output to Cassandra and adding Hive support.
6. MR + Cassandra - History
Writing to Cassandra - always been possible
Cassandra 0.6.x enables reading data
Uses its own InputSplit, InputFormat, RecordReader
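As a concrete illustration, a job-configuration fragment in the style of the 0.6-era contrib/word_count example might look like the sketch below. Class and helper names (ColumnFamilyInputFormat, ConfigHelper) follow that example, but exact signatures varied between releases, so treat this as an assumption-laden sketch rather than a definitive recipe; "Keyspace1"/"Standard1" are placeholder names.

```java
// Sketch: configuring a Hadoop job to read from Cassandra (0.6-era API).
// ColumnFamilyInputFormat supplies the InputSplit/InputFormat/RecordReader
// trio mentioned above; ConfigHelper stashes Cassandra settings in the
// Hadoop job configuration.
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraReadJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "cassandra-read");
        job.setJarByClass(CassandraReadJob.class);

        // Which keyspace/column family the splits should scan
        // (placeholder names, not from the slides).
        ConfigHelper.setColumnFamily(job.getConfiguration(),
                                     "Keyspace1", "Standard1");

        // A SlicePredicate limits which columns each row returns.
        SlicePredicate predicate = new SlicePredicate();
        ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

        // Plug in Cassandra's InputFormat instead of a file-based one.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        // ... set mapper/reducer/output as usual, then
        // job.waitForCompletion(true);
    }
}
```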
7. Why MR + Cassandra?
Cassandra is a great data store, but what about analytics? MapReduce!
Arguably a win over MapReduce + HBase: no single point of failure (SPOF)
15. Setup and Configuration
Job/Task Trackers
On already established cluster
Overlays Cassandra cluster
Hybrid
Locality
Gives data’s host information to job tracker
Configure both topologies - Cassandra + Hadoop
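The locality point above means each split advertises the Cassandra replicas that own its token range, so the job tracker can prefer a task tracker running alongside one of those replicas. A minimal self-contained sketch of that scheduling preference (the class and pickTracker helper are illustrative inventions, not Cassandra or Hadoop classes):

```java
import java.util.Arrays;
import java.util.List;

public class LocalityDemo {
    // Pick a task tracker for a split: prefer a tracker co-located with one
    // of the split's replica hosts (a data-local task), otherwise fall back
    // to any live tracker (the map task reads remotely).
    static String pickTracker(List<String> replicaHosts, List<String> trackers) {
        for (String host : replicaHosts) {
            if (trackers.contains(host)) {
                return host;          // data-local: read from the local replica
            }
        }
        return trackers.get(0);       // no co-located tracker: remote read
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("10.0.0.2", "10.0.0.3");
        List<String> trackers = Arrays.asList("10.0.0.1", "10.0.0.3");
        System.out.println(pickTracker(replicas, trackers)); // prints 10.0.0.3
    }
}
```

In the hybrid setup this is why task trackers overlay the Cassandra nodes: the data-local branch fires far more often.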
27. Future Work
Simple output to Cassandra - Cassandra-1101
OutputFormat, OutputReducer, OutputWriter
Hive support - Cassandra-913
Optimizations for start/end row - Cassandra-1125
Other refinements based on feedback