7. Some Background
• Hadoop support since early 2010
• MapReduce/Pig works with any Hadoop 1.x distribution.
• Hive is a neatly integrated piece of DSE
• Data locality just like with HDFS
• Cassandra can handle ~200 column families (CFs)
Thursday, June 6, 13
13. Setup
• Analytics-specific datacenter
• Configure replication per keyspace and datacenter (KS/DC-specific)
• Isolated reads at CL.LOCAL_QUORUM
• Writes will be replicated
• Same best practices as with Hadoop alone
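
The replication and consistency bullets above can be sketched as follows. This is a minimal illustration, not the deck's own configuration: the keyspace name `app_data`, the datacenter names `Cassandra` and `Analytics`, and the replica counts are all assumptions chosen for the example. The sketch only builds the CQL text and computes how many replicas in the local datacenter must answer a LOCAL_QUORUM read; it does not talk to a cluster.

```python
def keyspace_ddl(name, replicas_per_dc):
    """Build a CQL CREATE KEYSPACE statement with per-datacenter replication."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        parts.append("'%s': %d" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(parts))


def local_quorum(local_rf):
    """Replicas in the local datacenter that must respond for LOCAL_QUORUM."""
    return local_rf // 2 + 1


# Hypothetical datacenter names and replica counts (assumptions for this sketch).
REPLICAS = {"Cassandra": 3, "Analytics": 2}

ddl = keyspace_ddl("app_data", REPLICAS)
print(ddl)
print("LOCAL_QUORUM in Analytics needs", local_quorum(REPLICAS["Analytics"]), "replicas")
```

Because the quorum is counted only among replicas in the local datacenter, analytics reads never wait on nodes in the other datacenter, while writes still reach both because the keyspace is replicated to each.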
16. Vanilla Hadoop
• Co-locate task trackers and data nodes with Cassandra nodes (data locality)
• Isolate workloads by configuring a separate Cassandra datacenter
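
Why co-location matters can be shown with a toy model. The sketch below is not Hadoop's scheduler; it only mimics the idea that an input split carries a list of hosts holding its data and that a task is preferably run on one of them. The host addresses and split names are hypothetical.

```python
def pick_tracker(replica_hosts, tracker_hosts):
    """Return (host, is_local), preferring a task tracker that runs on a replica."""
    for host in replica_hosts:
        if host in tracker_hosts:
            return host, True
    # No co-located tracker: use any tracker, so the data crosses the network.
    return sorted(tracker_hosts)[0], False


# Hypothetical splits: split name -> Cassandra nodes holding that token range.
SPLITS = {
    "range-1": ["10.0.0.1", "10.0.0.2"],
    "range-2": ["10.0.0.2", "10.0.0.3"],
}

co_located = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}  # trackers on Cassandra nodes
separate = {"10.0.1.1", "10.0.1.2"}                # trackers on other machines

for name, replicas in sorted(SPLITS.items()):
    print(name, pick_tracker(replicas, co_located), pick_tracker(replicas, separate))
```

With trackers on the Cassandra nodes every split finds a local host; with trackers on separate machines every split falls back to a remote read.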
21. Planning
• MapReduce over full column family
• Model data accordingly
• Add more column families
• Secondary indexes can be used, but with caution
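
The planning bullets above can be illustrated with a small in-memory model. Column families are represented here as plain dictionaries, and the names `events_all` and `clicks_only` are hypothetical. The point of the sketch is that a job over a column family touches every row in it, so a purpose-built column family gives the same answer while scanning less.

```python
from collections import Counter

# Hypothetical column families modeled as dicts of row key -> columns.
events_all = {
    "e1": {"type": "click", "day": "2013-06-05"},
    "e2": {"type": "view", "day": "2013-06-05"},
    "e3": {"type": "click", "day": "2013-06-06"},
    "e4": {"type": "view", "day": "2013-06-06"},
}

# A second column family holding only the rows this job needs. In practice it
# would be written at ingest time; here it is derived to keep the sketch short.
clicks_only = {k: v for k, v in events_all.items() if v["type"] == "click"}


def clicks_per_day(column_family):
    """Scan every row of the column family, as a full-CF job would."""
    scanned = 0
    counts = Counter()
    for row in column_family.values():
        scanned += 1
        if row["type"] == "click":
            counts[row["day"]] += 1
    return dict(counts), scanned


print(clicks_per_day(events_all))
print(clicks_per_day(clicks_only))
```

Both calls return the same per-day counts, but the second scans half as many rows, which is the trade made by adding more column families.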
25. Execution
• Project and select early in your workflow
• Store common intermediate datasets (in CFS/HDFS)
• Bulk loader output format excels
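
"Project and select early" can be sketched in a few lines. The records, field names, and the `country == "US"` filter are assumptions for illustration only. The idea is to drop unneeded rows (select) and unneeded columns (project) before the expensive aggregation step, so less data moves through the rest of the workflow.

```python
from collections import defaultdict

# Hypothetical input records with a bulky field the job never uses.
RECORDS = [
    {"user": "a", "country": "US", "amount": 5, "payload": "x" * 100},
    {"user": "b", "country": "DE", "amount": 7, "payload": "y" * 100},
    {"user": "a", "country": "US", "amount": 3, "payload": "z" * 100},
]


def select(records, predicate):
    """Keep only the rows the job needs."""
    return (r for r in records if predicate(r))


def project(records, fields):
    """Keep only the columns the job needs."""
    return ({f: r[f] for f in fields} for r in records)


def total_by_user(records):
    """Aggregation step that stands in for the expensive part of the job."""
    totals = defaultdict(int)
    for r in records:
        totals[r["user"]] += r["amount"]
    return dict(totals)


# Select and project first, then aggregate the narrowed records.
narrow = list(project(select(RECORDS, lambda r: r["country"] == "US"), ["user", "amount"]))
totals = total_by_user(narrow)
print(len(narrow), "of", len(RECORDS), "records reach the aggregation:", totals)
```

The narrowed list is also the kind of intermediate dataset worth storing once if several later jobs start from the same filter.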