Whether you are running load tests or migrating historical data, loading data directly into Cassandra can be very useful, because it bypasses the system’s normal write path.
In this webinar, we will look at how data is stored on disk in sstables, how to generate these structures directly, and how to load this data rapidly into your cluster using sstableloader. We'll also review different use cases for when you should and shouldn't use this method.
2. About Us
• Work with clients to deliver and improve Apache Cassandra services
• Apache Cassandra committer, DataStax MVP, Hector maintainer, Apache Usergrid committer
• Based in New Zealand & USA
5. Why is bulk loading useful?
• Performance tests
• Migrating historical data
• Changing topologies
6.
• How Data is Stored
• Case Studies
  - Generating Dummy Data
  - Backfilling Historical Data
  - Changing Topologies
• Conclusion
19. create keyspace test
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};

create column family test
    with comparator = 'AsciiType'
    and default_validation_class = 'AsciiType'
    and key_validation_class = 'AsciiType';
Set up keyspace and column family
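The column family above declares AsciiType for column names, values, and row keys, so dummy data generated for a load test should stay within 7-bit ASCII. The following is a minimal stdlib sketch of such a generator; the class name, the `key<N>`/`col<N>` naming scheme, and the 16-character values are hypothetical choices, not something the slides specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical dummy-data generator for the AsciiType column family "test".
// Row keys "key<N>" and column names "col<N>" are illustrative assumptions.
public class DummyAsciiRows {
    // Printable 7-bit ASCII characters only, so every value fits AsciiType.
    private static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // True if every character of s can be encoded as US-ASCII.
    public static boolean isAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // Builds a random value of the given length from ALPHABET.
    public static String randomValue(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Returns rowKey -> (columnName -> value); a fixed seed makes runs reproducible.
    public static Map<String, Map<String, String>> generate(long seed, int rows, int columnsPerRow) {
        Random rnd = new Random(seed);
        Map<String, Map<String, String>> data = new LinkedHashMap<>();
        for (int r = 0; r < rows; r++) {
            Map<String, String> columns = new LinkedHashMap<>();
            for (int c = 0; c < columnsPerRow; c++) {
                columns.put("col" + c, randomValue(rnd, 16));
            }
            data.put("key" + r, columns);
        }
        return data;
    }

    public static void main(String[] args) {
        generate(42L, 3, 2).forEach((key, cols) -> System.out.println(key + " -> " + cols));
    }
}
```

In a real generator, each (row key, column name, value) triple would be handed to the sstable writer's `newRow` and `addColumn` methods rather than collected in a map.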
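20. import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.io.sstable.AbstractSSTableSimpleWriter;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

AbstractSSTableSimpleWriter writer = new SSTableSimpleUnsortedWriter(
    directory,            // java.io.File: output directory for the generated sstables
    partitioner,          // must match the partitioner configured on the target cluster
    keyspace,             // keyspace name, e.g. "test"
    columnFamily,         // column family name, e.g. "test"
    AsciiType.instance,   // column name comparator, matching comparator = 'AsciiType'
    null,                 // subcomparator for super columns (none here)
    size_per_sstable_mb   // buffer size in MB; each time the buffer fills, one sstable is written
);
SStableGen.java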
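Conceptually, SSTableSimpleUnsortedWriter accepts rows in any order, buffers them in memory, and each time the buffer reaches roughly the configured size, writes them out sorted by partitioner token as one sstable. The sketch below models that behavior with plain Java collections so it can run without Cassandra; the class, the rough byte accounting, and the MD5-based token function (loosely modeled on RandomPartitioner) are all assumptions, not Cassandra's implementation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of an "unsorted" sstable writer (not Cassandra code).
public class UnsortedWriterModel {
    // One buffered row: its key and its columns (name -> value), kept sorted by name.
    static final class Row {
        final String key;
        final TreeMap<String, String> columns = new TreeMap<>();
        Row(String key) { this.key = key; }
    }

    private final long bufferLimitBytes;
    // Buffered rows ordered by token, so a flush emits them in token order.
    private final TreeMap<BigInteger, Row> buffer = new TreeMap<>();
    private long bufferedBytes = 0;
    // Each flushed "sstable" is recorded as its row keys in write order.
    public final List<List<String>> flushed = new ArrayList<>();

    public UnsortedWriterModel(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Stand-in token: MD5 of the key as a non-negative integer (an assumption,
    // loosely modeled on RandomPartitioner).
    public static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Columns may arrive in any key order; repeated keys merge into one buffered row.
    public void addColumn(String key, String name, String value) {
        buffer.computeIfAbsent(token(key), t -> new Row(key)).columns.put(name, value);
        bufferedBytes += key.length() + name.length() + value.length(); // rough size estimate
        if (bufferedBytes >= bufferLimitBytes) {
            flush();
        }
    }

    // Writes the buffered rows in token order as one "sstable", then empties the buffer.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> keysInTokenOrder = new ArrayList<>();
        for (Row row : buffer.values()) {
            keysInTokenOrder.add(row.key);
        }
        flushed.add(keysInTokenOrder);
        buffer.clear();
        bufferedBytes = 0;
    }

    // Flushes whatever remains, mirroring the final close() on the real writer.
    public void close() {
        flush();
    }
}
```

The trade-off this models is that input order does not matter, at the cost of holding up to the buffer size in memory. Cassandra 1.x also ships SSTableSimpleWriter, which avoids the buffer but expects rows to arrive already in partitioner order.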
33. // list of orders by customer
customerOrders = new SSTableSimpleUnsortedWriter(…);
// orders by order id
orders = new SSTableSimpleUnsortedWriter(…);

// one timestamp for the whole backfill, in microseconds since the epoch (the usual Cassandra convention)
long timestamp = System.currentTimeMillis() * 1000;

// assume orders are in date order
for (Order order : oldOrders) {
    // customer row: one column per order, named by order id, with an empty value
    customerOrders.newRow(ByteBufferUtil.bytes(order.customerId));
    customerOrders.addColumn(ByteBufferUtil.bytes(order.orderId), ByteBufferUtil.EMPTY_BYTE_BUFFER,
        timestamp);

    // order row: keyed by order id, holding the order's details
    orders.newRow(ByteBufferUtil.bytes(order.orderId));
    orders.addColumn(ByteBufferUtil.bytes("customer_id"), ByteBufferUtil.bytes(order.customerId),
        timestamp);
    orders.addColumn(ByteBufferUtil.bytes("date"), ByteBufferUtil.bytes(order.date), timestamp);
    orders.addColumn(ByteBufferUtil.bytes("total"), ByteBufferUtil.bytes(order.total), timestamp);
}

customerOrders.close();
orders.close();
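In the backfill loop above, each order is written twice: once as a column in its customer's row (the order id as the column name, with an empty value) and once as its own row holding the order's details. The stdlib sketch below models those two denormalized views with in-memory maps so the resulting shape is easy to inspect; the Order fields mirror the slide, while their String types and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the two denormalized column families on the slide.
public class OrderViews {
    // Mirrors the slide's Order fields; String types are an assumption.
    public static final class Order {
        final String orderId;
        final String customerId;
        final String date;
        final String total;

        public Order(String orderId, String customerId, String date, String total) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.date = date;
            this.total = total;
        }
    }

    // customerOrders: customer id -> order ids (column names; the values are empty).
    public final Map<String, TreeSet<String>> customerOrders = new TreeMap<>();
    // orders: order id -> (column name -> value).
    public final Map<String, Map<String, String>> orders = new TreeMap<>();

    // Records one order in both views, like one iteration of the slide's loop.
    public void add(Order order) {
        customerOrders.computeIfAbsent(order.customerId, k -> new TreeSet<>()).add(order.orderId);
        Map<String, String> row = orders.computeIfAbsent(order.orderId, k -> new TreeMap<>());
        row.put("customer_id", order.customerId);
        row.put("date", order.date);
        row.put("total", order.total);
    }
}
```

With this layout, listing a customer's orders reads a single customerOrders row, and fetching an order's details reads a single orders row.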