This document summarizes Asya Kamsky's work measuring MongoDB's performance on the TPC-C benchmark. She discusses the motivations for benchmarking and describes how the TPC-C methodology was adapted for MongoDB, covering data modeling, indexing, hardware selection, and code optimizations. Results showed linear throughput scaling from 50 to 1,000 warehouses on Google Cloud M-series machines. Next steps include publishing a paper on the work and making the GitHub repo public.
2. Asya Kamsky, MongoDB Maven
Measuring Transaction Performance
MongoDB Meets TPC-C
@asya999 #askAsya
Or how I learned to stop worrying and love the benchmarks.
50. Old NEW_ORDER:
for i in range(ol_cnt):
    self.stock.update_one(si,
        {"$set": {"S_QUANTITY": ...}},
        session=s)
    self.order_line.insert_one(
        ol, session=s)
TPC-C
New NEW_ORDER:
stockWrites = []
orderLineWrites = []
for i in range(ol_cnt):
    stockWrites.append(
        pymongo.UpdateOne(si,
            {"$set": {"S_QUANTITY": ...}}))
    orderLineWrites.append(ol)
self.order_line.insert_many(
    orderLineWrites, session=s)
self.stock.bulk_write(
    stockWrites, session=s)
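
The effect of the NEW_ORDER change can be illustrated with a toy model that runs without a MongoDB server. This is only a sketch under stated assumptions: `CountingCollection` is a hypothetical stand-in, not a pymongo class, and it counts method calls as a rough proxy for client-to-server round trips. The plain tuples passed to `bulk_write` stand in for the `pymongo.UpdateOne` objects used on the slide.

```python
class CountingCollection:
    """Hypothetical stand-in for a collection; counts calls, stores nothing."""

    def __init__(self):
        self.round_trips = 0   # one per method call (proxy for network trips)
        self.docs_written = 0  # total documents touched

    def update_one(self, flt, update, session=None):
        self.round_trips += 1
        self.docs_written += 1

    def insert_one(self, doc, session=None):
        self.round_trips += 1
        self.docs_written += 1

    def insert_many(self, docs, session=None):
        self.round_trips += 1
        self.docs_written += len(docs)

    def bulk_write(self, ops, session=None):
        self.round_trips += 1
        self.docs_written += len(ops)


def new_order_per_item(stock, order_line, items):
    """Old shape: two calls per order line."""
    for stock_filter, stock_update, ol in items:
        stock.update_one(stock_filter, stock_update)
        order_line.insert_one(ol)


def new_order_batched(stock, order_line, items):
    """New shape: collect the writes, then issue one call per collection."""
    stock_writes = []
    order_line_writes = []
    for stock_filter, stock_update, ol in items:
        # With pymongo this would be pymongo.UpdateOne(stock_filter, stock_update).
        stock_writes.append((stock_filter, stock_update))
        order_line_writes.append(ol)
    order_line.insert_many(order_line_writes)
    stock.bulk_write(stock_writes)


def make_items(ol_cnt):
    """Build illustrative order lines; the field values are made up."""
    return [
        ({"S_I_ID": i}, {"$set": {"S_QUANTITY": 50}}, {"OL_NUMBER": i})
        for i in range(1, ol_cnt + 1)
    ]
```

With ten order lines, the per-item version makes ten calls against each collection, while the batched version makes one call against each and writes the same number of documents.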
53. Old Delivery:
Pick Warehouse:
    start Transaction
    for each of ten deliveries
        do delivery
    done
    commit Transaction
done
New Delivery:
Pick Warehouse:
    for each of ten deliveries
        start Transaction
        do delivery
        commit Transaction
    done
done
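
The two Delivery shapes can be sketched in Python as follows. The functions rely only on `client.start_session()` and `session.start_transaction()` being usable as context managers, which is how pymongo exposes them. Everything else is an assumption made for illustration: `do_delivery` is a hypothetical callback, and `FakeClient` and `FakeSession` are recording stand-ins so the sketch runs without a server. The slides do not state why the scope was changed; in general, shorter transactions leave a smaller window for write conflicts.

```python
from contextlib import contextmanager


def deliver_one_big_txn(client, do_delivery, n=10):
    """Old shape: all n deliveries share a single transaction."""
    with client.start_session() as s:
        with s.start_transaction():
            for d in range(1, n + 1):
                do_delivery(d, s)


def deliver_small_txns(client, do_delivery, n=10):
    """New shape: each delivery runs and commits in its own transaction."""
    with client.start_session() as s:
        for d in range(1, n + 1):
            with s.start_transaction():
                do_delivery(d, s)


class FakeSession:
    """Hypothetical stand-in for a session; logs transaction boundaries."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False  # do not swallow exceptions

    @contextmanager
    def start_transaction(self):
        self.log.append("start")
        yield self
        # Reached only if the body did not raise, mirroring commit-on-success.
        self.log.append("commit")


class FakeClient:
    """Hypothetical stand-in for a client; hands out logging sessions."""

    def __init__(self):
        self.log = []

    def start_session(self):
        return FakeSession(self.log)
```

Running both functions against a `FakeClient` with a callback that appends the delivery number to `client.log` shows one start/commit pair wrapping all ten deliveries in the old shape, and ten start/commit pairs in the new one.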
63. What's next?
Publish paper VLDB 2019
Make GitHub repo public
Submit more changes upstream
Extend for sharded workload
Measure bigger scale
Adapt more traditional benchmarks?