MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Numberly

MongoDB vs ScyllaDB
From production experience
Alexys Jacob, CTO

Hello!
Alexys Jacob
Gentoo Linux developer
- dev-db / mongodb / redis / scylla
- sys-cluster / keepalived / ipvsadm / consul
- dev-python / pymongo ...
Open Source contributor
- MongoDB
- Scylla
- Apache Airflow
- Python Software Foundation contributing member
Scylla for kids advocate ;)
CTO at Numberly

Timeline, 2008 to 2019 (with Numberly adoption milestones)
MongoDB: OSS (AGPL), 1.4 stable, 2.0, 3.0, WiredTiger (3.2), 3.6, 4.0 (SSPL), 4.2
Cassandra: OSS (Apache), 1.0, 2.0, 3.0
Scylla: OSS (AGPL), 1.0, 2.0, 3.0

Core
C++ = C++
XFS = XFS
Threads not bound to cores vs thread-per-core
Security ~ Security

Terminology
Replica-set vs Cluster

Replica-set
Roles: primary, secondaries
Cluster
Equal nodes
Topology
Replica-set vs Cluster

Replica-set
Primary = read + write
Secondary = read only
Cluster
All nodes = read + write
CAP Theorem
MongoDB favors C(onsistency) vs Scylla favors A(vailability)
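A minimal sketch of how the two topologies behave when a network partition isolates some nodes. It assumes a 3-member MongoDB replica-set, where a side of the partition needs a strict majority of voting members to elect a primary, and a Scylla keyspace with replication factor 3, where a write succeeds once as many replicas as the consistency level requires have acknowledged it. The function names are hypothetical and this is a simplified model, not either database's election or coordination code.

```python
def replica_set_accepts_writes(reachable_members: int, voting_members: int) -> bool:
    """A partition side can only hold a primary (and take writes) with a strict majority."""
    return reachable_members > voting_members // 2


def cluster_accepts_writes(reachable_replicas: int, required_acks: int) -> bool:
    """Any Scylla node can coordinate a write; it succeeds if enough replicas acknowledge it."""
    return reachable_replicas >= required_acks


if __name__ == "__main__":
    # A partition splits 3 nodes into one side of 2 nodes and one side of 1 node.
    for side in (2, 1):
        print(
            f"side with {side} node(s): "
            f"replica-set writes={replica_set_accepts_writes(side, 3)}, "
            f"cluster writes at ONE={cluster_accepts_writes(side, 1)}"
        )
```

On the minority side the replica-set refuses writes (consistency first), while the cluster keeps accepting them at consistency level ONE (availability first); asking for QUORUM (2 of 3) would make the cluster refuse them as well.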

Scaling I/O
Cluster of replica-sets (sharding)
+ Metadata (config server) replica-set
+ Smart router (mongos) instances
vs
Just add more nodes
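To illustrate why "just add more nodes" works, here is a minimal consistent hashing sketch in the spirit of Scylla's vnode-based token ring. It assumes 64 vnodes per node and uses MD5 as a stand-in for the Murmur3 partitioner; `TokenRing`, `token` and `moved_fraction` are hypothetical names. It measures how many keys change owner when a fourth node joins, compared with naive modulo placement.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Assumption: MD5 stands in for Scylla's Murmur3 partitioner; any uniform hash works here.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class TokenRing:
    """Consistent hashing ring in which every node owns several virtual nodes (vnodes)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            bisect.insort(self._ring, (token(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # The first vnode at or after the key's token owns it, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (token(key),))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(owner_before, owner_after, keys) -> float:
    """Share of keys whose owner changes between two placement functions."""
    return sum(owner_before(k) != owner_after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    old_ring = TokenRing(["node1", "node2", "node3"])
    new_ring = TokenRing(["node1", "node2", "node3", "node4"])
    ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)
    # Naive placement for comparison: node index = hash % node_count.
    modulo_moved = moved_fraction(lambda k: token(k) % 3, lambda k: token(k) % 4, keys)
    print(f"token ring: {ring_moved:.0%} of keys moved, modulo: {modulo_moved:.0%}")
```

With the ring, only the keys falling into the new node's token ranges move (roughly a quarter of them here, all to the new node), while modulo placement reshuffles about three quarters of the keys.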

Data location
Sharding zones
Sharding key = data location (EU, US, ASIA)
vs
Multi-datacenter / region
Replication factor + strategy = data location (EU, US, ASIA)
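A toy model of the two placement rules. On the MongoDB side, it assumes zone sharding keyed on a hypothetical `region` field, so the shard key value picks the zone. On the Scylla side, it assumes a keyspace created with `NetworkTopologyStrategy`, e.g. `CREATE KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'EU': 3, 'US': 3};`, where every datacenter given a replication factor above zero holds a full copy. The datacenter names, the `ZONES` mapping and both functions are illustrative only.

```python
# Hypothetical mapping of a "region" shard key value to MongoDB zones.
ZONES = {"EU": "shards-eu", "US": "shards-us", "ASIA": "shards-asia"}


def mongodb_location(doc: dict) -> set:
    """Zone sharding: only the zone matching the document's shard key holds it."""
    return {ZONES[doc["region"]]}


def scylla_location(rf_per_dc: dict) -> set:
    """NetworkTopologyStrategy: every datacenter with RF > 0 holds a full copy of the keyspace."""
    return {dc for dc, rf in rf_per_dc.items() if rf > 0}


if __name__ == "__main__":
    print(mongodb_location({"region": "EU", "user_id": 42}))
    print(sorted(scylla_location({"EU": 3, "US": 3, "ASIA": 0})))
```

The practical difference: MongoDB decides location per document through the shard key, while Scylla decides it per keyspace through replication settings, so every row of the keyspace lives in every listed datacenter.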

Ops take away
MongoDB claims its sharding-based clustering is more flexible
■ TCO rating: bad
● More nodes needed = higher operational & maintenance costs
● Wasted resources
■ Ops rating: bad
● Background data (re)distribution impacts the runtime workload
■ Restricting the balancer to an active time window can help, but it feels like a patch
MongoDB replica-sets should be enough
■ TCO rating: moderate
● Vertical scaling is not a bad thing!
■ Ops rating: good
● Almost no tuning needed
● Easy automation: few configuration options
● Smooth rolling upgrades
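MongoDB lets you confine chunk migrations to a balancer active window, stored in `config.settings` as a document such as `{ _id: "balancer", activeWindow: { start: "23:00", stop: "06:00" } }`. The sketch below, with hypothetical `parse_hhmm` and `in_active_window` helpers, only shows the time check such a window implies, including windows that wrap past midnight. It treats the window as `[start, stop)`, which is an assumption of this sketch, not MongoDB's code.

```python
from datetime import time


def parse_hhmm(value: str) -> time:
    """Parse an "HH:MM" string into a time of day."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def in_active_window(now: time, start: str, stop: str) -> bool:
    """True if `now` falls inside [start, stop), a window that may wrap past midnight."""
    begin, end = parse_hhmm(start), parse_hhmm(stop)
    if begin <= end:
        return begin <= now < end
    # Window wraps midnight, e.g. 23:00 -> 06:00.
    return now >= begin or now < end


if __name__ == "__main__":
    for hour in (2, 12, 23):
        print(hour, in_active_window(time(hour, 0), "23:00", "06:00"))
```

Outside the window, pending migrations simply wait, so imbalance between shards can build up during busy hours.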

Ops take away
Scylla's consistent-hashing-based clustering is efficient
■ TCO rating: good
● Clean and simple topology
● Maximized hardware utilization
● Scales horizontally
■ Ops rating: moderate
● Moderate number of configuration options
● Automation could be better: per-node manual setup required
● Complex and mandatory background maintenance operations
■ Compactions (seamless)
■ Repairs (semi-seamless thanks to scylla-manager)
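Why repairs are mandatory: a tombstone (deletion marker) is kept for `gc_grace_seconds` (864000 seconds, i.e. 10 days, by default) before compaction may purge it. Every replica therefore has to be repaired within that window, or deleted data can come back. Below is a hypothetical scheduling check, assuming you track the last successful repair per table yourself (scylla-manager does this in practice). The safety margin is an arbitrary illustrative value.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the table-level default


def repair_deadline(last_repair: datetime,
                    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> datetime:
    """Latest moment the next repair should finish to stay within gc_grace_seconds."""
    return last_repair + timedelta(seconds=gc_grace_seconds)


def repair_overdue(last_repair: datetime, now: datetime,
                   gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
                   safety_margin: timedelta = timedelta(days=2)) -> bool:
    """Flag tables whose next repair risks missing the tombstone GC window (hypothetical margin)."""
    return now >= repair_deadline(last_repair, gc_grace_seconds) - safety_margin


if __name__ == "__main__":
    last = datetime(2019, 1, 1)
    print(repair_deadline(last), repair_overdue(last, datetime(2019, 1, 9)))
```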

Core
Document store vs Wide-column store
+ Geospatial queries
+ Text search
+ Aggregation pipelines
+ Graph queries
+ Change streams
vs
+ User Defined Types
+ Counters
+ Lightweight Transactions* (*beta)

Modeling
Flexible collections vs Table schemas
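A toy in-memory model of the difference, using hypothetical `Collection` and `Table` classes (not the drivers' APIs). A collection accepts documents of any shape, while a table rejects undeclared columns and rows missing a primary key column. MongoDB can optionally enforce a `$jsonSchema` validator, which this sketch leaves out.

```python
from dataclasses import dataclass, field


class Collection:
    """MongoDB-style collection: any document shape is accepted (no schema enforced here)."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document: dict) -> None:
        self.documents.append(dict(document))


@dataclass
class Table:
    """Scylla-style table: columns are declared up front, unknown columns are rejected."""
    columns: set
    primary_key: tuple
    rows: list = field(default_factory=list)

    def insert(self, row: dict) -> None:
        unknown = set(row) - self.columns
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        missing = [c for c in self.primary_key if c not in row]
        if missing:
            raise ValueError(f"missing primary key columns: {missing}")
        self.rows.append(dict(row))
```

The flexible side absorbs web tracking data with random fields, while the schema side catches typos and drifting payloads at write time.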

Connecting
Python: pymongo vs cassandra-driver
Java: driver-reactivestreams vs scylla java-driver
Spark: spark-connector vs cassandra-connector
Go: mongodb-go-driver vs scylla gocqlx

Querying
Frictionless, rich queries… with hidden pitfalls
vs
Performance-guaranteed queries… with limitations
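A simplified sketch of both sides, with hypothetical helper names. On the MongoDB side, any filter runs, but without an index whose leading field the query constrains, the planner silently falls back to a full collection scan (the `COLLSCAN` stage in explain output, versus `IXSCAN`). On the Scylla side, CQL refuses a restricted query that does not pin the whole partition key, or that filters on non-key columns, unless you opt in with `ALLOW FILTERING`. Clustering-column ordering rules and secondary indexes are deliberately ignored.

```python
def mongodb_plan(query_fields: set, indexes: list) -> str:
    """Pick IXSCAN if some index's leading field is constrained, else a full COLLSCAN."""
    for index in indexes:
        if index[0] in query_fields:
            return "IXSCAN"
    return "COLLSCAN"


def cql_needs_filtering(partition_key: tuple, clustering_key: tuple, restricted: set) -> bool:
    """Simplified CQL rule: without ALLOW FILTERING, restrictions must pin the partition key
    and touch only primary key columns (indexes and clustering order are ignored here)."""
    if not restricted:
        return False  # an unrestricted full scan is allowed (but expensive)
    pk_given = set(partition_key) <= restricted
    only_key_columns = restricted <= set(partition_key) | set(clustering_key)
    return not (pk_given and only_key_columns)


if __name__ == "__main__":
    print(mongodb_plan({"age"}, [("email", "age")]))
    print(cql_needs_filtering(("user_id",), ("ts",), {"ts"}))
```

The MongoDB query still succeeds, only slower as the collection grows; the CQL one fails fast, which is what makes its latency predictable.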

Handling consistency
Tunable by the client
Write concern
+ ACID Transactions
+ Causal consistency
vs
Consistency Level
+ Materialized Views
+ Logged batches
Do not forget repairs!
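Tunable consistency comes down to overlap arithmetic. With replication factor RF, a read that waits for R replicas is guaranteed to see a write acknowledged by W replicas when R + W > RF. QUORUM means floor(RF/2) + 1 replicas, similar in spirit to MongoDB's `w: "majority"` write concern. The helper names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas a QUORUM read or write waits for."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every possible read replica set overlaps every possible write replica set."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}")
    print("QUORUM/QUORUM consistent:", read_sees_latest_write(q, q, rf))
    print("ONE/ONE consistent:", read_sees_latest_write(1, 1, rf))
```

The overlap only protects reads. Replicas that missed a write stay stale until read repair or a full repair fixes them, hence the reminder about repairs.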

Expiring data
Index based: whole document
vs
Metadata based: column or row
Free write timestamps!
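A toy model of the two expiry granularities. On the MongoDB side, it assumes a TTL index with `expireAfterSeconds` on a date field, so the whole document goes once that field is old enough. On the Scylla side, it assumes TTLs set per write with `USING TTL`, where each cell carries its own write timestamp (in microseconds, readable with `WRITETIME()`) and its own expiry. `Cell`, `document_expired` and `visible_columns` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def document_expired(doc: dict, field: str, expire_after_seconds: int, now: datetime) -> bool:
    """MongoDB-style: the TTL-indexed field decides the fate of the entire document."""
    return now >= doc[field] + timedelta(seconds=expire_after_seconds)


@dataclass
class Cell:
    value: object
    writetime_us: int                  # write timestamp in microseconds, kept with every cell
    ttl_seconds: Optional[int] = None  # None means the cell never expires

    def alive(self, now_us: int) -> bool:
        if self.ttl_seconds is None:
            return True
        return now_us < self.writetime_us + self.ttl_seconds * 1_000_000


def visible_columns(row: dict, now_us: int) -> dict:
    """Scylla-style: each column expires on its own while the rest of the row stays."""
    return {name: cell.value for name, cell in row.items() if cell.alive(now_us)}
```

The per-cell write timestamp is what makes it "free": it is stored anyway for conflict resolution, so reading it back costs no extra modeling.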

MongoDB favors flexibility
over performance

Scylla favors consistent
performance over versatility

Dev take away
Scylla performance, latency and availability make a difference
■ Good things we now have (or soon will)
● Efficient server-side filtering (3.0)
● Row-level repairs (3.1)
● A simple yet efficient LIKE operator (3.2)
● Lightweight transactions (LWT) (Beta)
● Change Data Capture (Beta)
● User Defined Functions
■ Still missing
● Intersection on indexes
● A good and shard-aware Python driver
● Search capabilities

Numberly use cases
MongoDB
■ Web backends
● REST APIs with possibly flexible schemas
■ Real-time queries over unpredictable behavioral data
● Web tracking data with random fields and values
Scylla
■ Real-time and latency sensitive data
● Data enrichment
■ Mixed batch and real-time workloads
● Data correlation and matching
■ Web backends
● GraphQL APIs with fixed schemas

Stay in touch
Alexys Jacob
alexys@numberly.com
ultrabug
