This document compares MongoDB and ScyllaDB databases. It discusses their histories, architectures, data models, querying capabilities, consistency handling, and scaling approaches. It also provides takeaways for operations teams and developers, noting that ScyllaDB favors consistent performance over flexibility while MongoDB is more flexible but sacrifices some performance. The document also outlines how a company called Numberly uses both MongoDB and ScyllaDB for different use cases.
15. Ops takeaway
MongoDB claims their sharding based clustering is more flexible
■ TCO rating: bad
● More nodes needed = higher operational & maintenance costs
● Wasted resources
■ Ops rating: bad
● Background data (re)distribution impacts the runtime workload
■ Restricting the balancer to an active time window can help, but it feels like a patch
MongoDB replica-sets should be enough
■ TCO rating: moderate
● Vertical scaling is not a bad thing!
■ Ops rating: good
● Almost no tuning needed
● Easy automation: few configuration options
● Smooth rolling upgrades
16. Ops takeaway
Scylla consistent hashing based clustering is efficient
■ TCO rating: good
● Clean and simple topology
● Maximized hardware utilization
● Scales horizontally
■ Ops rating: moderate
● Moderate number of configuration options
● Automation could be better: per-node manual setup required
● Complex and mandatory background maintenance operations
■ Compactions (seamless)
■ Repairs (semi-seamless thanks to scylla-manager)
27. Dev takeaway
Scylla performance, latency and availability make a difference
■ Good that we now (or soon) have
● Efficient server-side filtering (3.0)
● Row-level repairs (3.1)
● A simple yet efficient LIKE operator (3.2)
● Lightweight transactions (LWT) (Beta)
● Change Data Capture (Beta)
● User Defined Functions
■ Still missing
● Intersection on indexes
● A good and shard-aware Python driver
● Search capabilities
29. Numberly use cases
MongoDB
■ Web backends
● REST APIs with possibly flexible schemas
■ Real-time queries over unpredictable behavioral data
● Web tracking data with random fields and values
Scylla
■ Real-time and latency sensitive data
● Data enrichment
■ Mixed batch and real-time workloads
● Data correlation and matching
■ Web backends
● GraphQL APIs with fixed schemas
2012: MongoDB switches to write concern = 1 by default
2017: MongoDB 3.6 adds change streams & schema validation
2017: Scylla 2.0 makes counters production ready, materialized views (MV) in beta
2018: MongoDB 4.0 adds ACID transactions
2019: Scylla 3.0 makes MV, secondary indexes (SI) and hinted handoff production ready, plus full scan improvements
MongoDB and Scylla love XFS and share a “compaction” command
To reclaim disk space on MongoDB WiredTiger
To optimize disk I/O and disk space usage on Scylla
Security: RBAC = equivalent on both, TLS on the wire = equivalent on both, LDAP + encryption = enterprise editions only (field-level encryption is automated on MongoDB Enterprise)
Node discovery
Mongo = Raft-like replica-set election protocol
Scylla = gossip
Data availability
Mongo: #node / RS
Scylla: RF
Mongo: read scaling = add a secondary… but write scaling?
TCO
Mongo: data replication = by collection = number of nodes by replica-set * number of replica-sets (shards)
Mongo: data distribution = by shard key = ranged or hashed
Scylla: data replication = by keyspace = replication factor
Scylla: data distribution = by partition key = consistent hash ring
Mongo: sharding zones = sharding key = ranged sharding
Scylla: how to make sure data is located on specific zone then? Keyspace = data location?
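The "partition key = consistent hash ring" note above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Scylla's implementation: the `HashRing` class, the `token` helper, the node names and the use of MD5 are all assumptions made for the example (Scylla's default partitioner is Murmur3, and real replica placement also depends on the keyspace's replication strategy).

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is a stand-in hash here)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical toy ring: nodes own several positions (virtual nodes)."""

    def __init__(self, nodes, vnodes=16):
        # Sort all (token, node) pairs so the ring can be searched by bisection.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, partition_key: str, rf: int = 3):
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        rf = min(rf, self._node_count)
        start = bisect.bisect_right(self._tokens, token(partition_key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == rf:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", rf=3)
print(owners)  # three distinct nodes, always the same ones for this key
```

The point of the sketch is that placement is a pure function of the partition key and the ring, so any node can route a request without a lookup service, and the replication factor simply decides how far the clockwise walk continues.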
Running on commodity hardware is a lure, don’t do it
Scale: but using big nodes is still better
JSON vs SQL rows
Mongo: flexible schema but can be validated/enforced with rich limitations
Scylla: fixed schema, must be altered
Mongo: cross-driver CRUD specification
Scylla java: shard aware
BSON format is painful on JVM ecosystem as it’s not native
Overall:
Mongo: Tailable cursors, retryable writes
Scylla: UDT but not yet LWT
Solution: indexes
Mongo: Rich indexes, intersection (fit in RAM recommended)
Scylla: global, local indexes, no intersection yet
Tunable consistency: Mongo write concern & read preference, Scylla Consistency Level
Mongo: ACID (Atomicity, Consistency, Isolation, Durability) transactions, distributed across a sharded cluster since 4.2
Scylla: repairs make sure data replicas are consistent between nodes
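The tunable consistency note above comes down to simple arithmetic on how many replicas acknowledge a write and how many are consulted on a read. A minimal sketch, assuming a single datacenter, no node failures and hypothetical helper names (`quorum`, `reads_see_latest_write`):

```python
def quorum(rf: int) -> int:
    """Smallest majority of rf replicas."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, write_acks: int, read_replicas: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return write_acks + read_replicas > rf


rf = 3
q = quorum(rf)  # 2 replicas out of 3

# QUORUM writes + QUORUM reads overlap on at least one replica.
strong = reads_see_latest_write(rf, write_acks=q, read_replicas=q)

# ONE write + ONE read may hit disjoint replicas, so a read can be stale.
weak = reads_see_latest_write(rf, write_acks=1, read_replicas=1)

print(q, strong, weak)  # 2 True False
```

Lower levels trade this overlap guarantee for latency and availability, which is why the background repairs mentioned above matter for bringing replicas back in sync.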
Mongo: requires a field + index + background job
Scylla: creates tombstones + requires repairs
Mongo: The background task that removes expired documents runs every 60 seconds. As a result, documents may remain in a collection during the period between the expiration of the document and the running of the background task.
Scylla: when a TTL expires, the values of all other (non-primary-key) columns are checked; if all of them are null, the record is removed automatically.
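The 60-second sweep described above for MongoDB means an expired document can outlive its expiry time. A rough model of that delay, assuming sweeps land exactly on multiples of the interval (a simplification: the real task gives no such guarantee and deletion can take longer under load); `removal_time` and `overstay` are hypothetical helpers:

```python
import math


def removal_time(expire_at: float, sweep_interval: float = 60.0) -> float:
    """First sweep tick at or after expire_at, with sweeps at 0, 60, 120, ..."""
    return math.ceil(expire_at / sweep_interval) * sweep_interval


def overstay(expire_at: float, sweep_interval: float = 60.0) -> float:
    """Seconds a document stays readable after it has expired."""
    return removal_time(expire_at, sweep_interval) - expire_at


# A document expiring one second after a sweep waits almost a full interval.
print(removal_time(61.0), overstay(61.0))    # 120.0 59.0

# A document expiring exactly on a sweep tick is removed right away.
print(removal_time(120.0), overstay(120.0))  # 120.0 0.0
```

Queries that must never return expired documents therefore still need to filter on the expiry field themselves.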
Row-level repairs to maximize data consistency and lower background footprint
Scylla interest & adoption are growing while Mongo's have gone stale