From Postgres to ScyllaDB: Migration Strategies and Performance Gains

  1. From Postgres to ScyllaDB: Migration Strategies and Performance Gains
     Dan Harris & Sebastian Vercruysse
  2. Company & Presenters
     Dan Harris, Sebastian Vercruysse
  3. Agenda
     ■ DataPrime Query Engine and metastore
     ■ ScyllaDB implementation
     ■ Conclusion
  4. DataPrime Query Engine
  5. DataPrime Query Engine
     ■ Custom distributed query engine for a proprietary query language (DataPrime) over arbitrary semi-structured data
     ■ Queries data stored in object storage
     ■ Storage format is specialized Parquet files
  6. Metastore: Motivation
     ■ Reading Parquet metadata from object storage is too expensive for large queries
     ■ Move the metadata into separate (faster) storage
     ■ Block listing with Bloom filters
     ■ Transactional commit log
  7. Metastore: Requirements
     ■ Low latency
     ■ Scalable
     ■ Transactional guarantees
     Initial implementation: Postgres.
     Example from one (large) customer:
     ■ 2k Parquet files / hour
     ■ 50k Parquet files / day
     ■ 15 TB of data / day
     ■ 20 GB of metadata / day
  8. Blocks
     Important for listing: "give me all the blocks for a table in a given time range".
     Example:
     ■ Block URL: s3://cgx-production-c4c-archive-data/cx/parquet/v1/team_id=555585/… …dt=2022-12-02/hr=10/0246f9e9-f0da-4723-9b64-a12346095d25.parquet
     ■ Row group: 0, 1, 2 …
     ■ Min timestamp
     ■ Max timestamp
     ■ Number of rows
     ■ Total size
     ■ …
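
For illustration, a minimal Rust sketch of the per-block record described above and of the time-range listing it has to support. Field and function names are assumptions for this sketch, not the actual Coralogix schema:

```rust
/// One row of block metadata (names are illustrative, per the slide's example).
struct BlockEntry {
    block_url: String,  // s3://... path of the Parquet file
    row_group: u32,     // row group index within that file
    min_timestamp: i64, // earliest event timestamp in the row group (epoch ms)
    max_timestamp: i64, // latest event timestamp in the row group (epoch ms)
    num_rows: u64,
    total_size: u64,    // bytes
}

/// The core listing operation: all blocks for a table whose time range
/// overlaps [from, to).
fn list_blocks(blocks: &[BlockEntry], from: i64, to: i64) -> Vec<&BlockEntry> {
    blocks
        .iter()
        .filter(|b| b.min_timestamp < to && b.max_timestamp >= from)
        .collect()
}
```
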
  9. Bloom Filters
     Used for pruning blocks when filtering by search term:
     ■ Is a given token maybe in this block, or definitely not?
     Works by hashing each token multiple times and setting the corresponding bits to 1. When checking, hash again and verify that all of those bits are 1.
     Specifically using a blocked Bloom filter (a sequence of small Bloom filters): 8192 blocks * 32 bytes = ~262 kB per filter.
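
A minimal sketch of how such a blocked Bloom filter behaves, assuming one hash picks the 32-byte block and k further hashes pick bits inside it; the talk does not specify the hashing scheme or k, so those are placeholders:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const NUM_BLOCKS: usize = 8192; // 8192 blocks * 32 bytes = ~262 kB per filter
const BLOCK_BYTES: usize = 32;  // 256 bits per block
const NUM_HASHES: u64 = 7;      // assumed k; the talk does not state it

fn hash(token: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    token.hash(&mut h);
    h.finish()
}

struct BlockedBloom {
    blocks: Vec<[u8; BLOCK_BYTES]>,
}

impl BlockedBloom {
    fn new() -> Self {
        Self { blocks: vec![[0u8; BLOCK_BYTES]; NUM_BLOCKS] }
    }

    /// The first hash selects the 32-byte block; the remaining hashes set
    /// bits within it, so every token touches exactly one block.
    fn insert(&mut self, token: &str) {
        let b = (hash(token, 0) as usize) % NUM_BLOCKS;
        for seed in 1..=NUM_HASHES {
            let bit = (hash(token, seed) as usize) % (BLOCK_BYTES * 8);
            self.blocks[b][bit / 8] |= 1 << (bit % 8);
        }
    }

    /// "Maybe in this block" only if all k bits are set; otherwise the
    /// token is definitely absent.
    fn maybe_contains(&self, token: &str) -> bool {
        let b = (hash(token, 0) as usize) % NUM_BLOCKS;
        (1..=NUM_HASHES).all(|seed| {
            let bit = (hash(token, seed) as usize) % (BLOCK_BYTES * 8);
            self.blocks[b][bit / 8] & (1 << (bit % 8)) != 0
        })
    }
}
```

The property that matters later: all of a token's bits live in a single 32-byte block, so a membership check only ever needs that one block.
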
  10. Column Metadata
      Per-column Parquet metadata required for scanning and decoding the Parquet file.
      Example:
      ■ Block URL
      ■ Row group
      ■ Column name
      ■ Column metadata (blob)
  11. ScyllaDB Implementation
  12. Blocks
      Example:
      ■ s3://cgx-production-c4c-archive-data/cx/parquet/v1/team_id=555585/… …dt=2022-12-02/hr=10/0246f9e9-f0da-4723-9b64-a12346095d25.parquet
      What should the primary key be?
      ■ Table url?
      ■ ((Block url, row group))?
      ■ ((Table url, hour))?
      ■ ((Table url, hour), block url, row group) (see the sketch below)
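
The last option is the one the deck settles on: partitioning by (table url, hour) keeps partitions bounded and makes a time-range listing touch a predictable number of partitions, while clustering by (block url, row group) keeps rows unique within a partition. A hedged CQL sketch of what that table could look like, written as a Rust string constant since the metastore itself is in Rust; column names and non-key columns are assumptions, only the primary key shape comes from the slide:

```rust
// Hypothetical CQL for the blocks table; only the primary key shape is
// taken from the slide.
const CREATE_BLOCKS: &str = r#"
CREATE TABLE IF NOT EXISTS metastore.blocks (
    table_url     text,
    hour          timestamp,
    block_url     text,
    row_group     int,
    min_timestamp timestamp,
    max_timestamp timestamp,
    num_rows      bigint,
    total_size    bigint,
    PRIMARY KEY ((table_url, hour), block_url, row_group)
)
"#;

// "All the blocks for a table in a given time range" then becomes one
// partition read per hour in the range:
const LIST_BLOCKS: &str =
    "SELECT block_url, row_group, min_timestamp, max_timestamp \
     FROM metastore.blocks WHERE table_url = ? AND hour = ?";
```
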
  13. Bloom Filters
      Problem: how do we verify that the bits are set?
      Solution: read the Bloom filters and process them in the application.
      Problem: ~50k blocks/day * ~262 kB = ~12 GB of data, far too much for one query.
      Solution:
      ■ Chunk the Bloom filters and split them into rows
      ■ By chunking per Bloom filter block we read one row per token: 50k blocks * 32 bytes = 1.6 MB per token (see the sketch below)
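
A sketch of the resulting read path, relying on the property from slide 9 that all of a token's bits sit in one 32-byte block. Function names are hypothetical:

```rust
/// All of a token's bits live in one 32-byte block (slide 9), so the chunk
/// to fetch is fully determined by the token's block-selecting hash.
fn chunk_index(block_selector_hash: u64) -> usize {
    (block_selector_hash as usize) % 8192
}

/// Check one stored 32-byte chunk against the bit positions a token needs:
/// "maybe present" only if every bit is set. For ~50k blocks this reads
/// 50k * 32 bytes = 1.6 MB per token instead of ~12 GB of whole filters.
fn chunk_matches(chunk: &[u8; 32], bit_positions: &[usize]) -> bool {
    bit_positions.iter().all(|&bit| chunk[bit / 8] & (1 << (bit % 8)) != 0)
}
```
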
  14. Bloom Filters: Primary Key (1)
      Primary key: ((block_url, row_group), chunk_index)
      ~8192 chunks of 32 bytes per Bloom filter = ~262 kB per partition
      Pros:
      ■ Easy to insert and delete: a single batch query
      Cons:
      ■ Need to know the block id before reading
      ■ A lot of partitions to access: 1 day = 50k partitions
  15. Bloom Filters: Primary Key (2)
      Primary key: ((table_url, hour, chunk_index), block_url, row_group)
      ~2000 chunks of 32 bytes per partition (one per block in the hour) = ~64 kB per partition
      Pros:
      ■ Very fast listing, far fewer partitions: 1 day, 5 tokens = 24 * 5 = 120 partitions
      ■ No dependency on the block, can read in parallel
      Cons:
      ■ Expensive to insert and delete: 8192 partitions for a single block!
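
Hedged CQL sketches of the two layouts, again as Rust string constants; column names are assumptions, only the primary keys come from the slides:

```rust
// Variant (1): one partition per (block, row group). The whole filter is
// local, so inserts/deletes are a single-partition batch, but a day's
// listing must visit ~50k partitions.
const CREATE_BLOOM_V1: &str = r#"
CREATE TABLE IF NOT EXISTS metastore.bloom_filters_v1 (
    block_url   text,
    row_group   int,
    chunk_index int,
    chunk       blob,   -- 32 bytes
    PRIMARY KEY ((block_url, row_group), chunk_index)
)
"#;

// Variant (2): partition by (table_url, hour, chunk_index). A token lookup
// over 1 day touches only 24 partitions per token and needs no block ids
// up front, but writing one block's filter now spans 8192 partitions.
const CREATE_BLOOM_V2: &str = r#"
CREATE TABLE IF NOT EXISTS metastore.bloom_filters_v2 (
    table_url   text,
    hour        timestamp,
    chunk_index int,
    block_url   text,
    row_group   int,
    chunk       blob,   -- 32 bytes
    PRIMARY KEY ((table_url, hour, chunk_index), block_url, row_group)
)
"#;
```
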
  16. Bloom Filters: Future Approach
      Investigate optimal chunking: find a middle ground between writing large enough chunks and reading unnecessary data.
      Can we use UDFs with WebAssembly?
      SELECT block_url, row_group FROM bloom_filters
      WHERE table_url = ? AND hour = ?
      AND bloom_filter_matches(bloom_filter, indexes)
      ■ Let ScyllaDB do the hard work
      ■ No need to worry about the amount of data we send back to the app
      ■ The matching code is already written in Rust
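
The SELECT above presumes a bloom_filter_matches UDF. A sketch of the pure matching logic such a UDF would wrap; the WebAssembly/UDF registration plumbing (ScyllaDB's Wasm UDF support is experimental) is omitted, and the signature is an assumption:

```rust
/// The pure matching logic a bloom_filter_matches UDF would evaluate
/// server-side: true only if every requested bit index is set in the
/// stored filter bytes.
fn bloom_filter_matches(bloom_filter: &[u8], indexes: &[u32]) -> bool {
    indexes.iter().all(|&i| {
        let (byte, bit) = ((i / 8) as usize, i % 8);
        byte < bloom_filter.len() && bloom_filter[byte] & (1u8 << bit) != 0
    })
}
```
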
  17. Be Careful
      CQL is very much not SQL: try to avoid migrations (and the bugs that come with them).
      Solutions:
      ■ Rename columns?
      ■ Add new columns, then an UPDATE blocks SET … query?
      ■ Truncate the table and start over again
  18. ScyllaDB: Ecosystem
      Extensive use of ScyllaDB libraries and components:
      ■ Written in Rust on top of the ScyllaDB Rust driver (scylla-rust-driver)
      ■ ScyllaDB Operator for Kubernetes
      ■ ScyllaDB Monitoring
      ■ ScyllaDB Manager
      From first learning that ScyllaDB exists to production-ready with terabytes of data in 2 months.
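
For reference, a minimal connection sketch with scylla-rust-driver; the address and query are placeholders, and the Session::query call shown matches the 0.x driver API (newer releases rename it to query_unpaged):

```rust
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder address; production would list the real cluster nodes.
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Smoke-test query against a system table.
    session.query("SELECT now() FROM system.local", &[]).await?;
    println!("connected");
    Ok(())
}
```
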
  19. Hardware
      Cost is very important. 3-node cluster:
      ■ 8 vCPU
      ■ 32 GiB memory
      ■ ARM/Graviton
      ■ EBS volumes (gp3)
      ■ 500 MBps bandwidth
      ■ 12k IOPS
  20. Metastore: Block Listing
      Largest cluster: 4-5 TB on each node, mostly for one customer.
      Writes:
      ■ p99 latency: <1 ms
      ■ ~10k writes / s
      Block listing:
      ■ Depends on the query and on whether we're using Bloom filters
      ■ For 1 hour: <20 ms latency
      ■ For 1 day: <500 ms latency
  21. Metastore: Column Metadata
      Reads:
      ■ p50 latency: 5 ms
      ■ p99 latency: 100 ms (the point at which we time out)
      Issue: a large number of concurrent queries. Probably a disk issue.
  22. Conclusion
      ■ Keep an eye on partition sizes
      ■ Think about read/write patterns
      ■ Very happy with block listing… but unpredictable tail latency for reading column metadata
      ■ Probably shouldn't use EBS :-)
  23. Thank You / Stay in Touch
      Dan Harris: dan@coralogix.com, @thinkharderdev, github.com/thinkharderdev, www.linkedin.com/in/dsh2va
      Sebastian Vercruysse: sebastian.vercruysse@coralogix.com, github.com/sebver, www.linkedin.com/in/sebastian-vercruysse
