Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration between Presto & Alluxio
Ke Wang, Software Engineer (Facebook)
Bin Fan, Founding Engineer, VP Of Open Source (Alluxio)
About Alluxio: alluxio.io
Engage with the open source community on Slack: alluxio.io/slack
7. • Overview
• Architecture and Problems
• Re-architecture and Solution
• Performance
• Alluxio Deep-dive
8. • Metadata cache at various levels
• schemas
• ACLs
• Partitions info
• HDFS
• File handle caching: avoid file open calls
• File stripe/footer caching: avoid multiple redundant RPC calls to HDFS
• File data caching: avoid network or HDFS latency.
• Compute
• Plan
• Partial result
Caching
9. • One optimization technique is to cache the working dataset closer to the compute node.
• Fewer trips to remote storage should reduce latency and IO.
Data Caching
13. • Facebook internal caching libraries
• Open source solutions
• Build our own
Various Options
14. • Naïve solution
• Copying files from remote storage to local storage
• Merging files in local storage to keep the file count low
File Merge Caching
16. • Segment-based data caching
• Pluggable eviction policies
• Configuration of various aspects like sizes, resources usage, eviction policies, etc.
• A Java-based OSS library
• Provide detailed stats regarding cache usage.
• Caching should not become a point of failure.
• Asynchronous operations.
• Files management at the disk level.
• Flash throughput limiter to avoid endurance issues.
Learnings & Alluxio Collaboration
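The slide lists a flash throughput limiter without saying how it works. One common way to cap write volume is a token bucket over bytes written, sketched below as an assumption rather than as the Facebook or Alluxio implementation. The class name `WriteBudget`, its parameters, and the policy of skipping a cache write when the budget is exhausted are all hypothetical.

```python
import time
from typing import Callable


class WriteBudget:
    """Token bucket over bytes written to flash (illustrative only)."""

    def __init__(self, bytes_per_second: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = bytes_per_second
        self._capacity = burst_bytes
        self._tokens = burst_bytes      # start with a full bucket
        self._clock = clock             # injectable so tests can control time
        self._last = clock()

    def try_acquire(self, nbytes: int) -> bool:
        """Return True if nbytes may be written now; never blocks the caller."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if nbytes <= self._tokens:
            self._tokens -= nbytes
            return True
        return False
```

A cache using this sketch would call `try_acquire(len(page))` before writing a page and simply skip the write when it returns False, so the query still reads from remote storage and the limiter never becomes a point of failure.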
18. • Two full days' worth of queries from the production cluster were shadowed to the test cluster.
• Query count: 17,320
• 600-node cluster
• 460 GB per node was configured for data caching.
• LRU eviction policy.
• 1 MB block size, meaning data is read, stored, and evicted in 1 MB units.
Benchmark Configuration
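Because data is handled in fixed 1 MB blocks, any byte-range read is effectively rounded out to whole blocks. The arithmetic can be sketched as follows; the function name `blocks_for_read` is hypothetical and not part of Presto or Alluxio.

```python
BLOCK_SIZE = 1 << 20  # 1 MB, the block size used in the benchmark


def blocks_for_read(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the indices of the fixed-size blocks that a byte range touches."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size  # index of the last byte's block
    return range(first, last + 1)


# A 10-byte read that straddles a block boundary touches two blocks.
touched = list(blocks_for_read(offset=BLOCK_SIZE - 5, length=10))
```

Even a small read can therefore cost two full blocks of cache space and remote IO when it crosses a boundary, which is one reason the block size is a tuning parameter.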
20. • Data size read for master branch run: 582 TB
• Data size read for caching branch run: 251 TB
• Savings in Scans: 57%
Benchmark Results
IO Savings
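The 57% figure follows directly from the two data sizes on the slide. A quick check, using only those two numbers:

```python
baseline_tb = 582  # data read on the master branch run
cached_tb = 251    # data read on the caching branch run

saved_tb = baseline_tb - cached_tb   # 331 TB no longer read from remote storage
savings = saved_tb / baseline_tb     # fraction of baseline scans avoided

print(f"saved {saved_tb} TB, {savings:.0%} of baseline scans")
```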
23. • Overview
• Architecture and Problems
• Re-architecture and Solutions
• Performance
• Alluxio Deep-dive
24. Alluxio Overview
Translate access to optimal storage APIs over a slow network
Data Orchestration for the Cloud
Application interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Storage drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
25. [Architecture diagram. Inside the Presto Server JVM, the Presto Worker issues HDFS API calls to the Alluxio Caching File System. On a cache hit, the Alluxio Cache Manager serves the data from local cache storage; on a cache miss, the External File System reads from external storage.]
Presto & Alluxio Local Cache Architecture
26. • Cache files in fixed-size segments (called pages)
• configurable, 1 MB by default
• Store pages off-heap
• keep pages on SSDs instead of consuming JVM memory
• Highly-concurrent & thread-safe
• Light-weight & fine-grained locking
if cacheManager.hasPage(pageId):
    page = cacheManager.readPage(pageId)
else:
    page = readFromExternalFS(offset, len)
    cacheManager.writePage(pageId, page)
Implementation & Optimization
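The read path on this slide can be fleshed out into a small runnable sketch. This is a toy in-memory model under stated assumptions, not the Alluxio implementation: the `PageCache` class, the `(path, page index)` key layout, and the `read_external` callback are hypothetical, and eviction, locking, and the SSD-backed store are left out.

```python
from typing import Callable, Dict, Tuple

PAGE_SIZE = 1 << 20  # 1 MB, the default page size from the slide

PageId = Tuple[str, int]  # (file path, page index); hypothetical key layout


class PageCache:
    """Toy in-memory stand-in for a page cache (no eviction, no SSD store)."""

    def __init__(self) -> None:
        self._pages: Dict[PageId, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, path: str, offset: int, length: int,
             read_external: Callable[[str, int, int], bytes],
             page_size: int = PAGE_SIZE) -> bytes:
        """Serve a byte range page by page, filling misses from external storage."""
        if length <= 0:
            return b""
        out = bytearray()
        first = offset // page_size
        last = (offset + length - 1) // page_size
        for index in range(first, last + 1):
            page_start = index * page_size
            page_id = (path, index)
            page = self._pages.get(page_id)
            if page is None:
                # Cache miss: fetch the whole page once, then keep it.
                self.misses += 1
                page = read_external(path, page_start, page_size)
                self._pages[page_id] = page
            else:
                self.hits += 1
            # Copy only the part of this page that overlaps the request.
            lo = max(offset, page_start) - page_start
            hi = min(offset + length, page_start + page_size) - page_start
            out += page[lo:hi]
        return bytes(out)
```

Reading the same range twice through this sketch goes to external storage only the first time; the second read is served entirely from cached pages, which is the effect the benchmark measures as IO savings.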
27. • Pluggable cache replacement policies:
• LRU, LFU
• Pluggable cache storage options:
• Local file system store: each page -> one file
• RocksDB store: each page -> one value keyed by pageId
• Async cache writes
• to handle bursty cache write ops, queue writes in background
• Failure Recovery
• disks are expected to fail when running at Facebook scale
Implementation & Optimization
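To illustrate what a pluggable replacement policy could look like, here is a minimal sketch with LRU and LFU behind one interface. The `Evictor` protocol, its method names, and `BoundedCache` are hypothetical and are not the Alluxio API; capacity is counted in pages for simplicity.

```python
from collections import Counter, OrderedDict
from typing import Dict, Hashable, Optional, Protocol


class Evictor(Protocol):
    """Hypothetical policy interface: the cache reports accesses, asks for a victim."""

    def on_access(self, page_id: Hashable) -> None: ...
    def on_remove(self, page_id: Hashable) -> None: ...
    def victim(self) -> Optional[Hashable]: ...


class LRUEvictor:
    def __init__(self) -> None:
        self._order: "OrderedDict[Hashable, None]" = OrderedDict()

    def on_access(self, page_id: Hashable) -> None:
        self._order[page_id] = None
        self._order.move_to_end(page_id)  # most recently used goes last

    def on_remove(self, page_id: Hashable) -> None:
        self._order.pop(page_id, None)

    def victim(self) -> Optional[Hashable]:
        return next(iter(self._order), None)  # least recently used is first


class LFUEvictor:
    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def on_access(self, page_id: Hashable) -> None:
        self._counts[page_id] += 1

    def on_remove(self, page_id: Hashable) -> None:
        self._counts.pop(page_id, None)

    def victim(self) -> Optional[Hashable]:
        if not self._counts:
            return None
        return min(self._counts, key=self._counts.get)  # fewest accesses


class BoundedCache:
    def __init__(self, capacity: int, evictor: Evictor) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._evictor = evictor
        self._pages: Dict[Hashable, bytes] = {}

    def get(self, page_id: Hashable) -> Optional[bytes]:
        page = self._pages.get(page_id)
        if page is not None:
            self._evictor.on_access(page_id)
        return page

    def put(self, page_id: Hashable, page: bytes) -> None:
        if page_id not in self._pages and len(self._pages) >= self._capacity:
            victim = self._evictor.victim()
            if victim is not None:
                self._pages.pop(victim, None)
                self._evictor.on_remove(victim)
        self._pages[page_id] = page
        self._evictor.on_access(page_id)

    def __contains__(self, page_id: Hashable) -> bool:
        return page_id in self._pages
```

Swapping `LRUEvictor()` for `LFUEvictor()` changes which page is dropped without touching the cache itself, which is the point of keeping the policy pluggable.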
28. • (WIP) Support Schema/Table/Partition level Cache Quota
• (WIP) Performance optimizations for small files
• (Future work) Semantics-aware caching
Ongoing Development
29. • Edit etc/catalog/hive.properties:
cache.enabled=true
cache.type=ALLUXIO
cache.base-directory=/tmp/alluxio-cache
cache.alluxio.max-cache-size=500GB
hive.node-selection-strategy=SOFT_AFFINITY
• More details in the blog: https://prestodb.io/blog/2020/06/16/alluxio-datacaching
Enable Alluxio Local Cache w/ Presto
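The slides do not explain the `SOFT_AFFINITY` setting. A local cache only helps if reads of the same file keep landing on the same worker, so the setting presumably asks the scheduler to prefer a stable worker per file. The sketch below illustrates that idea under this assumption and is not Presto's scheduler code; `preferred_worker`, the hashing scheme, and the `busy` fallback are hypothetical.

```python
import hashlib
from typing import AbstractSet, Sequence


def preferred_worker(path: str, workers: Sequence[str],
                     busy: AbstractSet[str] = frozenset()) -> str:
    """Pick a stable preferred worker for a file, falling back when it is busy."""
    if not workers:
        raise ValueError("no workers available")
    # A content hash (not Python's salted hash()) keeps the choice stable across runs.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(workers)
    # Probe onward from the preferred slot so the fallback is deterministic too.
    for step in range(len(workers)):
        candidate = workers[(start + step) % len(workers)]
        if candidate not in busy:
            return candidate
    return workers[start]  # every worker is busy: keep cache locality
```

The affinity is "soft" in the sense that a busy preferred worker does not block the query; the read simply goes to another worker, at the cost of a likely cache miss there.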
30. • Fine-grained control on working set
• free / pin data in cache, set data TTL in cache, etc.
• Metadata caching and syncing
• Automatically sync data between Alluxio cache and persisted data
• Data Transformation Services
• e.g., convert csv files into parquet format in cache
• Data Migration services
• e.g., migrate from HDFS to S3 based on access time policy
• Familiar Filesystem CLIs
• e.g., alluxio fs ls /my/path
Alluxio File System Enhancements
Alluxio Doc: https://docs.alluxio.io/os/user/stable/en/Overview.html