SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
Alluxio Office Hour:
Alluxio Structured Data Management
Gene Pang | Alluxio
Motivation
Alluxio Structured Data Management
Developer Preview in Alluxio 2.1
Demo
Conclusion
Outline
2
Motivation
Common Alluxio Use Cases
4
…
…
Unified Interface
Unified Namespace
Caching and Locality
SQL Engines are popular
5
Storage Systems SQL Frameworks
Files/Objects
Directories
Raw Bytes
Storage
Optimized
Tables
Schemas
Rows/Columns
Compute
Optimized
Impedance Mismatch
Further Expand Benefits!
Benefits of Alluxio Data Orchestration
6
Storage
Systems
SQL
Frameworks
Caching
Unified Interface/Namespace
Schema-Aware Optimizations
Compute-Optimized Formats
Physical Data Independence
Alluxio
Structured Data Management
Integrating SQL Engines with Alluxio
Provide Structured Data APIs
Focus on how frameworks interact with data
High-Level Philosophy
8
Cache Logical Data Access
Focus on caching what frameworks want
Alluxio Structured Data Management
Alluxio Structured Data Management
9
Storage
System
Transformation
Service
Structured Data
and Metadata
Logical Data
Access Layer
Structured
Data Client
SQL
Engine
Engine
Developer Preview in Alluxio 2.1
Try out initial components!
Target Environment
11
Presto
Hive
Connector
Hive
Metastore
Storage
Alluxio Structured Data Management
12
Presto
Alluxio Caching
Service
Alluxio Catalog
Service
AlluxioTransformation
Service
Hive
Connector
Alluxio
Connector
Hive
Metastore
Storage
Alluxio Catalog Service
13
Alluxio Catalog Service
Hive Metastore
Hive Under Database
Functionality
Manages metadata for structured data
Abstracts other database catalogs as
Under Database (UDB)
Benefits
Schema-aware optimizations
Simple deployment
Tighter integration with Presto
New plugin based on the Presto Hive connector
Available in Alluxio 2.1 distribution
In Progress: Merging connector into Presto codebase
Alluxio Presto Connector
14
Transformation Service
15
Transform data to be compute-optimized
independent from storage-optimized format
Coalesce Format Conversion
parquetcsv
Demo!
2 isolated AWS 10-node clusters
Presto + Hive Metastore + S3 Data
Presto + Alluxio + Hive Metastore + S3 Data
TPCDS sample dataset on S3
~10,000 CSV files
Demo
17
Attached existing Hive database into Alluxio Catalog
Alluxio Catalog served table metadata for Presto
Transformed store_sales by coalescing and converting CSV to Parquet
Demo Summary
18
Presto Without
Alluxio
20s
Alluxio
Transformations
7s
AlluxioTransformations
With Caching
3s
Conclusion
User community feedback/collaboration is important!
Future projects
New UDB implementations (AWS Glue)
More conversion formats (json)
DDL/DML workloads (CREATETABLE, INSERT, etc.)
New Client APIs for structured data (Arrow)
Future Work
20
Try it out!
Documentation
Provide feedback
Feature requests and issues in Github Alluxio/alluxio
Developer Preview Available in Alluxio 2.1
21
ThankYou!
22

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud Era
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
 
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsSimplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
 
Alluxio on AWS EMR Fast Storage Access & Sharing for Spark
Alluxio on AWS EMR Fast Storage Access & Sharing for SparkAlluxio on AWS EMR Fast Storage Access & Sharing for Spark
Alluxio on AWS EMR Fast Storage Access & Sharing for Spark
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Embracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloadsEmbracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloads
 
Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in KubernetesDeep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
 

Similar a Hands-on with Alluxio Structured Data Management

SharePoint 2010 High Availability - TechEd Brasil 2010
SharePoint 2010 High Availability - TechEd Brasil 2010SharePoint 2010 High Availability - TechEd Brasil 2010
SharePoint 2010 High Availability - TechEd Brasil 2010
Michael Noel
 
More Best Practices With Share Point Solutions
More Best Practices With Share Point SolutionsMore Best Practices With Share Point Solutions
More Best Practices With Share Point Solutions
Alexander Meijers
 
MOSS 2007 Deployment Fundamentals -Part1
MOSS 2007 Deployment Fundamentals -Part1MOSS 2007 Deployment Fundamentals -Part1
MOSS 2007 Deployment Fundamentals -Part1
Information Technology
 
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
Michael Noel
 

Similar a Hands-on with Alluxio Structured Data Management (20)

Alluxio Innovations for Structured Data
Alluxio Innovations for Structured DataAlluxio Innovations for Structured Data
Alluxio Innovations for Structured Data
 
Building the Perfect SharePoint 2010 Farm
Building the Perfect SharePoint 2010 FarmBuilding the Perfect SharePoint 2010 Farm
Building the Perfect SharePoint 2010 Farm
 
FHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhirFHIR Server internals - sqlonfhir
FHIR Server internals - sqlonfhir
 
SharePoint 2010 High Availability - TechEd Brasil 2010
SharePoint 2010 High Availability - TechEd Brasil 2010SharePoint 2010 High Availability - TechEd Brasil 2010
SharePoint 2010 High Availability - TechEd Brasil 2010
 
Flask SQLite .pdf
Flask SQLite .pdfFlask SQLite .pdf
Flask SQLite .pdf
 
INFOGOV14 - Trusting Your KM & ECM Strategy to SharePoint
INFOGOV14 - Trusting Your KM & ECM Strategy to SharePointINFOGOV14 - Trusting Your KM & ECM Strategy to SharePoint
INFOGOV14 - Trusting Your KM & ECM Strategy to SharePoint
 
More Best Practices With Share Point Solutions
More Best Practices With Share Point SolutionsMore Best Practices With Share Point Solutions
More Best Practices With Share Point Solutions
 
MOSS 2007 Deployment Fundamentals -Part1
MOSS 2007 Deployment Fundamentals -Part1MOSS 2007 Deployment Fundamentals -Part1
MOSS 2007 Deployment Fundamentals -Part1
 
Tech Ed Africa Demystifying Backup Restore In Share Point 2007
Tech Ed Africa Demystifying Backup Restore In Share Point 2007Tech Ed Africa Demystifying Backup Restore In Share Point 2007
Tech Ed Africa Demystifying Backup Restore In Share Point 2007
 
Techedafricademystifyingbackuprestoreinsharepoint2007 090805103250 Phpapp02
Techedafricademystifyingbackuprestoreinsharepoint2007 090805103250 Phpapp02Techedafricademystifyingbackuprestoreinsharepoint2007 090805103250 Phpapp02
Techedafricademystifyingbackuprestoreinsharepoint2007 090805103250 Phpapp02
 
Alluxio: Unify Data at Memory Speed
Alluxio: Unify Data at Memory SpeedAlluxio: Unify Data at Memory Speed
Alluxio: Unify Data at Memory Speed
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint Deployments
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
Jonathan Ralton - Trusting Your KM & ECM Strategy To SharePoint
Jonathan Ralton - Trusting Your KM & ECM Strategy To SharePointJonathan Ralton - Trusting Your KM & ECM Strategy To SharePoint
Jonathan Ralton - Trusting Your KM & ECM Strategy To SharePoint
 
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
AUDWC 2016 - Using SQL Server 20146 AlwaysOn Availability Groups for SharePoi...
 
SPSPTCDC - SharePoint Admin 101 - SpeedMetal - PowerUser to Admin in 75 Minutes
SPSPTCDC - SharePoint Admin 101 - SpeedMetal - PowerUser to Admin in 75 MinutesSPSPTCDC - SharePoint Admin 101 - SpeedMetal - PowerUser to Admin in 75 Minutes
SPSPTCDC - SharePoint Admin 101 - SpeedMetal - PowerUser to Admin in 75 Minutes
 
I/O & virtualization performance with a search engine based on an xml databa...
 I/O & virtualization performance with a search engine based on an xml databa... I/O & virtualization performance with a search engine based on an xml databa...
I/O & virtualization performance with a search engine based on an xml databa...
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 

Más de Alluxio, Inc.

Más de Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Último

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Último (20)

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

Hands-on with Alluxio Structured Data Management

  • 1. Alluxio Office Hour: Alluxio Structured Data Management Gene Pang | Alluxio
  • 2. Motivation Alluxio Structured Data Management Developer Preview in Alluxio 2.1 Demo Conclusion Outline 2
  • 4. Common Alluxio Use Cases 4 … … Unified Interface Unified Namespace Caching and Locality SQL Engines are popular
  • 5. 5 Storage Systems SQL Frameworks Files/Objects Directories Raw Bytes Storage Optimized Tables Schemas Rows/Columns Compute Optimized Impedance Mismatch Further Expand Benefits!
  • 6. Benefits of Alluxio Data Orchestration 6 Storage Systems SQL Frameworks Caching Unified Interface/Namespace Schema-Aware Optimizations Compute-Optimized Formats Physical Data Independence
  • 8. Provide Structured Data APIs Focus on how frameworks interact with data High-Level Philosophy 8 Cache Logical Data Access Focus on caching what frameworks want
  • 9. Alluxio Structured Data Management Alluxio Structured Data Management 9 Storage System Transformation Service Structured Data and Metadata Logical Data Access Layer Structured Data Client SQL Engine Engine
  • 10. Developer Preview in Alluxio 2.1 Try out initial components!
  • 12. Alluxio Structured Data Management 12 Presto Alluxio Caching Service Alluxio Catalog Service AlluxioTransformation Service Hive Connector Alluxio Connector Hive Metastore Storage
  • 13. Alluxio Catalog Service 13 Alluxio Catalog Service Hive Metastore Hive Under Database Functionality Manages metadata for structured data Abstracts other database catalogs as Under Database (UDB) Benefits Schema-aware optimizations Simple deployment
  • 14. Tighter integration with Presto New plugin based on the Presto Hive connector Available in Alluxio 2.1 distribution In Progress: Merging connector into Presto codebase Alluxio Presto Connector 14
  • 15. Transformation Service 15 Transform data to be compute-optimized independent from storage-optimized format Coalesce Format Conversion parquetcsv
  • 16. Demo!
  • 17. 2 isolated AWS 10-node clusters Presto + Hive Metastore + S3 Data Presto + Alluxio + Hive Metastore + S3 Data TPCDS sample dataset on S3 ~10,000 CSV files Demo 17
  • 18. Attached existing Hive database into Alluxio Catalog Alluxio Catalog served table metadata for Presto Transformed store_sales by coalescing and converting CSV to Parquet Demo Summary 18 Presto Without Alluxio 20s Alluxio Transformations 7s AlluxioTransformations With Caching 3s
  • 20. User community feedback/collaboration is important! Future projects New UDB implementations (AWS Glue) More conversion formats (json) DDL/DML workloads (CREATETABLE, INSERT, etc.) New Client APIs for structured data (Arrow) Future Work 20
  • 21. Try it out! Documentation Provide feedback Feature requests and issues in Github Alluxio/alluxio Developer Preview Available in Alluxio 2.1 21 ThankYou!
  • 22. 22