Accelerating Data Computation on Ceph Objects
Leonardo Militano
milt@zhaw.ch
Alluxio Online Meetup - 10.11.2020
Agenda
● Introduction to Cloud Storage
● Solutions for data analytics based on data locality
● Alluxio based solution for data analytics
● Performance evaluation
● Conclusions
Service Engineering group
● The SE group at InIT, Zurich University of Applied Sciences
(ZHAW), Switzerland
● Core expertise: IaaS, PaaS, SaaS, virtualization
● Focus is on scalable and reliable implementation of
IT-based services
● Research Initiatives:
○ Cloud (infrastructure, platform, CI/CD, DevOps, CNA)
○ Robotics (cloud robotics, ROS)
● Blog: https://blog.zhaw.ch/icclab/
ICCLab’s research approach
strategic research agenda
core research expertise
Storage in the Cloud
● The global storage market is growing by 25.8% annually and is predicted to reach a value of $74.94 billion in 2021
● Increasing demand for data storage:
○ IDC expects data to grow 61% to 163 ZB by 2025
○ By 2025, 49 percent of data will be stored in public cloud environments
● At the same time, there is a paradigm shift, with more data created, stored and processed at the edge
● Data is the new oil!
Data analytics
● If data is the new oil, it needs to be processed into higher-order
products to benefit from its value
● Disaggregation of storage and compute for cost efficiency and
manageability is the common approach
○ Data is remote to the compute nodes
● Bringing the code to the data (e.g., computational storage) or
bringing the data close to the code (e.g., in-memory
computation)?
○ Data locality matters for bandwidth, power consumption, cost, latency, and security
Ceph storage
● Ceph is a unified, distributed storage system with self-managing and self-healing features that provides object, block and file storage
● Our earlier experiments on Ceph object classes for active storage showed large time savings when using object classes
Alluxio for Memory Speed Computation
● Alluxio on the compute nodes allows for in-memory computation and fast data
analysis
Source: alluxio.io
The framework used for testing
● Ceph (Mimic release) storage cluster
○ 6 OpenStack VMs: 1 Ceph monitor, 3 OSDs, 1 RGW, 1 admin node
● Total storage size of 420 GiB over 7 OSD volumes
● Alluxio cluster (v2.3 and v2.4)
● Spark (v3.0.0)
● Scala application on Spark
● More details are available in our blog post
Two compute cluster configurations
● Single-node:
○ One VM (16 vCPUs) running both Alluxio and Spark, with 40 GB of memory for the worker node
● Cluster-mode:
○ Two Spark/Alluxio worker nodes (16 vCPUs, 40 GB memory)
● Scala application over Spark
○ repeated access to a text file
○ count operation over the lines in the file
● Overall execution time was compared for different file sizes:
○ Alluxio-based vs. direct Ceph access
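The measurement pattern on this slide (access the same text file repeatedly, count its lines, time each access) can be sketched in a few lines. The application used in the talk was written in Scala on Spark and its source is not part of the slides, so the Python below is only a standard-library stand-in. The function names `count_lines`, `time_repeated_count` and `make_sample_file`, and the small local temporary file, are assumptions made for illustration.

```python
import os
import tempfile
import time


def count_lines(path):
    """Count the lines in a text file by streaming through it once."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def time_repeated_count(path, accesses=2):
    """Run the line count `accesses` times and time each access.

    Returns a list of (line_count, seconds) tuples, one per access.
    """
    results = []
    for _ in range(accesses):
        start = time.perf_counter()
        lines = count_lines(path)
        results.append((lines, time.perf_counter() - start))
    return results


def make_sample_file(num_lines):
    """Write a small stand-in text file (hypothetical; the talk used 1 GB to 10 GB files)."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    )
    with handle:
        for i in range(num_lines):
            handle.write(f"line {i}\n")
    return handle.name


if __name__ == "__main__":
    sample = make_sample_file(1000)
    try:
        timings = time_repeated_count(sample, accesses=2)
        for access, (lines, seconds) in enumerate(timings, start=1):
            print(f"access {access}: {lines} lines in {seconds:.6f} s")
    finally:
        os.remove(sample)
```

With a local file, any difference between the first and second access only reflects operating-system caching. In the setup described here, the interesting comparison is the second access through Alluxio against the second access directly on Ceph.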
Single VM setup results
Cluster setup results
Summary of results
● Single-node setup:
○ On the second access, reading the file directly from Ceph takes 75 times longer than reading it through Alluxio for the 1 GB file, and 111 and 107 times longer for the 5 GB and 10 GB files
● Cluster-mode setup:
○ On the second access, reading the file directly from Ceph takes 35 times longer than reading it through Alluxio for the 1 GB file, and 57 and 65 times longer for the 5 GB and 10 GB files
● NB! These results were obtained with Java 8 (a prerequisite of Alluxio v2.3)
○ Direct Ceph file access with Spark performs much better with Java 11 than with Java 8!
Testing Alluxio 2.4
● The benefit is smaller because execution times are lower overall with Java 11
● Even so, the second access to a 10 GB file is still 6 times faster than direct Ceph access
● Alluxio 2.4 therefore removes an important limitation of previous versions
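To put the reported factors on a common scale, each "N times" figure can be converted into the share of execution time saved, using 1 - 1/N. The short sketch below does this for the factors quoted on the slides. It assumes that "N times more" means the direct Ceph access takes N times as long as the Alluxio access; the dictionary layout and the function name `fraction_of_time_saved` are illustrative choices, not something taken from the talk.

```python
# Slowdown factors reported on the slides: duration of the second direct
# Ceph access divided by the duration of the second access through Alluxio.
REPORTED_FACTORS = {
    ("single-node, Java 8", "1 GB"): 75,
    ("single-node, Java 8", "5 GB"): 111,
    ("single-node, Java 8", "10 GB"): 107,
    ("cluster-mode, Java 8", "1 GB"): 35,
    ("cluster-mode, Java 8", "5 GB"): 57,
    ("cluster-mode, Java 8", "10 GB"): 65,
    ("Alluxio 2.4, Java 11", "10 GB"): 6,
}


def fraction_of_time_saved(factor):
    """Convert an 'N times faster' factor into the share of time saved.

    If direct access takes `factor` times as long as cached access, the
    cached access needs 1/factor of the direct time, so 1 - 1/factor is saved.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    for (setup, size), factor in REPORTED_FACTORS.items():
        saved = fraction_of_time_saved(factor)
        print(f"{setup:22s} {size:>5s}: {factor:4d}x -> {saved:.1%} of the time saved")
```

Seen this way, the drop from roughly 100x to 6x is less dramatic than it sounds: a factor of 6 still means that about five sixths of the direct-access time is saved on the second access.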
Conclusions
● Alluxio enables memory-speed data access by eliminating
remote data reads for repeated accesses
● Our results show that both single-node and cluster-mode setups improve execution time by up to two orders of magnitude on repeated accesses (with Java 8)
● Alluxio 2.3 required Java 8 (whereas the default Java version is Java 11), which was a limiting factor
● Java 11 support in Alluxio 2.4 is essential for preserving the performance advantage over direct access to the backend storage
Q&A
Leonardo Militano
milt@zhaw.ch
Alluxio Online Meetup - 10.11.2020
