Best Practices for Using Alluxio with Spark
Alluxio, Inc.
Strata New York September 2017 Haoyuan Li, Ancil McBarnett
1.
Best Practices for Using Alluxio with Spark
Haoyuan Li, Ancil McBarnett
Strata New York, Sept 2017
2.
Confidential © Alluxio, Inc. All Rights Reserved.
Outline
• Alluxio Overview
• Alluxio + Spark Use Cases
• Using Spark with Alluxio
• Performance Evaluation
• Demo
3.
Data Ecosystem Yesterday
• One Compute Framework
• Single Storage System
• Co-located
4.
Data Ecosystem Today
• Many Compute Frameworks
• Multiple Storage Systems
• Most not co-located
5.
Data Ecosystem Issues
• Each application manages multiple data sources
• Adding or removing data sources requires application changes
• Storage optimizations require application changes
• Lower performance due to lack of locality
6.
Data Ecosystem Challenges
1 Speed & Complexity
• Numerous storage & compute systems
• Integration and interoperability issues (on prem, hybrid, cloud)
• Many departments & groups
2 Data Freshness
• Cross-network movement is slow
• Each ETL creates more lag
3 Cost
• Data duplication
• Data and App explosion driving cost up
4 Security & Governance
• Data security & governance is increasingly complex
Heavy integrations create painful organizational drag
7.
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Highest performance in Memory
• No Lock in
[Diagram labels: Native File System, Hadoop Compatible File System, Native Key-Value Interface, FUSE Compatible File System, HDFS Interface, Amazon S3 Interface, Swift Interface, GlusterFS Interface]
8.
Alluxio Design Principles
1 Big Data & Machine Learning
• Interoperability with leading projects
• Large scale data sets
• High IO
2 Optimize Data Access
• Remote data
• Service-oriented & microservices
• Hot/warm/cold data
• Temporary data
3 Application Data Sharing
• Multiple compute frameworks within a node or cluster
• Shared storage
• Read/write support
4 Enterprise Class
• Distributed Architecture
• Commodity Hardware
• High Availability
• Security
9.
Alluxio Innovation: Server-side API Translation
Convert from Client-side Interface to Native Storage Interface
[Diagram labels: HDFS Interface, HDFS Interface, S3A Interface, Swift Interface, Google Cloud Interface]
10.
Alluxio Innovation: Server-side API Translation
Convert between different versions of HDFS
[Diagram labels: HDFS 2.7 Interface, HDP 2.4 Interface, CDH 5.6 Interface, MAPR 5.2 Interface]
11.
Alluxio Innovation: Unified Namespace
• Enables effective data management across different Under Stores
• Uses Mounting with Transparent Naming
12.
Alluxio Innovation: Unified Namespace
Create a catalog of available data sources for Data Scientists
alluxio://
• /finance/customer-transactions/
• /finance/vendor-transactions/
• /operations/device-logs/
• /operations/phone-call-recordings/
• /operations/check-images/
• /research/us-economic-data/
• /research/intl-economic-data/
• /marketing/advertising-dataset/
• /marketing/marketing-funnel-dataset/
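The mounting idea behind the unified namespace can be illustrated with a small sketch. This is not Alluxio's implementation: the mount table, the under-store URIs, and the `resolve` function are all hypothetical, chosen only to show how a path under `alluxio://` might map to a location in a different under store through longest-prefix matching on mount points.

```python
from typing import Dict

# Hypothetical mount table: namespace prefix -> under-store URI.
# Hosts, buckets, and containers below are invented for illustration.
MOUNTS: Dict[str, str] = {
    "/finance": "hdfs://namenode:8020/warehouse/finance",
    "/operations": "s3a://ops-bucket/raw",
    "/research": "swift://research-container/data",
}


def resolve(alluxio_path: str, mounts: Dict[str, str] = MOUNTS) -> str:
    """Map a unified-namespace path to an under-store URI.

    Uses longest-prefix matching so that nested mount points win
    over their parents.
    """
    best = ""
    for mount_point in mounts:
        covers = (alluxio_path == mount_point
                  or alluxio_path.startswith(mount_point + "/"))
        if covers and len(mount_point) > len(best):
            best = mount_point
    if not best:
        raise KeyError(f"no mount point covers {alluxio_path}")
    # Keep the part of the path below the mount point unchanged.
    return mounts[best] + alluxio_path[len(best):]


print(resolve("/finance/customer-transactions/"))
print(resolve("/operations/device-logs/"))
```

In this sketch the application only ever sees the left-hand paths, so swapping an under store means changing one mount-table entry rather than application code.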
13.
Alluxio Innovation: Intelligent Cache
Local performance from remote data using native multi-tier storage
• RAM: Hot
• SSD: Warm
• HDD: Cold
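A toy model can make the hot/warm/cold tiering concrete. The sketch below is an assumption-laden illustration, not Alluxio's eviction or promotion policy: the `TieredCache` class, its block-count capacities, and the choice of LRU demotion with promote-on-read are all hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class TieredCache:
    """Toy multi-tier cache: tier 0 is fastest (e.g. RAM), the last is slowest (e.g. HDD)."""

    def __init__(self, capacities: List[int]) -> None:
        # Capacities are counted in blocks; one LRU-ordered dict per tier.
        self.capacities = capacities
        self.tiers: List[OrderedDict] = [OrderedDict() for _ in capacities]

    def _insert(self, level: int, block_id: str, data: bytes) -> None:
        if level >= len(self.tiers):
            return  # fell off the coldest tier: the block leaves the cache
        tier = self.tiers[level]
        tier[block_id] = data
        if len(tier) > self.capacities[level]:
            # Demote the least recently used block to the next, colder tier.
            old_id, old_data = tier.popitem(last=False)
            self._insert(level + 1, old_id, old_data)

    def _remove(self, block_id: str) -> Optional[bytes]:
        for tier in self.tiers:
            if block_id in tier:
                return tier.pop(block_id)
        return None

    def put(self, block_id: str, data: bytes) -> None:
        self._remove(block_id)
        self._insert(0, block_id, data)

    def get(self, block_id: str) -> Optional[bytes]:
        data = self._remove(block_id)
        if data is not None:
            self._insert(0, block_id, data)  # promote the block to the top tier
        return data

    def tier_of(self, block_id: str) -> Optional[int]:
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                return level
        return None


cache = TieredCache([1, 1, 1])  # one block each for "RAM", "SSD", "HDD"
for name in ("a", "b", "c"):
    cache.put(name, name.encode())
print([cache.tier_of(n) for n in ("a", "b", "c")])
```

Reading a cold block moves it back to the top tier and pushes colder blocks down, which is the behavior the hot/warm/cold labels suggest.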
14.
Where to use Alluxio
Finding high-fit Alluxio use-cases
• Compute Zone: standalone or managed with Mesos or YARN
• Storage in Different Availability Zone: either on-prem or cloud
Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.
[Diagram labels: Spark, TensorFlow, Presto, HDFS]
Guidelines
✓ Compute separated from storage
✓ Distributed compute
✓ I/O or network latency exists
✓ Unification of many storage systems
✓ Applications sharing long-lived data
More checks result in higher-fit applications
15.
Fastest Growing Big Data Open Source Projects
• Fastest growing open-source project in the big data ecosystem
• Running in large production clusters
• 500+ Contributors from 100+ organizations
[Chart: GitHub Open Source Contributors by Month; y-axis: Number of Contributors; series: Alluxio, Spark, Kafka, Redis, HDFS, Cassandra, Hive]
16.
Outline
• Alluxio Overview
• Alluxio + Spark Use Cases
• Using Spark with Alluxio
• Performance Evaluation
• Demo
17.
Big Data Case Study
Challenge: Gain end-to-end view of business with large volume of data. Queries were slow / not interactive, resulting in operational inefficiency.
Solution: ETL data from Teradata to Alluxio
Impact: Faster time to market. “Now we don’t have to work Sundays”
http://bit.ly/2oMx95W
[Diagram labels: Spark, Teradata]
18.
Big Data Case Study
Challenge: Gain end-to-end view of business with large volume of data. Queries were slow / not interactive, resulting in operational inefficiency.
Solution: With Alluxio, data queries are 30X faster
Impact: Higher operational efficiency
http://bit.ly/2pDHS3O
[Diagram labels: Spark, Baidu File System]
19.
Big Data Case Study
Challenge: Gain an end-to-end view of the business with a large volume of data for a $5B travel site. Queries were slow and not interactive, resulting in operational inefficiency.
Solution: With Alluxio, a 300x improvement in performance.
Impact: Increased revenue from immediate response to user behavior.
Use case: http://bit.ly/2pDJdrq
[Diagram labels: Spark, Flink, HDFS, Ceph]
20.
Machine Learning Case Study
Challenge: Disparate data both on-prem and in the cloud. Heterogeneous types of data. Scaling to exabyte-size data. Slow due to a disk-based approach.
Solution: Using Alluxio to prevent I/O bottlenecks.
Impact: Orders of magnitude higher performance than before.
Use case: http://bit.ly/2p18ds3
[Diagram labels: Spark, HDFS, Minio, Mesos]
21.
Outline
  1. Alluxio Overview
  2. Alluxio + Spark Use Cases
  3. Using Spark with Alluxio
  4. Performance Evaluation
  5. Demo
22.
Consolidating Memory
Storage engine and execution engine in the same process
• Two copies of data in memory: double the memory used
• Inter-process sharing slowed down by network / disk I/O
[Diagram: two Spark processes (Spark Compute + Spark Storage), each holding its own copy of block 1 and block 3; HDFS / Amazon S3 holds blocks 1 to 4]
23.
Consolidating Memory
Storage engine and execution engine in different processes
• Half the memory used
• Inter-process sharing happens at memory speed
[Diagram: two Spark processes (Spark Compute + Spark Storage) share block 1, block 3 and block 4 held in Alluxio; HDFS / Amazon S3 (HDFS disk) holds blocks 1 to 4]
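The "half the memory used" claim follows from simple arithmetic when two applications cache the same data. A minimal worked sketch, where the function name and the 50 GB figure are illustrative assumptions rather than numbers from the slide:

```python
def cached_memory_gb(dataset_gb, num_apps, shared_store):
    """Memory needed to keep one dataset cached for num_apps applications."""
    if shared_store:
        # One copy lives outside the application processes and is shared.
        return dataset_gb
    # Each application process keeps its own in-process copy.
    return dataset_gb * num_apps


in_process = cached_memory_gb(50, num_apps=2, shared_store=False)
shared = cached_memory_gb(50, num_apps=2, shared_store=True)
print(in_process, shared)  # prints: 100 50
```

With more than two applications reading the same data, the saving grows accordingly, since the shared copy stays at one dataset's worth of memory.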
24.
Data Resilience During Crash
Storage engine and execution engine in the same process
[Diagram: one Spark process (Spark Compute + Spark Storage) holding block 1 and block 3; HDFS / Amazon S3 holds blocks 1 to 4]
25.
Data Resilience During Crash
Storage engine and execution engine in the same process
• Process crash requires network and/or disk I/O to re-read data
[Diagram: Spark Compute marked CRASH; Spark Storage holding block 1 and block 3; HDFS / Amazon S3 holds blocks 1 to 4]
26.
Data Resilience During Crash
Storage engine and execution engine in the same process
• Process crash requires network and/or disk I/O to re-read data
[Diagram: the whole Spark process marked CRASH, in-memory blocks lost; HDFS / Amazon S3 holds blocks 1 to 4]
27.
Data Resilience During Crash
Storage engine and execution engine in different processes
[Diagram: one Spark process (Spark Compute + Spark Storage); Alluxio holds block 1, block 3 and block 4; HDFS / Amazon S3 (HDFS disk) holds blocks 1 to 4]
28.
Data Resilience During Crash
Storage engine and execution engine in different processes
• Process crash: data is re-read at memory speed
[Diagram: Spark process marked CRASH; Alluxio still holds block 1, block 3 and block 4; HDFS / Amazon S3 (HDFS disk) holds blocks 1 to 4]
29.
Accessing Alluxio Data From Spark
• Writing data: write to an Alluxio file
• Reading data: read from an Alluxio file
30.
Code Example for Spark RDDs
Writing RDD to Alluxio
    rdd.saveAsTextFile(alluxioPath)
    rdd.saveAsObjectFile(alluxioPath)
Reading RDD from Alluxio
    rdd = sc.textFile(alluxioPath)
    rdd = sc.objectFile(alluxioPath)
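A minimal PySpark sketch of the RDD round trip, under several assumptions: the helper names are hypothetical, `localhost` is a placeholder for the Alluxio master host, 19998 is the default Alluxio master RPC port, and the Alluxio client jar is already on Spark's classpath. Only the text-file calls are shown, because `saveAsObjectFile` and `objectFile` belong to the Scala/Java RDD API and are not available in PySpark.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI (host is a placeholder for your Alluxio master)."""
    return "alluxio://{}:{}/{}".format(host, port, path.lstrip("/"))


def save_and_reload_text(sc, lines, path):
    """Write lines to Alluxio as a text file, then read them back as a new RDD.

    sc is expected to be an existing pyspark.SparkContext.
    """
    uri = alluxio_uri(path)
    sc.parallelize(lines).saveAsTextFile(uri)  # fails if the path already exists
    return sc.textFile(uri)


print(alluxio_uri("/data/words"))  # prints: alluxio://localhost:19998/data/words
```

Inside a Spark application this might be called as `save_and_reload_text(sc, ["a", "b"], "/data/words")`; any other application pointed at the same URI can then read the data without going back to the underlying store.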
31.
Code Example for Spark DataFrames
Writing to Alluxio
    df.write.parquet(alluxioPath)
Reading from Alluxio
    df = spark.read.parquet(alluxioPath)
(spark is the SparkSession; a SparkContext such as sc has no read method.)
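The DataFrame round trip can be wrapped in a small PySpark helper. This is a sketch under assumptions: `parquet_roundtrip` is a hypothetical name, `spark` is an existing `SparkSession` (Spark 2.x), `uri` is an `alluxio://` path such as `alluxio://localhost:19998/tmp/table` with a placeholder host, and the `overwrite` save mode is a choice made for the example, not something the slide specifies.

```python
def parquet_roundtrip(spark, df, uri):
    """Write df to Alluxio as Parquet, then load it back as a new DataFrame.

    spark: an existing pyspark.sql.SparkSession
    df:    a pyspark.sql.DataFrame
    uri:   destination, e.g. "alluxio://localhost:19998/tmp/table" (placeholder host)
    """
    # "overwrite" replaces any existing data at uri (assumption for this sketch).
    df.write.mode("overwrite").parquet(uri)
    # Reading goes through the SparkSession, not the SparkContext.
    return spark.read.parquet(uri)
```

A later job, or a different Spark application, can call `spark.read.parquet` on the same URI to pick up the data from Alluxio instead of recomputing it or re-reading the remote store.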
32.
Outline
  1. Alluxio Overview
  2. Alluxio + Spark Use Cases
  3. Using Spark with Alluxio
  4. Performance Evaluation
  5. Demo
33.
Experiments
• Spark 2.0.0 + Alluxio 1.2.0
• Single worker: Amazon r3.2xlarge
• Comparisons:
  • Alluxio
  • Spark storage level: MEMORY_ONLY
  • Spark storage level: MEMORY_ONLY_SER
  • Spark storage level: DISK_ONLY
34.
Reading Cached RDD
[Figure: time (seconds, 0 to 250) vs. RDD size (GB, 0 to 50); series: Alluxio (textFile), Alluxio (objectFile), DISK_ONLY, MEMORY_ONLY_SER, MEMORY_ONLY]
35.
New Context: Read 50 GB RDD (S3)
[Figure: bar chart of time (seconds, 0 to 800) for Alluxio (textFile), Alluxio (objectFile) and No Alluxio; annotations: 7x speedup, 16x speedup]
36.
Reading Cached DataFrame (parquet)
[Figure: time (seconds, 0 to 250) vs. DataFrame size (GB, 0 to 50); series: Alluxio (textFile), MEMORY_ONLY_SER, MEMORY_ONLY]
37.
New Context: Read 50 GB DataFrame (S3)
[Figure: bar chart of time (seconds, 0 to 1750) for Alluxio and No Alluxio; annotation: 10x average speedup, 17x peak speedup]
38.
Outline
  1. Alluxio Overview
  2. Alluxio + Spark Use Cases
  3. Using Spark with Alluxio
  4. Performance Evaluation
  5. Demo
39.
Demo Environment
[Diagram labels: Spark, Alluxio]
40.
Conclusion
• Easy to use Alluxio with Spark
• Predictable and improved performance
• Easily connect to various storage systems
41.
Thank you!
Haoyuan Li, haoyuan@alluxio.com, Twitter: @haoyuan
Ancil McBarnett, ancil@alluxio.com, Twitter: @
Social media: Twitter.com/alluxio, Linkedin.com/alluxio
Website: www.alluxio.com
E-mail: info@alluxio.com