Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Introducing the Hub for Data Orchestration
Adit Madan, Product Manager (Alluxio)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
3. DATA ORCHESTRATION ACROSS DATACENTERS
Cross Datacenter Access without
changing Ingest Pipelines
USE CASE 03: MULTI-DATACENTER
Presto
Alluxio
DATACENTER 1
DATACENTER 2
INGESTION
Burst compute to a public cloud
USE CASE 01: HYBRID
Hive
Alluxio
PUBLIC CLOUD
ON PREMISE
Hybrid Cloud Gateway for data in the
cloud
USE CASE 02: HYBRID
Alluxio
Pytorch
PUBLIC CLOUD
ON PREMISE
4. BRING DATA CLOSER TO COMPUTE ACROSS SILOS
Access based data movement for compute and storage spread across environments
Diagram labels: Region A, Region B, Private Data Centers, Datacenter 1, Datacenter 2, Hive
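The idea of access-based data movement can be sketched as a read-through cache: data stays in its remote silo until a compute job first reads it, and later reads are served from a copy near the compute. This is a minimal illustration only, not Alluxio's implementation; `ReadThroughCache`, `REMOTE_STORE`, and the file path are hypothetical names introduced for the example.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy model of access-based data movement (hypothetical, not Alluxio code)."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote    # how to read from the remote silo
        self._local: Dict[str, bytes] = {}   # copies held close to compute
        self.remote_reads = 0                # counts trips to the remote silo

    def read(self, path: str) -> bytes:
        # Data moves only when it is first accessed; nothing is copied up front.
        if path not in self._local:
            self.remote_reads += 1
            self._local[path] = self._fetch_remote(path)
        return self._local[path]


# Hypothetical remote store standing in for storage in another region.
REMOTE_STORE = {"/warehouse/events.parquet": b"remote-bytes"}

cache = ReadThroughCache(lambda path: REMOTE_STORE[path])
first = cache.read("/warehouse/events.parquet")   # fetched from the remote store
second = cache.read("/warehouse/events.parquet")  # served from the local copy
```

Only the first read reaches the remote store, which is why ingest pipelines on the storage side do not need to change in this model.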
6. DATA PLATFORMS ARE COMPLEX
Connections
Compute accessing data from sources across regions
Monitoring
Failure detection, prevention and troubleshooting
Configuration
Dynamic configuration & credential management is operationally expensive
10. Data Orchestration Hub
New Management Service
Released with Alluxio 2.4!
Intuitive web UI for managing operations and
connectivity with data sources across clouds
11. Data Orchestration Hub
New Management Service
Benefits
Connect your Data Sources
● Wizard based connection to sources
Management for your Alluxio Cluster
● Configuration and process monitoring
13. Expanded Metadata Service
Billions of files while breaking away from Hadoop
Cloud Native
Modern deployment stacks using a scalable
Embedded Journal system for master consensus
Benefits
Scale to billions of files
● Data orchestration across clouds and on-prem data centers
No external dependencies
● Eliminate system dependencies to deploy Alluxio with a
modern cloud-native computing stack
14. Sensitive Credential Management
Manage data access and protect secrets
Vault Integration
Secure management of sensitive information with
infrastructure across clouds and data centers
Benefits
Central management across environments
● Single location shared by applications in low trust networks
Secret management for elastic infrastructure
● Easily manage access across cloud native infrastructure
provisioned dynamically on-demand
15. And much more...
Simplified DevOps and System Monitoring
• Aggregated performance metrics and log collection
• Failure detection for asynchronous operations
Improved Cloud Deployments
• Terraform based infrastructure provisioning and deployment modules
Java 11 Support
17. ▪ Accelerate your journey to the cloud
Lower risk of a full cloud data migration
▪ Access data immediately as if it were local
When you need compute capacity, migrate workloads without
moving your data first
▪ Ease of management
On-prem infrastructure does not change, including ingestion
pipelines
▪ Security
Integrate with on-prem security infrastructure and continue to
enforce the same access control policies in the cloud
Build a Hybrid Data Lake with Alluxio
Burst compute to a public cloud &
gradually migrate
Diagram labels: Presto, Alluxio, Same region
18. Data Orchestration Hub for Hybrid Cloud Connectivity
Diagram labels: Presto, Amazon EMR, Region A, Hive, ETL, Ingestion, Datacenter 1, Private Data Centers
User Journey
Step 0: Establish network connectivity
Connect an AWS VPC with the on-prem
network via VPN or Direct Connect
Step 1: Launch a cluster in the cloud
Optionally, Alluxio-provided Terraform scripts
can be used for provisioning
Step 2: Use Data Orchestration Hub
Follow self-guided wizards to establish
connectivity with on-prem data and metadata
sources
Step 3: Run Analytics or AI!
19. Data Orchestration Hub for Hybrid Cloud Connectivity
What we just saw
Step 0: Monitor Cluster Health
View the status of processes & stop a process
Step 1: HDFS Wizard
Connect an HDFS data source to Alluxio and
validate the configuration
Step 2: S3 Wizard
Connect an S3 Bucket as a data source
Both sources are visible in the same namespace
Step 3: Ready for Analytics or AI
Once all data & metadata sources are
connected you are ready to run jobs
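What "both sources are visible in the same namespace" means can be sketched with a small mount table that maps paths in one logical namespace to the HDFS and S3 sources behind them. This is a simplified illustration under stated assumptions, not the Hub's or Alluxio's actual API; `MountTable`, the mount points, the NameNode address, and the bucket name are all hypothetical.

```python
from typing import Dict


class MountTable:
    """Toy unified namespace: maps logical paths to backing-store URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, source_uri: str) -> None:
        # Normalise trailing slashes so prefix matching stays simple.
        self._mounts[namespace_path.rstrip("/")] = source_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mount point that is a prefix of the requested path.
        best = None
        for mount_point in self._mounts:
            if path == mount_point or path.startswith(mount_point + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no data source mounted for {path}")
        return self._mounts[best] + path[len(best):]


table = MountTable()
# Hypothetical on-prem HDFS source (as connected by the HDFS wizard step).
table.mount("/onprem", "hdfs://namenode.example:8020/warehouse")
# Hypothetical S3 bucket (as connected by the S3 wizard step).
table.mount("/cloud", "s3://example-bucket/datasets")

hdfs_uri = table.resolve("/onprem/sales/part-0.parquet")
s3_uri = table.resolve("/cloud/logs/2020/11.json")
```

A job then addresses `/onprem/...` and `/cloud/...` through the same namespace without knowing which storage system backs each path.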
21. Your Hub for Data Orchestration across multiple environments
Diagram labels: Presto, Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine, Hive, ETL, Ingestion, Region A, Region B, Datacenter 1, Datacenter 2, Private Data Centers
22. Orchestrate data across all
your clusters using a single
pane of glass
USING THE HUB FOR DATA ORCHESTRATION
23. THANK YOU AND WELCOME
Website
www.alluxio.io
Slack
http://slackin.alluxio.io/