Royal Caribbean Cruises, Ltd.
• Founded in 1968
• Six companies employing over 65,000
people from 120 countries who have
served over 50 million guests
• Fleet of over 55 ships and growing
• Countless industry “firsts” - such as rock
climbing wall, ice skating, and surfing at
sea
• Each brand delivering a unique Guest
experience
• www.rclcorporate.com
What is Cerebro™
Cerebro™ is a project under Excalibur’s data program
focused on delivering a next-generation data
management platform.
Design Drivers and Architecture Principles
Cerebro™ is Cloud Native
Cloud-native data lake architecture leveraging vendor managed services
Deployment styles shown: Managed Services (Azure Data Lake Store, Azure Data Factory) and Container Based
Cerebro™ Leverages Different Storage Engines
Why there is a need for a Heterogeneous Data Lake
• Object Store
  Which data: sensor data; financial data
  Which queries: data science; BI; large analytical jobs
  Key considerations: Parquet and Arrow accelerate queries
• Document Store
  Which data: reference data; dynamic schema
  Which queries: single record; small batches; mutations
  Key considerations: ability to handle streaming workloads
• Graph Store
  Which data: relationships
  Which queries: relationship analysis; mutations
  Key considerations: flexibility and ability to handle complexity
Azure Data Lake Store (ADLS)
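
The storage comparison above amounts to a routing decision: which engine suits which kind of workload. A minimal sketch of that decision follows. The `Workload` class, its field names, and `choose_store` are hypothetical and only illustrate the slide; they are not part of Cerebro™.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description; the field names are illustrative."""
    relationship_centric: bool = False  # relationship analysis, traversals
    single_record_access: bool = False  # single record or small batch reads and mutations
    dynamic_schema: bool = False        # reference data whose shape changes


def choose_store(w: Workload) -> str:
    """Return the storage type from the comparison that best fits the workload."""
    if w.relationship_centric:
        return "graph store"
    if w.single_record_access or w.dynamic_schema:
        return "document store"
    # Default: large analytical jobs (data science, BI) over columnar files.
    return "object store"


if __name__ == "__main__":
    print(choose_store(Workload(single_record_access=True)))
```

The order of the checks is a design choice in this sketch: relationship-heavy work goes to the graph store even when it also touches single records.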
Cerebro™ Leverages In-Memory Architecture
• Scalability via distributed in-memory compute layer, object storage
• Dremio and Spark anchor in-memory computing layer
• Parquet and object store (ADLS) for storage layer, plus MongoDB and Neo4j
• Dremio and Arrow Flight further accelerate access and in-memory processing
[Diagram: Compute Layer and Storage Layer, shown for Today and for the Future (with Arrow Flight)]
Cerebro™ - Phase 1
• Initial release focused on ingestion of
sources spanning current data silos
• Establishment of a Raw Zone with
Landing and Staging Areas
• Physical storage is file based (CSV,
Parquet) on Azure Data Lake Store
(ADLS) to support variety and variability
of data
• Staging Area requires users to be
familiar with low-level data structures in
order to execute queries joining
disparate source systems (e.g. multiple
property management system (PMS) and
Casino sources)
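
To make the last point concrete, here is a small sketch of what querying the Staging Area directly can look like. It uses SQLite in memory as a stand-in for SQL access over staged files. All table names, column names, and rows are invented for illustration and are not taken from the actual source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging tables: two PMS sources with different shapes, plus casino play.
conn.executescript("""
    CREATE TABLE pms_a_guest (guest_id INTEGER, full_name TEXT);
    CREATE TABLE pms_b_pax   (pax_no TEXT, first_nm TEXT, last_nm TEXT);
    CREATE TABLE casino_play (player_ref TEXT, amount REAL);
    INSERT INTO pms_a_guest VALUES (101, 'Ada Lovelace');
    INSERT INTO pms_b_pax   VALUES ('B-202', 'Alan', 'Turing');
    INSERT INTO casino_play VALUES ('101', 50.0), ('B-202', 20.0), ('B-202', 5.0);
""")

# The analyst has to know each source's keys, types, and name columns.
RAW_QUERY = """
    SELECT g.full_name AS guest, SUM(c.amount) AS total
    FROM pms_a_guest AS g
    JOIN casino_play AS c ON c.player_ref = CAST(g.guest_id AS TEXT)
    GROUP BY g.full_name
    UNION ALL
    SELECT p.first_nm || ' ' || p.last_nm AS guest, SUM(c.amount) AS total
    FROM pms_b_pax AS p
    JOIN casino_play AS c ON c.player_ref = p.pax_no
    GROUP BY p.first_nm, p.last_nm
    ORDER BY guest
"""
rows = conn.execute(RAW_QUERY).fetchall()
print(rows)
```

Each additional source adds another branch with its own key casting and name handling, which is the burden the bullet describes.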
[Diagram: Cerebro™ data lake zones, Phase 1]
• Sources: Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing
• Ingest: Batch, CDC, SFTP, File, RDBMS
• Zones: Raw Zone (Landing Area, Staging Area), Standardized Zone, Enriched Zone
• Flow: Ingest, Transform, Consume
• Storage: Cloud Object Store, Document Store, Graph
• Platform services: Metadata Management, Data Catalog, Data Ingestion, Data Integration, Data Virtualization, Self-service BI, Advanced Analytics
• Consumers: Data Engineers (Operational Analytics), BI Analysts (Self-Service Dashboards), Data Scientists (Advanced Analytics), Data Stewards (Compliance Analytics)
Data Pipeline – Phase 1
Roles: Data Engineers, Data Scientists
• Talend utilized to ingest data from a
number of sources (RDBMS, File-based,
API) into CSV files stored in the Landing
Area (ADLS)
• Talend / Spark leveraged to create
Parquet files in the Staging Area (ADLS)
• In-memory columnar (Arrow) via Dremio
accelerates SQL based query access for
data engineering and data science use
cases
• Leverages data virtualization within
Dremio to support simple ad-hoc
integration and agile exploration
• Supports data science and advanced
analytics (AI/ML) via Azure Databricks
(Python, Scala, Java, R)
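
The Landing-to-Staging step above turns untyped CSV text into typed, column-oriented data. The sketch below shows that idea with the Python standard library only. It is a stand-in, not the Talend / Spark job, and it does not write Parquet. The sample rows, the `SCHEMA` mapping, and `to_typed_columns` are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical Landing Area extract: every value arrives as text.
LANDING_CSV = """booking_id,sail_date,fare
1001,2019-05-01,899.50
1002,2019-05-08,1249.00
"""

# Hypothetical staging schema: column name -> parser for that column.
SCHEMA = {
    "booking_id": int,
    "sail_date": date.fromisoformat,
    "fare": float,
}


def to_typed_columns(csv_text, schema):
    """Parse CSV text and return a dict of column name -> list of typed values."""
    columns = {name: [] for name in schema}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, parse in schema.items():
            columns[name].append(parse(row[name]))
    return columns


staged = to_typed_columns(LANDING_CSV, SCHEMA)
# A column-oriented layout lets an aggregate read one column and skip the rest.
total_fare = sum(staged["fare"])
print(total_fare)
```

Reading a single column for an aggregate is the access pattern that columnar formats such as Parquet and Arrow are designed around.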
Pipeline stages and tools:
• Ingest: Talend, Azure HDInsight
• Persist: Azure Data Lake Store
• Explore: Dremio, Azure Data Catalog
• Model/Predict: Azure Databricks (Python, Scala, Java, R)
Cerebro™ - Phase 2
• Implementation of a Standardized Zone
based on semantic view of entities that
will be easier to query for casual users
• Introduction of MongoDB (Document)
will allow the platform to support low
latency ingestion and consumption of
customer data required to support
downstream applications (Call Center)
• Dremio still leveraged to support
analytical use cases involving customer
data stored in MongoDB (Marketing)
• Introduction of Neo4j (Graph) will
increase overall agility (relationships) as
well as provide insights by leveraging
advanced functionality (patterns,
recommendations)
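
As a rough illustration of the "patterns, recommendations" idea in the last bullet, the sketch below walks guest-to-activity relationships held in plain Python dictionaries. It is a stand-in for a graph query, not Neo4j code. The guests, the bookings, and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical guest -> activity edges; in a graph store these would be relationships.
BOOKED = {
    "guest_a": {"rock_climbing", "ice_skating"},
    "guest_b": {"rock_climbing", "surfing"},
    "guest_c": {"rock_climbing", "surfing", "ice_skating"},
    "guest_d": {"ice_skating"},
}


def recommend(guest, booked, top_n=2):
    """Suggest activities booked by guests who share an activity with `guest`."""
    mine = booked[guest]
    scores = Counter()
    for other, theirs in booked.items():
        if other == guest or not (mine & theirs):
            continue  # skip the guest themselves and guests with nothing in common
        for activity in theirs - mine:
            scores[activity] += 1
    # Sort by score (descending), then by name, so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [activity for activity, _ in ranked[:top_n]]


print(recommend("guest_d", BOOKED))
```

The two-hop walk (guest, shared activity, other guest, their other activities) is the kind of relationship pattern a graph store is meant to express directly.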
[Diagram: Cerebro™ data lake zones, Phase 2]
• Sources: Reservations, Customer Master, Property Management, Casino, Clickstream, Marketing
• Ingest: Batch, CDC, SFTP, File, RDBMS
• Zones: Raw Zone (Landing Area, Staging Area), Standardized Zone, Enriched Zone
• Flow: Ingest, Transform, Consume
• Storage: Cloud Object Store, Document Store, Graph
• Platform services: Metadata Management, Data Catalog, Data Ingestion, Data Integration, Data Virtualization, Self-service BI, Advanced Analytics
• Consumers: Data Engineers (Operational Analytics), BI Analysts (Self-Service Dashboards), Data Scientists (Advanced Analytics), Data Stewards (Compliance Analytics), Developers (Downstream Applications)
Data Pipeline – Phase 2
Roles: Data Engineers, Data Scientists
Pipeline stages and tools:
• Ingest/Process: Talend, Azure HDInsight, Azure Databricks, Azure Data Factory
• Persist: Azure Data Lake Store, MongoDB Atlas, Neo4j
• Explore/Visualize: Dremio, Azure Data Catalog, Power BI
• Model/Predict: Azure Databricks (Python, Scala, Java, R)
• Talend used to develop pipelines that
process (cleanse, integrate, harmonize)
data sourced from Raw Zone
• Data resulting from pipeline executions
is persisted in the appropriate store(s)
(ADLS, Neo4j and MongoDB) to support
both analytical and operational
requirements
• Develop services to be consumed by
customer facing applications and other
downstream processes via managed
APIs
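
The first two bullets describe a harmonize-then-persist flow. A minimal sketch follows, with in-memory containers standing in for ADLS, MongoDB, and Neo4j. The record shapes, source names, and both functions are hypothetical; the real pipelines are built in Talend.

```python
def harmonize_guest(record, source):
    """Map a source-specific record onto one hypothetical common guest shape."""
    if source == "pms_a":
        name = record["full_name"]
        guest_id = str(record["guest_id"])
    elif source == "pms_b":
        name = f'{record["first_nm"]} {record["last_nm"]}'
        guest_id = record["pax_no"]
    else:
        raise ValueError(f"unknown source: {source}")
    # Cleanse: collapse repeated whitespace and normalize capitalization.
    clean_name = " ".join(name.split()).title()
    return {"guest_id": guest_id, "name": clean_name, "source": source}


def persist(guest, stores):
    """Write the harmonized record to each in-memory stand-in store."""
    stores["adls"].append(guest)                  # analytical copy
    stores["mongodb"][guest["guest_id"]] = guest  # keyed lookup for applications
    stores["neo4j"].add((guest["guest_id"], "FROM_SOURCE", guest["source"]))  # edge


stores = {"adls": [], "mongodb": {}, "neo4j": set()}
incoming = [
    ({"guest_id": 101, "full_name": "  ada   LOVELACE "}, "pms_a"),
    ({"pax_no": "B-202", "first_nm": "alan", "last_nm": "turing"}, "pms_b"),
]
for rec, src in incoming:
    persist(harmonize_guest(rec, src), stores)

print(stores["mongodb"]["101"]["name"])
```

Writing the same harmonized record to several stores mirrors the slide's point that one pipeline run can serve both analytical and operational consumers.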
• Services: Azure Functions, Apigee, Azure Kubernetes Service
Additional roles: BI Analysts, Data Stewards
[Architecture diagram: Modern Data Platform]
• Columns: Data Sources, Ingest, Process, User Experience
• On-premises sources: Property Management, Customer Master, Reservations, Casino
• External sources: Clickstream, Customer Feedback, Campaign Management
• Integration: Batch Integration, Streaming Integration
• Platform components: Talend Big Data, Kafka on HDInsight, Azure Event Hubs, Spark on HDInsight, Azure Data Factory, Azure Data Lake Store, MongoDB Atlas, Neo4j Causal Cluster, Azure Data Catalog, Azure Functions, Azure Kubernetes Service, DBeaver EE
• Capabilities: Self-Service Data Analytics, Advanced Analytics, Data Services, Modern Analytics
• Consumers: Business Analysts, Data Scientists, Applications

Cerebro: Bringing together data scientists and bi users - Royal Caribbean - Strata - London 2019
