SlideShare a Scribd company logo
1 of 25
Download to read offline
mycervello.com
Modern Data Warehouses In The Cloud
Use Cases + Live Demo
19th February 2019
Slim Baltagi
Director, Big Data & ML
DFW Area Advanced
Analytics Meetup
© 2019 Cervello Inc.
Agenda
1. Quick Intro: Talk Format, Speaker Background, Q&A session
2. Key Terms: Cloud, Data Warehouse, Modern, …
3. Data Era: Challenges & Opportunities
4. Traditional Data Warehouses: Key Challenges
5. Modern Data Warehouses: Examples, Solutions, Key
Opportunities,
6. Use Cases + Live Demo
7. Key Takeaways & Where To Go From Here?
8. Interactive Session: Q&A, discussion and more networking
2
© 2019 Cervello Inc.
1. Quick Intro: Speaker, Talk, Q&A session
§ Format of this talk: 40 minutes Talk, a 15 minutes Live Demo, a 30 minutes Q&A session
§ Currently
• Advocating Modern Data Architecture (MDA) and Modern Data Warehouses in the cloud to organizations as
director at Cervello Inc., an A.T. Kearney fast-growing professional services company
• Working with our team of creative problem solvers to deliver projects that help our clients win with data
• Enjoying emerging technologies (Kubernetes, Kubeflow …) , speaking at conferences and organizing meetups
§ In the past
• Created a mathematical formula for provisioning Apache Hadoop, now an obsolete technology!
• Evangelized Apache Spark and was the first to fully explain how it relates to Apache Hadoop without any
marketing fluff!
• Initiated a discussion at Hortonworks (recently merged with Cloudera) on its official stance on Apache Spark at a
time when Hortonworks itself was happier with Tez and touting it as the future compute engine of Hadoop!
• Publicly shared the limitations of Spark Streaming. As an alternative, evangelized Apache Flink and introduced it
to the western hemisphere. In particular, at Capital One as first adopter before the likes of Uber, Netflix, Lyft, etc
… Gave Apache Flink talks in 4 continents
• Evangelized Apache Kafka as a platform for building end-to-end streaming data applications using Kafka Core,
Kafka connect and Kafka Streams/KSQL
• Delivered many end-to-end data projects as director, architect and engineer
3
© 2019 Cervello Inc.
2. Key Terms: Cloud
§ What is exactly the cloud? According to Gartner, ‘A style of computing in which scalable and
elastic IT-enabled capabilities are delivered as a service using Internet technologies’
§ What are the models within the cloud and how do they apply to data warehousing?
4
Hardware:
Datacenter
Software: Data warehouse Management:
Optimizing, tuning,
maintenance
Cook, Eat, Clean
On-premises You You You At home
Infrastructure IaaS
Example: EC2
Vendor You You At a vacation
home
Platform PaaS
Example: Redshift,
EMR
Vendor Vendor You At a fast food
restaurant
Software SaaS
Example: Snowflake
Vendor Vendor Vendor At a full service
restaurant
© 2019 Cervello Inc.
2. Key Terms: Data Warehouse, Modern
§ What is a data warehouse? ‘A subject-oriented, integrated, time-variant, non-volatile collection
of data in support of management’s decision-making process.’ Source: Building the data
warehouse, a book by Bill Inmon, who is considered to be the father of data warehousing
§ From the beginning data warehousing was about the business making better and quicker
decisions
§ A key factor driving the evolution of data warehousing is the cloud. An updated definition of a
data warehouse in this modern era of cloud, social networks and mobile would add attributes
such as elastic, scalable (All data, All users),shareable, agile, conductive to exploratory
analytics. What do you think?
§ Be aware that a traditional data warehouse hosted in the cloud is not necessarily a data
warehouse ‘built for the cloud’ but it is a ‘cloud-washed’ one! Amazon Redshift would be a
good example. Hold on, more to come on this!
§ Often modernization in the context of data warehouse is confined to just a ‘technology refresh’.
How about getting any improvement to the underlying business processes? That would require
a ‘mindset refresh’!
5
© 2019 Cervello Inc.
3. Data Era: Challenges
Regardless of using databases, data lakes, data warehouses or data swamps,
businesses all share common challenges related to data in this new era!
• Data is a game changing asset but most organizations struggle to derive business value
from it
• Business decisions are hindered by slow and incomplete data analytics
• Legacy technologies, skills, methods and mindsets are stalling innovation
• Changing business requirements are outpacing the capability of IT teams to deliver
• Silos of diverse data from diverse sources ( internal enterprise apps, public agencies,
external events, 3rd party services, web, …) are more and more segregated, data can not be
combined together and business value can not be derived from it
• Costly and complex data infrastructure consuming significant resources to build and
maintain data platforms
• Technology selection is becoming more and more challenging due to the abundance of data
tools, platforms and services
6
© 2019 Cervello Inc.
3. Data Era: Opportunities
What are some of today’s opportunities in relation to data in this new era?
• Cloud computing and the increase in computer-processing power, storage and data-
gathering abilities enable businesses to rent much-higher-capacity computer infrastructure
from third-party vendors, avoiding the hefty costs and additional resources needed to run
their own data centers
• External data sources such as social media data, weather data, demographic data and other
third-party data can enrich and augment internal data to help businesses with better
decision making
• Real-Time or Near-Real-Time insights that match fast paced business and markets thanks
to the maturity of stream processing technologies
• Data-driven decisions thanks to growing adoption of Machine Learning and AI technologies
and services and also availability of more powerful compute and cheaper storage services
• Being a data-driven company is not confined to tech giants anymore
7
© 2019 Cervello Inc.
4. Traditional Data Warehouses: Key Challenges
1. Scalability: growth in data volume, number of users and applications
2. Concurrency: as number of users increase, they can not operate simultaneously
3. Performance: slow running queries, …
4. Resilience: data backup/retention and node failure protection
5. Complexity: from initial implementation to ongoing maintenance
6. Lack of native support for semi-structured data: JSON and XML formats are popular for
data exchange. In traditional data warehouses, data needs to be transformed first and
schema needs to be defined before loading
7. High maintenance overhead in the form of constant indexing, tuning, sorting
8. Handling workload fluctuation: sizing servers for workload peaks and valleys
9. Waste of resources: expensive computing resources are wasted during off peak usage
10. Lack of support for processing streaming data
11. Upfront costs: hardware, software licenses, staffing, …
12. Project delays due to provisioning infrastructure
Sure, you are familiar with real world scenarios such as End of Month Reporting, Intensive Load
Process, Demanding Executive Dashboard Users, …
8
© 2019 Cervello Inc.
5. Modern Data Warehouses: Examples
9
Reference: The Forrester Wave™: Cloud Data Warehouse, Q4 2018, October 29, 2018
The 14 Providers That Matter Most And How They Stack Up
© 2019 Cervello Inc.
5. Modern Data Warehouses: Examples
10
© 2019 Cervello Inc.
5. Modern Data Warehouses: Solutions
§ Major modern data warehouses are cloud based. Most of them do overcome the key
challenges of the traditional data warehouses. Unfortunately, overcoming these challenges is
happening at different degrees and it is not always a simple check of a yes or no!!
1. Scalability: Scale up, down, or off quickly without delay. Yes for Snowflake. Low for Redshift,
Yes for Azure Data Warehouse, Yes for Big Query ( but with limits)
2. Concurrency: Lots of users can operate simultaneously
3. Performance: processing bottlenecks and delays, slow running queries, …
4. Resilience: data backup/retention and node failure protection
5. Complexity: Cloud data warehouses are easier to use than on-premise data warehouses
6. Lack of native support of semi-structured data: Although cloud data warehouse do support
semi-structured data such as JSON. This support varies from one data warehouse to
another.
−Concurrent throughput for JSON: Yes for Snowflake, No for Redshift, No for Azure Data
Warehouse, Moderate for Big Query.
−High performance for JSON scans: Yes for Snowflake, No for Redshift unless you use the
additional Amazon Spectrum tool, No for Azure Data Warehouse, Moderate for Big Query.
11
© 2019 Cervello Inc.
5. Modern Data Warehouses: Solutions
7. High maintenance overhead is not a major issue with managed infrastructure for cloud data
warehouses, but the level of maintenance varies from one data warehouse to another. This
fluctuates to creating indices and distribution keys to near zero management
8. Handling workload fluctuation: With elasticity, cloud data warehouses adapt to workload peaks
and valleys
9. Waste of resources: No more wasted resources during off peak usage! Auto suspend, Auto
resume? Anybody knows about these features from Snowflake data warehouse?
10. Lack of support for streaming data: For example, loading and processing is possible with
Snowpipe, a serverless compute service from Snowflake data warehouse
11. Upfront costs: No upfront costs as the model of cloud data warehouse is pay as you go
12. Project Delays due to provisioning infrastructure: Not Applicable for cloud data warehouse.
With instant provisioning, projects don’t need to wait for infrastructure
12
© 2019 Cervello Inc.
5. Modern Data Warehouses: Snowflake key innovations
13
§ Snowflake is based on three key innovations:
1. Unique architecture : a unique architecture designed for the cloud and able to provide
complete elasticity for all your concurrent users and applications
2. Database engine that natively handles all your data both semi structured and structured
data without sacrificing performance nor flexibility
3. Technology that eliminates the need for manual data warehouse management and tuning.
No indexing, tuning, partitioning or vacuuming after loading data => effortless management
§ Snowflake architecture consists of three layers, each one is physically decoupled from the
other layer and scales independently:
1. Data Storage layer uses cloud storage to store all data loaded into snowflake in a scalable
and inexpensive way
2. Compute layer comprises of virtual warehouses compute resources that execute data
processing tasks required for queries. The virtual warehouses have access to all of the
data in the storage layer
3. Cloud Services layer coordinates the entire system managing security, optimization and
metadata
© 2019 Cervello Inc.
5. Modern Data Warehouses: Snowflake’s multi-cluster, shared data
architecture
14
© 2019 Cervello Inc.
6. Uses Cases + Live Demo
§ Data as a Service
§ Self-Service Analytics
§ Advanced Analytics Lab
§ Real-Time Insight
§ Single View of Customer
15
Data Consumers
§ Partners, Suppliers, and Vendors
§ Executives, Managers, and Analysts
§ Data Scientists
§ Embedded Applications
§ Many Business Users
New Use Cases
© 2019 Cervello Inc.
6. Use Cases + Live Demo: Snowflake reference architecture
16
Data Sources/Data Providers: On-premise or Cloud, Structured or Unstructured Data
Dataflow Tools: Streaming Data Platforms, ETL and ELT platforms. E: Extract, L: Load, T: Transform
Data Storage: S3/Azure Storage (missing in the architecture diagram!)
Analytics Services: BI, UI, Data Science, …
© 2019 Cervello Inc.
6. Use Case + Live Demo: Live Tracking of CTA Buses
17
§ The purpose of this live demo is to showcase a few aspects of a Snowflake such as Ease of
use, Fast provisioning, Native support of semi-structured data, Continuous data loading ,
…Due to time constraints, we won’t cover other unique features of Snowflake such as multi-
warehouse concurrency against same data, data sharing, time travel …
§ This live demo itself is about an end-to-end application for live tracking of CTA buses that:
1. Sends a request every minute to CTA Bus Tracker web service about live bus locations
and gets an immediate response as JSON data load
2. Stores the JSON data into Amazon S3 and trigger a notification event into an SQS queue
3. Consumes the event from the SQS using Snowpipe, a serverless compute service from
Snowflake. Snowpipe automatically loads JSON data, as is with no ETL, from S3 into a
Snowflake SQL database
4. Queries the JSON data using Snowflake’s flatten method as extension to standard ANSI
SQL
5. Visualizes the buses locations on a map
§ This is not a toy use case! Since JSON is a popular format for data exchange, the same data
pipeline can be leveraged in many other use cases with different data sources: internal
applications, external enterprise data services, machine generated data such as IoT devices …
© 2019 Cervello Inc.
6. Use Case + Live Demo: Live Tracking of CTA Buses
18
§ What did I use to cook this application?
• AWS Free tier: a free account from https://aws.amazon.com/free/
• Snowflake Test Drive: 30 days free trial through our partnership with Snowflake. Who wants
one?
https://trial.snowflake.com/?utm_source=cervello&utm_medium=referral&utm_campaign=sel
f-service-partner-referral-cervello
• CTA (Chicago Transit Authority) Bus Tracker API to get free up-to-the-minute JSON data
feeds: https://www.transitchicago.com/developers/bustracker/ You just need to request a
free API key from CTA
• Anaconda: Free and open source distribution of Python and R programming language.
Comes with Python IDE (Spyder 3.3.2) and other goodies!
• Free trial of Tableau Online: https://www.tableau.com/products/cloud-bi
© 2019 Cervello Inc.
6. Use Cases + Live Demo: Live Tracking of CTA buses
19
© 2019 Cervello Inc.
7. Key Takeaways & Where To Go From Here?
§Try it yourself
−Snowflake is worth a Free trial!
https://trial.snowflake.com/?utm_source=cervello&utm_medium=referral&utm_campaign=s
elf-service-partner-referral-cervello
−You can build a short term POC in either AWS or Azure while using US $400 credit for 30
days for both compute and storage,
§Engage the business
−It is critical to identify business sponsorship and use cases
−Cervello prefers short, high impact workshop approach
§Get end to end help
−Pick the right data platform … For example, the most ‘popular’ data warehouse is not
necessarily the one that fulfills your data warehouse needs! We helped many of our clients
move from Amazon Redshift and other traditional data warehouses to Snowflake.
−Avoid having a pure ‘technology refresh’ and missing the opportunities to add business
value (refer to use cases)
§Get advice
20
© 2019 Cervello Inc.
8. Interactive Session: Q&A, discussion and more networking
§ The goal here is for the audience to chime in now and later on the comments section of the
event or the meetup discussions section.
§ Here is a few samples of food for thought!
• Can somebody briefly share his/her real-world experience migrating from a legacy data
warehouse to a modern data warehouse?
• Did anybody worked on a fit-for-purpose analysis of modern data warehouses?
• What would be the best practices of using a modern data warehouse such as Snowflake?
• What advice will you give someone transitioning his/her career to work on a modern data
warehouse?
• Is there anyone who would like to give a follow up talk on modern data warehousing?
21
© 2019 Cervello Inc.
Boston | New York | Dallas | London
Get in touch with contributors:
Thank You!
Learn more about Cervello at
mycervello.com
Slim Baltagi
sbaltagi@gmail.com
https://www.linkedin.com/in/slimbaltagi/
@SlimBaltagi
22
Jim Leavitt
jleavitt@mycervello.com
https://www.linkedin.com/in/jimleavitt/
© 2019 Cervello Inc.
Appendix
23
© 2019 Cervello Inc.
Cervello Profile
We are a professional services firm focused on helping organizations win with data. We have a global presence with an ability to service customers across
industries. We are organized around three practices that our under pinned by our expertise in modern data architectures.
PERFORMANCE
MANAGEMENT
SUPPLIER + CUSTOMER
RELATIONSHIPS
DATA MONETIZATION +
PRODUCTS
DATA MANAGEMENT
& MACHINE LEARNING
We embed data management
processes and use machine
learning to improve data quality
Using leading cloud technology
platforms we develop and
design analytics-ready
connected data solutions
MODERN DATA
ARCHITECTURE & PLATFORMS
ANALYTICS & DATA SCIENCE
MODELING & PLANNING
MOBILE EXPERIENCE
We provide different methods
to consume data to facilitate
insights and actions inside and
outside the organization
BUSINESS DRIVEN
INSIGHTS & ACTIONS
HOWWEDELIVER
WHATWEDELIVER
STRATEGY
BUILD
SERVICES
RUN & COE
SERVICES
Our teams are located in Boston, New York, Dallas, London and Bangalore
© 2019 Cervello Inc.
25
TRAIN
2 DAYS
Getting around Snowflake
Analyzing & Reporting Data
Data Management & Security
PLAN
< 1 WEEK
Use case inventory
Readiness Assessment
Architecture Design
ACTION
1-2 WEEKS
Use case & MVP defined
POC Implementation
Stakeholder Presentation
TCO Analysis
Cervello Value
Accelerator
9
Years Working In The Cloud
30
Snowflake Trained Consultants
100+
Big Data & Analytics Engagements
End to End
Enablement

More Related Content

What's hot

Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Cathrine Wilhelmsen
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with linksChris Testa-O'Neill
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data weekender4.2 azure purview erwin de kreuk
Data weekender4.2  azure purview erwin de kreukData weekender4.2  azure purview erwin de kreuk
Data weekender4.2 azure purview erwin de kreukErwin de Kreuk
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureMark Kromer
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 

What's hot (20)

Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
 
Azure Data Factory presentation with links
Azure Data Factory presentation with linksAzure Data Factory presentation with links
Azure Data Factory presentation with links
 
Data lake
Data lakeData lake
Data lake
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
DI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data WarehouseDI&A Slides: Data Lake vs. Data Warehouse
DI&A Slides: Data Lake vs. Data Warehouse
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Data weekender4.2 azure purview erwin de kreuk
Data weekender4.2  azure purview erwin de kreukData weekender4.2  azure purview erwin de kreuk
Data weekender4.2 azure purview erwin de kreuk
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft Azure
 
Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 

Similar to Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi

How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data CentersGina Buck
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | QuboleVasu S
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?Harald Erb
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfRajvir Kaushal
 
Unblocking Innovation for Digital Transformation
Unblocking Innovation for Digital TransformationUnblocking Innovation for Digital Transformation
Unblocking Innovation for Digital TransformationAmazon Web Services
 
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Precisely
 
Take the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTake the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTyler Wishnoff
 
Cloud Computing Overview
Cloud Computing OverviewCloud Computing Overview
Cloud Computing OverviewDoug Allen
 
Calculating the true value of industry specific clouds linthicum
Calculating the true value of industry specific clouds linthicumCalculating the true value of industry specific clouds linthicum
Calculating the true value of industry specific clouds linthicumDavid Linthicum
 
NoOps in a Serverless World
NoOps in a Serverless WorldNoOps in a Serverless World
NoOps in a Serverless WorldGary Arora
 
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostDana Gardner
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making ThingsJC Davis
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017Jeremy Maranitch
 
Data warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-clouderaData warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-clouderaJyrki Määttä
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 

Similar to Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi (20)

How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 
The Growth Of Data Centers
The Growth Of Data CentersThe Growth Of Data Centers
The Growth Of Data Centers
 
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Does it only have to be ML + AI?
Does it only have to be ML + AI?Does it only have to be ML + AI?
Does it only have to be ML + AI?
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdf
 
Unblocking Innovation for Digital Transformation
Unblocking Innovation for Digital TransformationUnblocking Innovation for Digital Transformation
Unblocking Innovation for Digital Transformation
 
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
 
CF-Chapitre1.pdf
CF-Chapitre1.pdfCF-Chapitre1.pdf
CF-Chapitre1.pdf
 
Take the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTake the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented Analytics
 
Cloud Storage Benefits
Cloud Storage BenefitsCloud Storage Benefits
Cloud Storage Benefits
 
Big data rmoug
Big data rmougBig data rmoug
Big data rmoug
 
Cloud Computing Overview
Cloud Computing OverviewCloud Computing Overview
Cloud Computing Overview
 
Calculating the true value of industry specific clouds linthicum
Calculating the true value of industry specific clouds linthicumCalculating the true value of industry specific clouds linthicum
Calculating the true value of industry specific clouds linthicum
 
NoOps in a Serverless World
NoOps in a Serverless WorldNoOps in a Serverless World
NoOps in a Serverless World
 
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower CostHow Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
How Consistent Data Services Deliver Simplicity, Compatibility, And Lower Cost
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making Things
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
 
Data warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-clouderaData warehouse-optimization-with-hadoop-informatica-cloudera
Data warehouse-optimization-with-hadoop-informatica-cloudera
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 

More from Slim Baltagi

Modern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesModern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesSlim Baltagi
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeSlim Baltagi
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming AnalyticsSlim Baltagi
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiSlim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkSlim Baltagi
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuSlim Baltagi
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Slim Baltagi
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 

More from Slim Baltagi (20)

Modern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetesModern big data and machine learning in the era of cloud, docker and kubernetes
Modern big data and machine learning in the era of cloud, docker and kubernetes
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision TreeApache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
 
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics FrameworksOverview of Apache Fink: The 4G of Big Data Analytics Frameworks
Overview of Apache Fink: The 4G of Big Data Analytics Frameworks
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Apache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim BaltagiApache Flink community Update for March 2016 - Slim Baltagi
Apache Flink community Update for March 2016 - Slim Baltagi
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Unified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache FlinkUnified Batch and Real-Time Stream Processing Using Apache Flink
Unified Batch and Real-Time Stream Processing Using Apache Flink
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities Big Data at CME Group: Challenges and Opportunities
Big Data at CME Group: Challenges and Opportunities
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 

Recently uploaded

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Recently uploaded (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

Modern-Data-Warehouses-In-The-Cloud-Use-Cases-Slim-Baltagi

  • 1. mycervello.com Modern Data Warehouses In The Cloud Use Cases + Live Demo 19th February 2019 Slim Baltagi Director, Big Data & ML DFW Area Advanced Analytics Meetup
  • 2. © 2019 Cervello Inc. Agenda 1. Quick Intro: Talk Format, Speaker Background, Q&A session 2. Key Terms: Cloud, Data Warehouse, Modern, … 3. Data Era: Challenges & Opportunities 4. Traditional Data Warehouses: Key Challenges 5. Modern Data Warehouses: Examples, Solutions, Key Opportunities, 6. Use Cases + Live Demo 7. Key Takeaways & Where To Go From Here? 8. Interactive Session: Q&A, discussion and more networking 2
  • 3. © 2019 Cervello Inc. 1. Quick Intro: Speaker, Talk, Q&A session § Format of this talk: 40 minutes Talk, a 15 minutes Live Demo, a 30 minutes Q&A session § Currently • Advocating Modern Data Architecture (MDA) and Modern Data Warehouses in the cloud to organizations as director at Cervello Inc., an A.T. Kearney fast-growing professional services company • Working with our team of creative problem solvers to deliver projects that help our clients win with data • Enjoying emerging technologies (Kubernetes, Kubeflow …) , speaking at conferences and organizing meetups § In the past • Created a mathematical formula for provisioning Apache Hadoop, now an obsolete technology! • Evangelized Apache Spark and was the first to fully explain how it relates to Apache Hadoop without any marketing fluff! • Initiated a discussion at Hortonworks (recently merged with Cloudera) on its official stance on Apache Spark at a time when Hortonworks itself was happier with Tez and touting it as the future compute engine of Hadoop! • Publicly shared the limitations of Spark Streaming. As an alternative, evangelized Apache Flink and introduced it to the western hemisphere. In particular, at Capital One as first adopter before the likes of Uber, Netflix, Lyft, etc … Gave Apache Flink talks in 4 continents • Evangelized Apache Kafka as a platform for building end-to-end streaming data applications using Kafka Core, Kafka connect and Kafka Streams/KSQL • Delivered many end-to-end data projects as director, architect and engineer 3
  • 4. © 2019 Cervello Inc. 2. Key Terms: Cloud § What is exactly the cloud? According to Gartner, ‘A style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using Internet technologies’ § What are the models within the cloud and how do they apply to data warehousing? 4 Hardware: Datacenter Software: Data warehouse Management: Optimizing, tuning, maintenance Cook, Eat, Clean On-premises You You You At home Infrastructure IaaS Example: EC2 Vendor You You At a vacation home Platform PaaS Example: Redshift, EMR Vendor Vendor You At a fast food restaurant Software SaaS Example: Snowflake Vendor Vendor Vendor At a full service restaurant
  • 5. © 2019 Cervello Inc. 2. Key Terms: Data Warehouse, Modern § What is a data warehouse? ‘A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process.’ Source: Building the data warehouse, a book by Bill Inmon, who is considered to be the father of data warehousing § From the beginning data warehousing was about the business making better and quicker decisions § A key factor driving the evolution of data warehousing is the cloud. An updated definition of a data warehouse in this modern era of cloud, social networks and mobile would add attributes such as elastic, scalable (All data, All users),shareable, agile, conductive to exploratory analytics. What do you think? § Be aware that a traditional data warehouse hosted in the cloud is not necessarily a data warehouse ‘built for the cloud’ but it is a ‘cloud-washed’ one! Amazon Redshift would be a good example. Hold on, more to come on this! § Often modernization in the context of data warehouse is confined to just a ‘technology refresh’. How about getting any improvement to the underlying business processes? That would require a ‘mindset refresh’! 5
  • 6. © 2019 Cervello Inc. 3. Data Era: Challenges Regardless of using databases, data lakes, data warehouses or data swamps, businesses all share common challenges related to data in this new era! • Data is a game changing asset but most organizations struggle to derive business value from it • Business decisions are hindered by slow and incomplete data analytics • Legacy technologies, skills, methods and mindsets are stalling innovation • Changing business requirements are outpacing the capability of IT teams to deliver • Silos of diverse data from diverse sources ( internal enterprise apps, public agencies, external events, 3rd party services, web, …) are more and more segregated, data can not be combined together and business value can not be derived from it • Costly and complex data infrastructure consuming significant resources to build and maintain data platforms • Technology selection is becoming more and more challenging due to the abundance of data tools, platforms and services 6
  • 7. © 2019 Cervello Inc. 3. Data Era: Opportunities What are some of today’s opportunities in relation to data in this new era? • Cloud computing and the increase in computer-processing power, storage and data- gathering abilities enable businesses to rent much-higher-capacity computer infrastructure from third-party vendors, avoiding the hefty costs and additional resources needed to run their own data centers • External data sources such as social media data, weather data, demographic data and other third-party data can enrich and augment internal data to help businesses with better decision making • Real-Time or Near-Real-Time insights that match fast paced business and markets thanks to the maturity of stream processing technologies • Data-driven decisions thanks to growing adoption of Machine Learning and AI technologies and services and also availability of more powerful compute and cheaper storage services • Being a data-driven company is not confined to tech giants anymore 7
  • 8. © 2019 Cervello Inc. 4. Traditional Data Warehouses: Key Challenges 1. Scalability: growth in data volume, number of users and applications 2. Concurrency: as number of users increase, they can not operate simultaneously 3. Performance: slow running queries, … 4. Resilience: data backup/retention and node failure protection 5. Complexity: from initial implementation to ongoing maintenance 6. Lack of native support for semi-structured data: JSON and XML formats are popular for data exchange. In traditional data warehouses, data needs to be transformed first and schema needs to be defined before loading 7. High maintenance overhead in the form of constant indexing, tuning, sorting 8. Handling workload fluctuation: sizing servers for workload peaks and valleys 9. Waste of resources: expensive computing resources are wasted during off peak usage 10. Lack of support for processing streaming data 11. Upfront costs: hardware, software licenses, staffing, … 12. Project delays due to provisioning infrastructure Sure, you are familiar with real world scenarios such as End of Month Reporting, Intensive Load Process, Demanding Executive Dashboard Users, … 8
  • 9. © 2019 Cervello Inc. 5. Modern Data Warehouses: Examples 9 Reference: The Forrester Wave™: Cloud Data Warehouse, Q4 2018, October 29, 2018 The 14 Providers That Matter Most And How They Stack Up
  • 10. © 2019 Cervello Inc. 5. Modern Data Warehouses: Examples 10
  • 11. © 2019 Cervello Inc. 5. Modern Data Warehouses: Solutions § Major modern data warehouses are cloud based. Most of them do overcome the key challenges of the traditional data warehouses. Unfortunately, overcoming these challenges is happening at different degrees and it is not always a simple check of a yes or no!! 1. Scalability: Scale up, down, or off quickly without delay. Yes for Snowflake. Low for Redshift, Yes for Azure Data Warehouse, Yes for Big Query ( but with limits) 2. Concurrency: Lots of users can operate simultaneously 3. Performance: processing bottlenecks and delays, slow running queries, … 4. Resilience: data backup/retention and node failure protection 5. Complexity: Cloud data warehouses are easier to use than on-premise data warehouses 6. Lack of native support of semi-structured data: Although cloud data warehouse do support semi-structured data such as JSON. This support varies from one data warehouse to another. −Concurrent throughput for JSON: Yes for Snowflake, No for Redshift, No for Azure Data Warehouse, Moderate for Big Query. −High performance for JSON scans: Yes for Snowflake, No for Redshift unless you use the additional Amazon Spectrum tool, No for Azure Data Warehouse, Moderate for Big Query. 11
  • 12. © 2019 Cervello Inc. 5. Modern Data Warehouses: Solutions 7. High maintenance overhead is not a major issue with managed infrastructure for cloud data warehouses, but the level of maintenance varies from one data warehouse to another. This fluctuates to creating indices and distribution keys to near zero management 8. Handling workload fluctuation: With elasticity, cloud data warehouses adapt to workload peaks and valleys 9. Waste of resources: No more wasted resources during off peak usage! Auto suspend, Auto resume? Anybody knows about these features from Snowflake data warehouse? 10. Lack of support for streaming data: For example, loading and processing is possible with Snowpipe, a serverless compute service from Snowflake data warehouse 11. Upfront costs: No upfront costs as the model of cloud data warehouse is pay as you go 12. Project Delays due to provisioning infrastructure: Not Applicable for cloud data warehouse. With instant provisioning, projects don’t need to wait for infrastructure 12
  • 13. © 2019 Cervello Inc. 5. Modern Data Warehouses: Snowflake key innovations 13 § Snowflake is based on three key innovations: 1. Unique architecture : a unique architecture designed for the cloud and able to provide complete elasticity for all your concurrent users and applications 2. Database engine that natively handles all your data both semi structured and structured data without sacrificing performance nor flexibility 3. Technology that eliminates the need for manual data warehouse management and tuning. No indexing, tuning, partitioning or vacuuming after loading data => effortless management § Snowflake architecture consists of three layers, each one is physically decoupled from the other layer and scales independently: 1. Data Storage layer uses cloud storage to store all data loaded into snowflake in a scalable and inexpensive way 2. Compute layer comprises of virtual warehouses compute resources that execute data processing tasks required for queries. The virtual warehouses have access to all of the data in the storage layer 3. Cloud Services layer coordinates the entire system managing security, optimization and metadata
  • 14. © 2019 Cervello Inc. 5. Modern Data Warehouses: Snowflake’s multi-cluster, shared data architecture 14
  • 15. © 2019 Cervello Inc. 6. Uses Cases + Live Demo § Data as a Service § Self-Service Analytics § Advanced Analytics Lab § Real-Time Insight § Single View of Customer 15 Data Consumers § Partners, Suppliers, and Vendors § Executives, Managers, and Analysts § Data Scientists § Embedded Applications § Many Business Users New Use Cases
  • 16. © 2019 Cervello Inc. 6. Use Cases + Live Demo: Snowflake reference architecture 16 Data Sources/Data Providers: On-premise or Cloud, Structured or Unstructured Data Dataflow Tools: Streaming Data Platforms, ETL and ELT platforms. E: Extract, L: Load, T: Transform Data Storage: S3/Azure Storage (missing in the architecture diagram!) Analytics Services: BI, UI, Data Science, …
  • 17. © 2019 Cervello Inc. 6. Use Case + Live Demo: Live Tracking of CTA Buses 17 § The purpose of this live demo is to showcase a few aspects of a Snowflake such as Ease of use, Fast provisioning, Native support of semi-structured data, Continuous data loading , …Due to time constraints, we won’t cover other unique features of Snowflake such as multi- warehouse concurrency against same data, data sharing, time travel … § This live demo itself is about an end-to-end application for live tracking of CTA buses that: 1. Sends a request every minute to CTA Bus Tracker web service about live bus locations and gets an immediate response as JSON data load 2. Stores the JSON data into Amazon S3 and trigger a notification event into an SQS queue 3. Consumes the event from the SQS using Snowpipe, a serverless compute service from Snowflake. Snowpipe automatically loads JSON data, as is with no ETL, from S3 into a Snowflake SQL database 4. Queries the JSON data using Snowflake’s flatten method as extension to standard ANSI SQL 5. Visualizes the buses locations on a map § This is not a toy use case! Since JSON is a popular format for data exchange, the same data pipeline can be leveraged in many other use cases with different data sources: internal applications, external enterprise data services, machine generated data such as IoT devices …
  • 18. © 2019 Cervello Inc. 6. Use Case + Live Demo: Live Tracking of CTA Buses 18 § What did I use to cook this application? • AWS Free tier: a free account from https://aws.amazon.com/free/ • Snowflake Test Drive: 30 days free trial through our partnership with Snowflake. Who wants one? https://trial.snowflake.com/?utm_source=cervello&utm_medium=referral&utm_campaign=sel f-service-partner-referral-cervello • CTA (Chicago Transit Authority) Bus Tracker API to get free up-to-the-minute JSON data feeds: https://www.transitchicago.com/developers/bustracker/ You just need to request a free API key from CTA • Anaconda: Free and open source distribution of Python and R programming language. Comes with Python IDE (Spyder 3.3.2) and other goodies! • Free trial of Tableau Online: https://www.tableau.com/products/cloud-bi
  • 19. © 2019 Cervello Inc. 6. Use Cases + Live Demo: Live Tracking of CTA buses 19
  • 20. © 2019 Cervello Inc. 7. Key Takeaways & Where To Go From Here? §Try it yourself −Snowflake is worth a Free trial! https://trial.snowflake.com/?utm_source=cervello&utm_medium=referral&utm_campaign=s elf-service-partner-referral-cervello −You can build a short term POC in either AWS or Azure while using US $400 credit for 30 days for both compute and storage, §Engage the business −It is critical to identify business sponsorship and use cases −Cervello prefers short, high impact workshop approach §Get end to end help −Pick the right data platform … For example, the most ‘popular’ data warehouse is not necessarily the one that fulfills your data warehouse needs! We helped many of our clients move from Amazon Redshift and other traditional data warehouses to Snowflake. −Avoid having a pure ‘technology refresh’ and missing the opportunities to add business value (refer to use cases) §Get advice 20
  • 21. © 2019 Cervello Inc. 8. Interactive Session: Q&A, discussion and more networking § The goal here is for the audience to chime in now and later on the comments section of the event or the meetup discussions section. § Here is a few samples of food for thought! • Can somebody briefly share his/her real-world experience migrating from a legacy data warehouse to a modern data warehouse? • Did anybody worked on a fit-for-purpose analysis of modern data warehouses? • What would be the best practices of using a modern data warehouse such as Snowflake? • What advice will you give someone transitioning his/her career to work on a modern data warehouse? • Is there anyone who would like to give a follow up talk on modern data warehousing? 21
  • 22. © 2019 Cervello Inc. Boston | New York | Dallas | London Get in touch with contributors: Thank You! Learn more about Cervello at mycervello.com Slim Baltagi sbaltagi@gmail.com https://www.linkedin.com/in/slimbaltagi/ @SlimBaltagi 22 Jim Leavitt jleavitt@mycervello.com https://www.linkedin.com/in/jimleavitt/
  • 23. © 2019 Cervello Inc. Appendix 23
  • 24. © 2019 Cervello Inc. Cervello Profile We are a professional services firm focused on helping organizations win with data. We have a global presence with an ability to service customers across industries. We are organized around three practices that our under pinned by our expertise in modern data architectures. PERFORMANCE MANAGEMENT SUPPLIER + CUSTOMER RELATIONSHIPS DATA MONETIZATION + PRODUCTS DATA MANAGEMENT & MACHINE LEARNING We embed data management processes and use machine learning to improve data quality Using leading cloud technology platforms we develop and design analytics-ready connected data solutions MODERN DATA ARCHITECTURE & PLATFORMS ANALYTICS & DATA SCIENCE MODELING & PLANNING MOBILE EXPERIENCE We provide different methods to consume data to facilitate insights and actions inside and outside the organization BUSINESS DRIVEN INSIGHTS & ACTIONS HOWWEDELIVER WHATWEDELIVER STRATEGY BUILD SERVICES RUN & COE SERVICES Our teams are located in Boston, New York, Dallas, London and Bangalore
  • 25. © 2019 Cervello Inc. 25 TRAIN 2 DAYS Getting around Snowflake Analyzing & Reporting Data Data Management & Security PLAN < 1 WEEK Use case inventory Readiness Assessment Architecture Design ACTION 1-2 WEEKS Use case & MVP defined POC Implementation Stakeholder Presentation TCO Analysis Cervello Value Accelerator 9 Years Working In The Cloud 30 Snowflake Trained Consultants 100+ Big Data & Analytics Engagements End to End Enablement