Data Architecture at VEX
Talk Of The Minds
Wout Scheepers
Vente-Exclusive.com: Market leader in the Benelux
2,7M
54% of
members
2,3M
45% of
members
40K
1% of
members
The Benelux Market =
29M people with high purchase power
> 6 M members
in the Benelux
Up to
230 000 unique
visitors per day
> 2 500
partner brands in
Europe
Founded in
2007
> 300
staff in Brussels
& Amsterdam
€ 126 M
turnover in 2016*
+ 54% vs. 2015
* NET turnover: VAT excluded, after forced cancellations, user cancellations, discounts,
shipping fees and returns excluded
Key figures 2016
Meet the IT team
5 Squads (~50 people)
Customer facing shop front- and backend
Logistics warehouse software, deliveries
ESPN Backoffice for employees to configure shop,
manage sales, customer-care, …
Operations company wide IT-support, shop issues
Data Business intelligence
● Provide business with valuable information for
decision making (KPIs & dashboards)
● Provide analysts with uniform, queryable data
(data warehouse)
● Relevance (recommendation, sale ranking,
competitor pricing...)
Meet the data team
1 product manager
3 data engineers
2 data scientists
1 tableau expert
Our responsibilities
A major growth supported by strategic alliances
FRANCE
SPAIN
ITALY
UK
SPAIN
ITALY
SWITZERLAND
POLAND
BELGIUM
NETHERLANDS
LUXEMBOURG
GERMANY
AUSTRIA
DENMARK
Geographic expansion to Germany, Austria & Scandinavia
COPENHAGEN
We will need to scale our multichannel e-commerce platform
More customers & sales → horizontal scaling
More geographic locations → geographical scaling
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
Scaling the business
Monolith architecture
● One large application
● Single production database
● Dedicated machines
Drawbacks
● Integration nightmares (hope all parts keep working together)
● Deployment nightmares (hope the platform does not go down)
Scaling the business
Scaling the monolith...
Brussels
Brussels Amsterdam
Horizontal Geographical
Leads to...
● Inconsistencies
● Inefficient resource allocation
… while we already had BI challenges to fix
1. Discovery
Reporting and production data mixed in a single database
→ Hard for analysts to find the right reporting data
Fix: uniform data warehouse
2. Consistency
● No single definition of KPIs
● Analysts write different calculations
from different data sources
→ Inconsistencies
Fix: precalculated KPIs
3. Efficiency
● Redundant recalculations
● Redeveloping queries
→ Waste of human and computing resources
Fix: precalculated KPIs
4. Availability
Increasing use cases for real-time data
→ Hard to deliver without affecting system
performance
Fix: streaming KPI pipelines
Microservice architecture
Scaling the business
Monolith architecture
Our solution: microservices
• Small, modular service
• Unique process that
serves a business goal
• Independently deployable
Production
database
MongoDB
Database
Cloud SQL
Database
Scaling the business: microservices
Microservice challenges
• Management overhead
• Need well defined
communication between services
+ Big challenge for Business Intelligence
• Need to collect and merge data from multiple sources
• NoSQL databases are not suitable for analytical queries
Original platform architecture
Production
database
Reporting
database
SQL
Microsoft
Excel
● Monolithic .Net application
● Single production database
● Dedicated machines
● Data copied to reporting server nightly
● Most analysis in SQL & Excel
.Net
Application
Current architecture adopted GCP for large data sources
Production
database
Reporting
database
BigQuery
Nightly table
transfers
.Net
Application
Nightly table
transfers
Cloud
Storage
Apache Airflow
Batch Orchestration
Open source, developed at Airbnb
Extract, Transform, Load (ETL)
DAGs to define sequence of tasks
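A DAG only states which task has to finish before another may start. The sketch below is not Airflow code: it uses Python's standard graphlib module to order a hypothetical nightly transfer, and all task names are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly batch: each task maps to the set of tasks it depends on.
nightly_dag = {
    "export_tables": set(),
    "upload_to_cloud_storage": {"export_tables"},
    "load_into_bigquery": {"upload_to_cloud_storage"},
    "refresh_kpi_tables": {"load_into_bigquery"},
}

def execution_order(dag):
    """Return one valid run order in which every dependency comes first."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(nightly_dag)
print(" -> ".join(order))
```

An orchestrator adds scheduling, retries and monitoring on top of this ordering, which the sketch leaves out.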
Current architecture adopted GCP for large data sources
Production
database
Reporting
database
BigQuery
Nightly table
transfers
Channel
interactions
.Net
Application
Nightly table
transfers
Cloud
Storage
Google BigQuery
● Analytics data warehouse
● zero configuration
No worries about memory, network,
CPU or disk
● Petabyte scale
● Vex: ~16TB
Queried 1 month: ~700TB
Google BigQuery
● Based on Google Dremel
● Parallel query execution:
1. Columnar Storage
→ high compression ratio and scan
throughput
2. Tree Architecture
→ dispatching queries and aggregating
results across thousands of machines
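The two ideas above can be illustrated with a toy example. This is only a sketch with made-up order data, not how Dremel or BigQuery is implemented: the table is kept column by column, and a sum is computed from per-shard partial results that a root node combines.

```python
# Hypothetical order rows, stored column by column instead of row by row.
columns = {
    "country": ["BE", "NL", "BE", "LU", "NL", "BE"],
    "amount":  [40, 25, 60, 10, 35, 30],
}

def leaf_sum(values):
    """A leaf worker scans its shard of one column and returns a partial sum."""
    return sum(values)

def tree_sum(values, shard_size=2):
    """Split a column into shards, sum each shard, then combine the partial sums."""
    shards = [values[i:i + shard_size] for i in range(0, len(values), shard_size)]
    partials = [leaf_sum(shard) for shard in shards]  # leaves (parallel in a real system)
    return sum(partials)                              # root combines the partial results

# Only the "amount" column is read; "country" is never touched.
total = tree_sum(columns["amount"])
print(total)
```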
Hope you are not easily impressed
How long would it take to read 80 GB from a hard drive at
100 MB/s?
~ 80 000 / 100 = 800 s = 13.33 min
What if we use an SSD (700 MB/s)?
~ 80 000 / 700 = 114 s = ~2 min
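The same back-of-the-envelope calculation as a small script, assuming 1 GB = 1000 MB as the slide does and ignoring seek time and caching:

```python
def read_seconds(size_gb, throughput_mb_per_s):
    """Time to read size_gb sequentially, using 1 GB = 1000 MB as on the slide."""
    return size_gb * 1000 / throughput_mb_per_s

hdd = read_seconds(80, 100)   # hard drive at 100 MB/s
ssd = read_seconds(80, 700)   # SSD at 700 MB/s
print(f"HDD: {hdd:.0f} s ({hdd / 60:.1f} min)")
print(f"SSD: {ssd:.0f} s ({ssd / 60:.1f} min)")
```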
Business intelligence managed using Tableau
Production
database
Reporting
database
Nightly table
transfers
.Net
Application
Tableau
Server
Microsoft
Excel
Cloud
Storage
BigQuery
SparkPost
ADYEN
E-mail
Payments
Channel
interactions
Personalization using PySpark on DataProc, and BigQuery
Production
database
Reporting
database
BigQuery
Nightly transfers
Cloud
Storage
Cloud
Dataproc
.Net
Applications
Tableau
Server
Microsoft
Excel
Relevance calculations
E.g. sale-ranking
BigQuery
SparkPost
ADYEN
E-mail
Payments
Channel
interactions
Why the cloud?
We use Google Cloud Platform (PaaS)
Main advantages
● Managed products
● Managed infrastructure
Enable us to
● Focus on solving the application challenges at hand
● With state-of-the-art developer products that integrate well
● Without worrying about infrastructure
Also, we only pay for the resources we use!
Disadvantage: we depend on Google...
New microservice architecture
● Monolithic application is
decomposed into microservices
● New architecture allows us to scale quickly,
both horizontally and
geographically
● Team standardized on .Net Core,
Angular 2 and (mostly) MongoDB
● Each service has its own (No)SQL
database
Kubernetes
Production microservices
Front-End
Angular
Databases
MongoDB
Back-Office
Angular
Back-End
.Net Core
Containers?
a lightweight, stand-alone,
executable package
...
of a piece of software
...
that includes everything needed to
run it: code, runtime, system tools,
system libraries, settings
Kubernetes?
open-source system for automating deployment, scaling, and management
of containerized applications
Pokémon go! launch
CI/CD
Continuous integration/continuous deployment
→ New features can be developed and deployed independently
E.g. payment microservice rolling update
P1.0 → P1.0 + P1.1 → P1.1
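The rolling update can be pictured as replacing replicas one at a time, so that old and new versions run side by side until the last one is swapped. The sketch below is a toy simulation, not Kubernetes code; the replica count of three is an assumption, and health checks and surge settings are not modelled.

```python
def rolling_update(replicas, new_version):
    """Replace replicas one at a time and record the fleet after every step."""
    fleet = list(replicas)
    history = [tuple(fleet)]
    for i in range(len(fleet)):
        fleet[i] = new_version      # swap a single replica, the rest keep serving
        history.append(tuple(fleet))
    return history

steps = rolling_update(["P1.0", "P1.0", "P1.0"], "P1.1")
for step in steps:
    print(step)
```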
Microservices pros/cons
+ Scaling
+ CI/CD
+ If something breaks, fix it in one place
- Container management and deployment: bumpy road
- Communication between services → contracts
- Business intelligence: data collection + aggregation
Back-Office Sale Progress
One back-office screen requires information from multiple services
Product catalog Pricing Stock Orders Clicks
Kubernetes
Production microservices
Data collection using Event Sourcing
Front-End
Angular
Databases
MongoDB
Back-Office
Angular
Back-End
.Net Core
Event sourcing
Cloud Pub/Sub
Event Sourcing with PubSub
Message-oriented middleware
Many-to-many, asynchronous
• Data is published onto a
topic
• Data can be pulled through
a subscription on this topic
Open source alternative: Kafka
Membership
microservice
Messaging
microservice
“member-
created”
Sends welcome email
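The topic and subscription model can be sketched with a small in-memory broker. This is an illustration only and does not use the Cloud Pub/Sub client library; the Broker class, the subscription names and the example member are hypothetical.

```python
from collections import defaultdict, deque

class Broker:
    """Toy in-memory broker: topics fan out to named pull subscriptions."""

    def __init__(self):
        self._subscriptions = defaultdict(dict)  # topic -> {subscription name: queue}

    def subscribe(self, topic, name):
        self._subscriptions[topic][name] = deque()

    def publish(self, topic, message):
        # Every subscription on the topic gets its own copy of the message.
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, name):
        """Return the next pending message for a subscription, or None."""
        queue = self._subscriptions[topic][name]
        return queue.popleft() if queue else None

broker = Broker()
broker.subscribe("member-created", "messaging-service")
broker.subscribe("member-created", "entity-builder")

# The membership service publishes without knowing who listens.
broker.publish("member-created", {"member_id": 42, "email": "jane@example.com"})

# The messaging service pulls the event later and reacts to it.
event = broker.pull("member-created", "messaging-service")
welcome = f"Welcome mail queued for {event['email']}"
print(welcome)
```

Because each subscription has its own queue, a new consumer can be added without changing the publisher.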
Kubernetes
Production microservices
Data collection using Event Sourcing
Front-End
Angular
Databases
MongoDB
Back-Office
Angular
Back-End
.Net Core
Event sourcing
Cloud Pub/Sub
Cloud
Bigtable
Entity builder
DataFlow streaming
BigQuery
Google Dataflow / Apache Beam
Unified model for streaming and batch pipelines
for processing large datasets
Focus on logical composition instead of physical orchestration
→ focus on what instead of how
Useful abstractions: distribution, coordinating workers, data
sharding, ...
Google BigTable
● Massively Scalable NoSQL
● Key value store
● 3 dimensions: rows, columns, time
● Simultaneously read and write
● Large throughput, minimal latency
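The three dimensions (rows, columns, time) can be sketched as a store in which each cell keeps every timestamped version. This is a toy model, not the Bigtable client API; the row key and column names are assumptions chosen for illustration.

```python
from collections import defaultdict

class VersionedTable:
    """Toy store addressed by (row key, column, timestamp), newest value wins."""

    def __init__(self):
        self._cells = defaultdict(list)  # (row, column) -> [(timestamp, value), ...]

    def write(self, row, column, timestamp, value):
        self._cells[(row, column)].append((timestamp, value))

    def read_latest(self, row, column):
        versions = self._cells.get((row, column))
        return max(versions)[1] if versions else None

    def history(self, row, column):
        """All versions of a cell, oldest first."""
        return sorted(self._cells.get((row, column), []))

table = VersionedTable()
table.write("member#42", "profile:country", "2017-06-02T20:30", "BE")
table.write("member#42", "profile:country", "2017-12-07T12:17", "NL")

latest = table.read_latest("member#42", "profile:country")
versions = table.history("member#42", "profile:country")
print(latest, versions)
```

Reading the latest version serves fast lookups, while the retained versions give the complete history of an entity.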
Example BigTable schema
member_id auth profile ...member channel ip address ...
member@20170602 20:30
member@20170604 08:26
member@20171207 12:17
member@20171014 14:57
Why BigTable?
● Fast lookups & writes
→ essential for our real-time pipelines!
● Bonus points: store complete history
What is the main difference with BigQuery?
Kubernetes
Production microservices
Data is looped back through microservices
Cloud
Bigtable
Data egress
Python / GO
Front-End
Angular
Databases
MongoDB
Back-Office
Angular
Back-End
.Net Core
Event sourcing
Cloud Pub/Sub
Entity builder
DataFlow streaming
Kubernetes
Production microservices
Large data streams are stored in BigQuery
Event sourcing
Cloud Pub/Sub
Cloud
Bigtable
Channel
interactions
Data egress
Python / GO
Data ingress
Python / GO
SparkPost
ADYEN
External event
Cloud Pub/Sub
Payments
E-mail
Entity builder
DataFlow streaming
BigQuery
Cloud
Dataflow
Front-End
Angular
Databases
MongoDB
Back-Office
Angular
Back-End
.Net Core
How to get from production data to analyst data?
Production data
(2) Analyst data
MongoDB
Database Cloud SQL
Name
BigQuery
BI infrastructure
Microservices
External data
SparkPost
ADYEN
Calculations
BigQuery
Entities
BigQuery
Raw entity data
Processed data
Cloud
Bigtable
(1) Production data
Cloud
Bigtable
Example: Real-time data enrichment
Cloud
Pub/Sub
Cloud
Bigtable
Cloud
Dataflow
Entity information
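A minimal sketch of the enrichment step, assuming each incoming event carries a member id and that entity information can be looked up by a key derived from it. A plain dict stands in for Bigtable and a generator stands in for the Dataflow pipeline; the field names are hypothetical.

```python
# Stand-in for entity information kept in Bigtable (hypothetical rows).
entity_store = {
    "member#42": {"country": "BE", "language": "nl"},
    "member#77": {"country": "NL", "language": "nl"},
}

def enrich(event, store):
    """Return a copy of the event with the member's entity fields merged in."""
    entity = store.get(f"member#{event['member_id']}", {})
    return {**event, **entity}

def enrich_stream(events, store):
    """Process events one by one, as a streaming pipeline would."""
    for event in events:
        yield enrich(event, store)

incoming = [
    {"member_id": 42, "type": "click", "sale_id": 1001},
    {"member_id": 99, "type": "click", "sale_id": 1002},  # unknown member
]
enriched = list(enrich_stream(incoming, entity_store))
print(enriched)
```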
Example: unique visitors per country
Cloud
Bigtable
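As a rough idea of what such a KPI computes, the sketch below counts distinct members per country over a batch of visit events. The event fields are assumptions, and a real streaming pipeline would do this per time window rather than over a list.

```python
from collections import defaultdict

def unique_visitors_per_country(events):
    """Count distinct member ids per country over a batch of visit events."""
    visitors = defaultdict(set)
    for event in events:
        visitors[event["country"]].add(event["member_id"])
    return {country: len(members) for country, members in visitors.items()}

visits = [
    {"member_id": 1, "country": "BE"},
    {"member_id": 2, "country": "BE"},
    {"member_id": 1, "country": "BE"},  # repeat visit, counted once
    {"member_id": 3, "country": "NL"},
]
counts = unique_visitors_per_country(visits)
print(counts)
```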
Demo
time
Key take-aways
Cloud enables us to do a lot in a short amount of time
Microservices have trade-offs.
For us, scaling is worth it.
Good tooling is very important.
Also make your own tools that are business specific.
Interesting references
● Inside look at Google Bigquery
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● Comic: CI/CD with kubernetes
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
● The Children's Illustrated Guide to Kubernetes
https://deis.com/blog/2016/kubernetes-illustrated-guide/
● Netflix microservice architecture
https://www.youtube.com/watch?v=57UK46qfBLY
● Streaming pipelines with Google Dataflow
https://youtu.be/JZPTQrNKsqI
wout.scheepers@exellys.com

 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Data Architecture at Vente-Exclusive.com - TOTM Exellys

  • 1. Data Architecture at VEX Talk Of The Minds Wout Scheepers
  • 2. Vente-Exclusive.com: market leader in the Benelux. 2.7M (54% of members); 2.3M (45% of members); 40K (1% of members). The Benelux market = 29M people with high purchasing power. Key figures 2016: > 6M members in the Benelux; up to 230,000 unique visitors per day; > 2,500 partner brands in Europe; founded in 2007; > 300 staff in Brussels & Amsterdam; € 126M turnover in 2016* (+54% vs. 2015). * Net turnover: VAT excluded, after forced cancellations, user cancellations, discounts, shipping fees and returns excluded.
  • 3. Meet the IT team: 5 squads (~50 people). Customer facing: shop front- and backend. Logistics: warehouse software, deliveries. ESPN: back office for employees to configure the shop, manage sales, customer care, … Operations: company-wide IT support, shop issues. Data: business intelligence.
  • 4. Meet the data team: 1 product manager, 3 data engineers, 2 data scientists, 1 Tableau expert. Our responsibilities: ● Provide the business with valuable information for decision making (KPIs & dashboards) ● Provide analysts with uniform, queryable data (data warehouse) ● Relevance (recommendation, sale ranking, competitor pricing...)
  • 5. Major growth supported by strategic alliances (map labels: FRANCE, SPAIN, ITALY, UK, SPAIN, ITALY, SWITZERLAND, POLAND, BELGIUM, NETHERLANDS, LUXEMBOURG, GERMANY, AUSTRIA, DENMARK). Geographic expansion to Germany, Austria & Scandinavia (Copenhagen).
  • 6. We will need to scale our multichannel e-commerce platform
  • 7. We will need to scale our multichannel e-commerce platform. More customers & sales → horizontal scaling. More geographic locations → geographical scaling.
  • 9. Scaling the business Monolith architecture ● One large application ● Single production database ● Dedicated machines
  • 10. Scaling the business Monolith architecture ● One large application ● Single production database ● Dedicated machines Drawbacks ● Integration nightmares (hope all parts keep working together) ● Deployment nightmares (hope the platform does not go down)
  • 11. Scaling the business: scaling the monolith horizontally and geographically (diagram labels: Brussels; Brussels, Amsterdam) leads to... ● Inconsistencies ● Inefficient resource allocation
  • 12-15. … while we already had BI challenges to fix. (1) Discovery: reporting and production data mixed in a single database → hard for analysts to find the right reporting data. Solution: uniform data warehouse. (2) Consistency: ● No single definition of KPIs ● Analysts write different calculations from different data sources → inconsistencies. Solution: precalculated KPIs. (3) Efficiency: ● Redundant recalculations ● Redeveloping queries → waste of human and computing resources. Solution: precalculated KPIs. (4) Availability: increasing use cases for real-time data → hard to deliver without affecting system performance. Solution: streaming KPI pipelines.
  • 16. Scaling the business: from monolith architecture to microservice architecture. Our solution: microservices • Small, modular services • Each a unique process that serves a business goal • Independently deployable
  • 17. Scaling the business: microservices (diagram: production database, MongoDB database, Cloud SQL database). Microservice challenges: • Management overhead • Need for well-defined communication between services. Plus a big challenge for business intelligence: • Need to collect and merge data from multiple sources • NoSQL databases are not suitable for analytical queries
  • 18. Original platform architecture ● Monolithic .Net application ● Single production database ● Dedicated machines Production database .Net Application
  • 19. Original platform architecture Production database Reporting database SQL Microsoft Excel ● Monolithic .Net application ● Single production database ● Dedicated machines ● Data copied to reporting server nightly ● Most analysis in SQL & Excel .Net Application
  • 20. Current architecture adopted GCP for large data sources Production database Reporting database BigQuery Nightly table transfers .Net Application Nightly table transfers Cloud Storage
  • 21. Apache Airflow: batch orchestration. Open source, developed at Airbnb. Extract, Transform, Load (ETL). DAGs (directed acyclic graphs) define the sequence of tasks.
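
To make the "DAG defines the sequence of tasks" idea concrete, here is a minimal sketch using only the Python standard library. It is not Airflow's API; the task names are hypothetical and only loosely mirror the nightly transfers mentioned in the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nightly ETL tasks (names are assumptions, not from the talk).
# Each task maps to the set of tasks that must finish before it may start.
dag = {
    "extract_tables": set(),
    "upload_to_cloud_storage": {"extract_tables"},
    "load_into_bigquery": {"upload_to_cloud_storage"},
    "refresh_kpis": {"load_into_bigquery"},
}


def run_order(graph):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

An orchestrator adds scheduling, retries and monitoring on top of this ordering step.
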
  • 24. Current architecture adopted GCP for large data sources Production database Reporting database BigQuery Nightly table transfers Channel interactions .Net Application Nightly table transfers Cloud Storage
  • 25. Google BigQuery ● Analytics data warehouse ● Zero configuration: no worries about memory, network, CPU or disk ● Petabyte scale ● VEX: ~16 TB; queried in 1 month: ~700 TB
  • 26. Google BigQuery ● Based on Google Dremel ● Parallel query execution: 1. Columnar Storage → high compression ratio and scan throughput 2. Tree Architecture → dispatching queries and aggregating results across thousands of machines
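
The two ideas on this slide can be illustrated with a toy sketch in plain Python. This is not how BigQuery or Dremel is implemented; the order records and the `fanout` parameter are assumptions made for illustration.

```python
# Hypothetical row-oriented order records (illustrative data only).
rows = [
    {"order_id": 1, "country": "BE", "amount": 40.0},
    {"order_id": 2, "country": "NL", "amount": 25.0},
    {"order_id": 3, "country": "BE", "amount": 35.0},
    {"order_id": 4, "country": "LU", "amount": 10.0},
]


def to_columns(records):
    """Pivot rows into one list per column, so a query scans only the columns it needs."""
    return {name: [r[name] for r in records] for name in records[0]}


def tree_sum(values, fanout=2):
    """Sum level by level: leaves produce partial sums, parents combine them."""
    level = list(values)
    while len(level) > 1:
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else 0


columns = to_columns(rows)
total = tree_sum(columns["amount"])  # touches only the "amount" column
print(total)
```

In a real system each partial sum would run on a different machine; here the tree only shows the shape of the aggregation.
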
  • 27. Hope you are not easily impressed. How long would it take to read 80 GB from a hard drive at 100 MB/s? ~ 80 000 / 100 = 800 s ≈ 13.3 min. What if we use an SSD (700 MB/s)? ~ 80 000 / 700 ≈ 114 s ≈ 2 min
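
The arithmetic on this slide can be reproduced with a small helper. The `parallel_readers` parameter is an assumption added here to show, in an idealized way, why spreading a scan over many machines helps; it ignores coordination and network overhead.

```python
def read_seconds(size_gb, throughput_mb_s, parallel_readers=1):
    """Seconds needed to read size_gb at throughput_mb_s per reader (1 GB = 1000 MB)."""
    size_mb = size_gb * 1000
    return size_mb / (throughput_mb_s * parallel_readers)


hdd = read_seconds(80, 100)   # single hard drive
ssd = read_seconds(80, 700)   # single SSD
# Idealized: 1000 hard drives each reading a disjoint slice in parallel.
fleet = read_seconds(80, 100, parallel_readers=1000)

print(f"HDD: {hdd:.0f} s, SSD: {ssd:.0f} s, 1000 HDDs in parallel: {fleet:.1f} s")
```
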
  • 29. Business intelligence managed Production database Reporting database BigQuery Nightly table transfers SparkPost ADYEN E-mail Payments Channel interactions .Net Application Cloud Storage
  • 30. Business intelligence managed using Tableau Production database Reporting database Nightly table transfers .Net Application Tableau Server Microsoft Excel Cloud Storage BigQuery SparkPost ADYEN E-mail Payments Channel interactions
  • 31. Personalization using PySpark on DataProc, and BigQuery Production database Reporting database BigQuery Nightly transfers Cloud Storage Cloud Dataproc .Net Applications Tableau Server Microsoft Excel Relevance calculations E.g. sale-ranking BigQuery SparkPost ADYEN E-mail Payments Channel interactions
  • 32. Why the cloud? We use Google Cloud Platform (PaaS)
  • 33-34. Why the cloud? We use Google Cloud Platform (PaaS). Main advantages: managed products and managed infrastructure. They enable us to focus on solving the application challenges at hand, with state-of-the-art developer products that integrate well, without worrying about infrastructure. Also, we only pay for the resources we use! Disadvantage: we depend on Google...
  • 35. New microservice architecture ● The monolithic application is decomposed into microservices ● The new architecture allows us to scale quickly, both horizontally and geographically ● The team standardized on .Net Core, Angular 2 and (mostly) MongoDB ● Each service has its own (No)SQL database (diagram: production microservices on Kubernetes; front-end: Angular; back-office: Angular; back-end: .Net Core; databases: MongoDB)
  • 36. Containers? A lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings
  • 37. Kubernetes? An open-source system for automating deployment, scaling, and management of containerized applications. Example: the Pokémon GO launch
  • 38. CI/CD Continuous integration/continuous deployment → New features can be developed and deployed independently E.g. payment microservice rolling update P1.0
  • 39. CI/CD Continuous integration/continuous deployment → New features can be developed and deployed independently E.g. payment microservice rolling update P1.0 P1.1
  • 42. CI/CD Continuous integration/continuous deployment → New features can be developed and deployed independently E.g. payment microservice rolling update P1.1
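
The rolling update shown across these slides (P1.0 instances gradually replaced by P1.1) can be simulated in a few lines. This is a simplified illustration, not Kubernetes' actual rollout logic; the replica count of three is an assumption.

```python
def rolling_update(replicas, new_version):
    """Replace one replica at a time, recording the fleet after each step."""
    fleet = list(replicas)
    steps = [tuple(fleet)]  # state before the update starts
    for i in range(len(fleet)):
        if fleet[i] != new_version:
            fleet[i] = new_version
            steps.append(tuple(fleet))
    return steps


steps = rolling_update(["P1.0", "P1.0", "P1.0"], "P1.1")
for step in steps:
    print(step)
```

At every step some replica is available to serve traffic, which is why the payment service can be updated without taking the shop down.
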
  • 43. Microservices pros/cons + Scaling + CI/CD + If something breaks, fix it in one place - Container management and deployment: bumpy road - Communication between services → contracts - Business intelligence: data collection + aggregation
  • 44. Back-office sale progress: one back-office screen requires information from multiple services (product catalog, pricing, stock, orders, clicks)
  • 45. Kubernetes Production microservices Data collection using Event Sourcing Front-End Angular Databases MongoDB Back-Office Angular Back-End .Net Core Event sourcing Cloud Pub/Sub
  • 46. Event sourcing with Pub/Sub: message-oriented middleware; many-to-many, asynchronous • Data is published onto a topic • Data can be pulled through a subscription on this topic. Open-source alternative: Kafka. Example: the membership microservice publishes “member-created”; the messaging microservice sends a welcome email
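
The topic and subscription model described on this slide can be sketched with an in-memory broker. This is an illustration of the pattern only, not the Cloud Pub/Sub client library; the `Broker` class, the subscription names and the `member_id` field are hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker: every subscription on a topic gets its own copy of each message."""

    def __init__(self):
        # topic -> {subscription name: queue of pending messages}
        self._subscriptions = defaultdict(dict)

    def subscribe(self, topic, name):
        self._subscriptions[topic][name] = deque()

    def publish(self, topic, message):
        for queue in self._subscriptions[topic].values():
            queue.append(message)

    def pull(self, topic, name):
        """Return the next pending message for this subscription, or None."""
        queue = self._subscriptions[topic][name]
        return queue.popleft() if queue else None


broker = Broker()
broker.subscribe("member-created", "messaging")        # sends the welcome email
broker.subscribe("member-created", "data-collection")  # hypothetical second consumer

# The membership service publishes without knowing who is listening.
broker.publish("member-created", {"member_id": 42})

event = broker.pull("member-created", "messaging")
welcome = f"Welcome, member {event['member_id']}!"
print(welcome)
```

Because each subscription has its own queue, a second consumer can be added without changing the publishing service.
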
  • 48. Kubernetes Production microservices Data collection using Event Sourcing Front-End Angular Databases MongoDB Back-Office Angular Back-End .Net Core Event sourcing Cloud Pub/Sub Cloud Bigtable Entity builder DataFlow streaming BigQuery
  • 49. Google Dataflow / Apache Beam Unified model for streaming and batch pipelines for processing large datasets Focus on logical composition instead of physical orchestration → focus on what instead of how Useful abstractions: distribution, coordinating workers, data sharding, ...
  • 51. Google BigTable ● Massively Scalable NoSQL ● Key value store ● 3 dimensions: rows, columns, time ● Simultaneously read and write ● Large throughput, minimal latency
  • 52-53. Example BigTable schema: rows keyed by member_id; columns auth, profile, ... (member) and ip address, ... (channel); timestamped entries member@20170602 20:30, member@20170604 08:26, member@20171207 12:17, member@20171014 14:57. Why BigTable? ● Fast lookups & writes → essential for our real-time pipelines! ● Bonus points: store complete history. What is the main difference with BigQuery?
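
The "rows, columns, time" model and the "store complete history" point can be sketched as a toy versioned key-value table. This is not the Cloud Bigtable client API; the `VersionedTable` class, the row key, the column name and the IP values are assumptions for illustration.

```python
from collections import defaultdict


class VersionedTable:
    """Toy store with three dimensions: row key, column, and timestamp."""

    def __init__(self):
        # (row_key, column) -> list of (timestamp, value) versions
        self._cells = defaultdict(list)

    def write(self, row_key, column, timestamp, value):
        """Append a new version instead of overwriting the old one."""
        self._cells[(row_key, column)].append((timestamp, value))

    def latest(self, row_key, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._cells.get((row_key, column))
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def history(self, row_key, column):
        """Return every version, oldest first."""
        return sorted(self._cells.get((row_key, column), []))


table = VersionedTable()
# Timestamps use the "YYYYMMDD HH:MM" form seen on the slide, which sorts correctly as text.
table.write("member-1", "channel:ip_address", "20170602 20:30", "10.0.0.1")
table.write("member-1", "channel:ip_address", "20171207 12:17", "10.0.0.7")
table.write("member-1", "channel:ip_address", "20170604 08:26", "10.0.0.3")

current_ip = table.latest("member-1", "channel:ip_address")
versions = table.history("member-1", "channel:ip_address")
print(current_ip, len(versions))
```
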
  • 54. Kubernetes Production microservices Data is looped back through microservices Cloud Bigtable Data egress Python / GO Front-End Angular Databases MongoDB Back-Office Angular Back-End .Net Core Event sourcing Cloud Pub/Sub Entity builder DataFlow streaming
  • 55. Kubernetes Production microservices Large data streams are stored in BigQuery Event sourcing Cloud Pub/Sub Cloud Bigtable Channel interactions Data egress Python / GO Data ingress Python / GO SparkPost ADYEN External event Cloud Pub/Sub Payments E-mail Entity builder DataFlow streaming BigQuery Cloud Dataflow Front-End Angular Databases MongoDB Back-Office Angular Back-End .Net Core
  • 56. How to get from production data to analyst data? Production data (2) Analyst data MongoDB Database Cloud SQL Name BigQuery BI infrastructure Microservices External data SparkPost ADYEN Calculations BigQuery Entities BigQuery Raw entity data Processed data Cloud Bigtable (1) Production data Cloud Bigtable 2 1
  • 57. Example: real-time data enrichment (diagram: Cloud Pub/Sub, Cloud Dataflow, entity information from Cloud Bigtable, results to Cloud Bigtable). Example: unique visitors per country
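
The enrichment example (unique visitors per country) can be sketched in plain Python. A dictionary stands in for the entity lookup; the member ids, countries and event shape are hypothetical, and a real streaming pipeline would also handle windowing and late data.

```python
from collections import defaultdict

# Stand-in for the entity lookup (member id -> country); values are made up.
member_country = {"m1": "BE", "m2": "NL", "m3": "BE"}


def unique_visitors_per_country(events, lookup):
    """Enrich each event with the member's country, then count distinct members per country."""
    seen = defaultdict(set)
    for event in events:
        country = lookup.get(event["member_id"], "unknown")
        seen[country].add(event["member_id"])
    return {country: len(members) for country, members in seen.items()}


events = [
    {"member_id": "m1"},
    {"member_id": "m2"},
    {"member_id": "m1"},  # repeat visit, counted once
    {"member_id": "m3"},
    {"member_id": "m9"},  # not present in the lookup
]

counts = unique_visitors_per_country(events, member_country)
print(counts)
```
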
  • 59. Key takeaways: ● The cloud enables us to do a lot in a short amount of time ● Microservices have trade-offs; for us, scaling is worth it ● Good tooling is very important; also build your own business-specific tools
  • 60. Interesting references ● Inside look at Google BigQuery https://cloud.google.com/files/BigQueryTechnicalWP.pdf ● Comic: CI/CD with Kubernetes https://cloud.google.com/kubernetes-engine/kubernetes-comic/ ● The Children's Illustrated Guide to Kubernetes https://deis.com/blog/2016/kubernetes-illustrated-guide/ ● Netflix microservice architecture https://www.youtube.com/watch?v=57UK46qfBLY ● Streaming pipelines with Google Dataflow https://youtu.be/JZPTQrNKsqI