2. Vente-Exclusive.com: Market leader in the Benelux
2.7M (54% of members)
2.3M (45% of members)
40K (1% of members)
Key figures 2016
● The Benelux market = 29M people with high purchasing power
● > 6M members in the Benelux
● Up to 230 000 unique visitors per day
● > 2 500 partner brands in Europe
● Founded in 2007
● > 300 staff in Brussels & Amsterdam
● € 126M turnover in 2016* (+54% vs. 2015)
* Net turnover: VAT excluded; after forced cancellations, user cancellations and discounts; shipping fees and returns excluded
3. Meet the IT team
5 squads (~50 people)
● Customer: customer-facing shop, front end and back end
● Logistics: warehouse software, deliveries
● ESPN: back office for employees to configure the shop, manage sales, customer care, …
● Operations: company-wide IT support, shop issues
● Data: business intelligence
4. Meet the data team
● 1 product manager
● 3 data engineers
● 2 data scientists
● 1 Tableau expert
Our responsibilities
● Provide the business with valuable information for decision making (KPIs & dashboards)
● Provide analysts with uniform, queryable data (data warehouse)
● Relevance (recommendation, sale ranking, competitor pricing...)
5. A major growth supported by strategic alliances
Map labels: France, Spain, Italy, UK, Switzerland, Poland, Belgium, Netherlands, Luxembourg, Germany, Austria, Denmark (Copenhagen)
Geographic expansion to Germany, Austria & Scandinavia
6. We will need to scale our multichannel e-commerce platform
● Horizontally: more customers & sales
● Geographically: more geographic locations
10. Scaling the business
Monolith architecture
● One large application
● Single production database
● Dedicated machines
Drawbacks
● Integration nightmares (hope all parts keep working together)
● Deployment nightmares (hope the platform does not go down)
11. Scaling the business
Scaling the monolith...
● Horizontal: Brussels
● Geographical: Brussels, Amsterdam
Leads to...
● Inconsistencies
● Inefficient resource allocation
12. … while we already had BI challenges to fix
(1) Discovery
Reporting and production data mixed in a single database
→ Hard for analysts to find the right reporting data
Fix: uniform data warehouse
13. … while we already had BI challenges to fix
(2) Consistency
● No single definition of KPIs
● Analysts write different calculations from different data sources
→ Inconsistencies
Fix: precalculated KPIs
14. … while we already had BI challenges to fix
(3) Efficiency
● Redundant recalculations
● Redeveloping queries
→ Waste of human and computing resources
Fix: precalculated KPIs
15. … while we already had BI challenges to fix
(4) Availability
Increasing use cases for real-time data
→ Hard to deliver without affecting system performance
Fix: streaming KPI pipelines
16. Scaling the business
Monolith architecture → Microservice architecture
Our solution: microservices
• Small, modular services
• Each a unique process that serves a business goal
• Independently deployable
17. Scaling the business: microservices
Diagram: Production database, MongoDB database, Cloud SQL database
Microservice challenges
• Management overhead
• Need well-defined communication between services
+ Big challenge for Business Intelligence
• Need to collect and merge data from multiple sources
• NoSQL databases are not suitable for analytical queries
18. Original platform architecture
● Monolithic .Net application
● Single production database
● Dedicated machines
Diagram: .Net Application, Production database
20. Current architecture: adopted GCP for large data sources
Diagram components: .Net Application, Production database, Reporting database, Cloud Storage, BigQuery
Data movement: nightly table transfers
24. Current architecture: adopted GCP for large data sources
Diagram components: .Net Application, Production database, Reporting database, Cloud Storage, BigQuery, Channel interactions
Data movement: nightly table transfers
25. Google BigQuery
● Analytics data warehouse
● Zero configuration: no worries about memory, network, CPU or disk
● Petabyte scale
● Vex: ~16 TB; queried in 1 month: ~700 TB
26. Google BigQuery
● Based on Google Dremel
● Parallel query execution:
1. Columnar Storage
→ high compression ratio and scan
throughput
2. Tree Architecture
→ dispatching queries and aggregating
results across thousands of machines
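The two ideas above can be illustrated with a toy sketch in Python. It is not how Dremel or BigQuery is implemented: the table contents, the shard count and all function names are invented for illustration. The sketch only shows the shape of the idea: a query reads just the column it needs, leaf workers each scan one shard, and partial results are merged level by level.

```python
from typing import Dict, List

# Hypothetical toy table stored column by column: a query that only needs
# "amount" never has to read the "country" column.
COLUMNS: Dict[str, List] = {
    "country": ["BE", "NL", "BE", "LU", "NL", "BE", "NL", "BE"],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80],
}


def split_into_shards(values: List[int], shard_count: int) -> List[List[int]]:
    """Split one column into roughly equal shards, one per leaf worker."""
    size = -(-len(values) // shard_count)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def leaf_sum(shard: List[int]) -> int:
    """Leaf worker: scan its own shard and return a partial result."""
    return sum(shard)


def tree_sum(partials: List[int], fan_in: int = 2) -> int:
    """Intermediate levels: merge partial results until one value remains."""
    while len(partials) > 1:
        partials = [sum(partials[i:i + fan_in])
                    for i in range(0, len(partials), fan_in)]
    return partials[0]


def total_amount(shard_count: int = 4) -> int:
    """Answer 'SELECT SUM(amount)' by scanning a single column in shards."""
    shards = split_into_shards(COLUMNS["amount"], shard_count)
    return tree_sum([leaf_sum(shard) for shard in shards])
```

In a real system the leaf scans run in parallel on different machines; here they run one after another, which keeps the sketch short.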
27. Hope you are not easily impressed
How long would it take to read 80 GB from a hard drive at 100 MB/s?
~ 80 000 / 100 = 800 s = 13.33 min
What if we use an SSD (700 MB/s)?
~ 80 000 / 700 = 114 s ≈ 1.9 min
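The arithmetic on this slide can be checked with a few lines of Python. The function name is made up for this sketch, and it takes 1 GB = 1000 MB, as the slide does.

```python
def read_time_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Sequential read time, taking 1 GB = 1000 MB as on the slide."""
    return size_gb * 1000 / throughput_mb_per_s


hdd = read_time_seconds(80, 100)  # hard drive at 100 MB/s
ssd = read_time_seconds(80, 700)  # SSD at 700 MB/s

print(f"HDD: {hdd:.0f} s = {hdd / 60:.2f} min")  # HDD: 800 s = 13.33 min
print(f"SSD: {ssd:.0f} s = {ssd / 60:.2f} min")  # SSD: 114 s = 1.90 min
```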
33. Why the cloud?
We use Google Cloud Platform (PaaS)
Main advantages
● Managed products
● Managed infrastructure
Enable us to
● Focus on solving the application challenges at hand
● With state-of-the-art developer products that integrate well
● Without worrying about infrastructure
Also, we only pay for the resources we use!
34. Why the cloud?
Disadvantage: we depend on Google...
35. New microservice architecture
● The monolithic application is decomposed into microservices
● The new architecture allows us to scale quickly, both horizontally and geographically
● Team standardized on .Net Core, Angular 2 and (mostly) MongoDB
● Each service has its own (No)SQL database
Diagram: production microservices on Kubernetes
● Front-End: Angular
● Back-Office: Angular
● Back-End: .Net Core
● Databases: MongoDB
43. Microservices pros/cons
+ Scaling
+ CI/CD
+ If something breaks, fix it in one place
- Container management and deployment: bumpy road
- Communication between services → contracts
- Business intelligence: data collection + aggregation
44. Back-Office Sale Progress
One back-office screen requires information from multiple services
Product catalog, Pricing, Stock, Orders, Clicks
46. Event Sourcing with Pub/Sub
Message-oriented middleware
Many-to-many, asynchronous
• Data is published onto a topic
• Data can be pulled through a subscription on this topic
Open source alternative: Kafka
Example: the Membership microservice publishes “member-created”; the Messaging microservice sends a welcome email
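The publish/pull model on this slide can be sketched with a small in-memory broker in Python. This is an illustration only: the class `InMemoryPubSub`, its methods, the subscription names and the message fields are all invented here and are not the Cloud Pub/Sub or Kafka API. It shows a publisher writing to a topic without knowing its consumers, and each subscription pulling its own copy of the message.

```python
from collections import deque
from typing import Deque, Dict, List


class InMemoryPubSub:
    """Toy broker: every subscription on a topic gets its own copy of a message."""

    def __init__(self) -> None:
        # topic name -> subscription name -> queue of pending messages
        self._topics: Dict[str, Dict[str, Deque[dict]]] = {}

    def subscribe(self, topic: str, subscription: str) -> None:
        self._topics.setdefault(topic, {}).setdefault(subscription, deque())

    def publish(self, topic: str, message: dict) -> None:
        # The publisher does not know who will consume the message.
        for queue in self._topics.get(topic, {}).values():
            queue.append(message)

    def pull(self, topic: str, subscription: str) -> List[dict]:
        # Subscribers pull whenever they are ready (asynchronous consumption).
        queue = self._topics[topic][subscription]
        messages = list(queue)
        queue.clear()
        return messages


def send_welcome_emails(broker: InMemoryPubSub, outbox: List[str]) -> None:
    """Stand-in for the Messaging microservice."""
    for message in broker.pull("member-created", "messaging"):
        outbox.append(f"Welcome, {message['email']}!")


broker = InMemoryPubSub()
broker.subscribe("member-created", "messaging")
broker.subscribe("member-created", "analytics")  # hypothetical second consumer

# Stand-in for the Membership microservice publishing an event.
broker.publish("member-created", {"member_id": 1, "email": "jane@example.com"})

outbox: List[str] = []
send_welcome_emails(broker, outbox)
```

Because each subscription has its own queue, adding the second consumer requires no change to the publisher, which is the decoupling the slide describes.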
48. Data collection using Event Sourcing
Diagram:
● Production microservices on Kubernetes: Front-End (Angular), Back-Office (Angular), Back-End (.Net Core), Databases (MongoDB)
● Event sourcing: Cloud Pub/Sub
● Entity builder: Dataflow streaming
● Cloud Bigtable
● BigQuery
49. Google Dataflow / Apache Beam
Unified model for streaming and batch pipelines
for processing large datasets
Focus on logical composition instead of physical orchestration
→ focus on what instead of how
Useful abstractions: distribution, coordinating workers, data
sharding, ...
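The "what instead of how" idea can be sketched in plain Python. This is not the Apache Beam or Dataflow API: the helpers `map_each`, `keep_if` and `compose`, and the sample order lines, are invented for illustration. The point is that one pipeline definition runs unchanged over a bounded list (batch) and over a generator that yields items as they arrive (streaming).

```python
from typing import Callable, Iterable, Iterator

Transform = Callable[[Iterable], Iterator]


def map_each(fn: Callable) -> Transform:
    """Apply fn to every element."""
    def apply(items: Iterable) -> Iterator:
        for item in items:
            yield fn(item)
    return apply


def keep_if(predicate: Callable) -> Transform:
    """Keep only the elements for which predicate is true."""
    def apply(items: Iterable) -> Iterator:
        for item in items:
            if predicate(item):
                yield item
    return apply


def compose(*steps: Transform) -> Transform:
    """Chain transforms; nothing runs until the result is consumed."""
    def apply(items: Iterable) -> Iterator:
        for step in steps:
            items = step(items)
        return iter(items)
    return apply


# The "what": parse order lines, keep Belgian orders, extract the amounts.
pipeline = compose(
    map_each(lambda line: line.split(",")),
    keep_if(lambda fields: fields[0] == "BE"),
    map_each(lambda fields: float(fields[1])),
)

batch = ["BE,10.0", "NL,5.0", "BE,2.5"]  # bounded input


def stream() -> Iterator[str]:
    """Unbounded-style input: items are produced one at a time."""
    yield "NL,1.0"
    yield "BE,4.0"


batch_amounts = list(pipeline(batch))
stream_amounts = list(pipeline(stream()))
```

A real runner adds what this sketch leaves out: distributing the work, coordinating workers and sharding the data.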
51. Google BigTable
● Massively Scalable NoSQL
● Key value store
● 3 dimensions: rows, columns, time
● Simultaneously read and write
● Large throughput, minimal latency
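The three dimensions (rows, columns, time) can be sketched as a nested dictionary in Python. This is a toy model, not the Bigtable client API: the class `ToyWideColumnStore`, the row key `member#42` and the column name `profile:city` are assumptions made for this sketch. Each cell keeps every timestamped version, newest first.

```python
from typing import Any, Dict, List, Optional, Tuple

Cell = Tuple[int, Any]  # (timestamp, value)


class ToyWideColumnStore:
    """Toy key-value store addressed by row key, column and timestamp."""

    def __init__(self) -> None:
        # row key -> column -> list of (timestamp, value), newest first
        self._rows: Dict[str, Dict[str, List[Cell]]] = {}

    def write(self, row_key: str, column: str, timestamp: int, value: Any) -> None:
        cells = self._rows.setdefault(row_key, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def read_latest(self, row_key: str, column: str) -> Optional[Any]:
        cells = self._rows.get(row_key, {}).get(column, [])
        return cells[0][1] if cells else None

    def read_history(self, row_key: str, column: str) -> List[Cell]:
        return list(self._rows.get(row_key, {}).get(column, []))


store = ToyWideColumnStore()
store.write("member#42", "profile:city", 20170602, "Brussels")
store.write("member#42", "profile:city", 20171014, "Antwerp")

latest = store.read_latest("member#42", "profile:city")
history = store.read_history("member#42", "profile:city")
```

A lookup by row key returns the newest value directly, while older versions stay available as history.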
52. Example BigTable schema
Columns: member_id | auth | profile | ... | member channel | ip address | ...
Row entries:
member@20170602 20:30
member@20170604 08:26
member@20171207 12:17
member@20171014 14:57
53. Example BigTable schema
Why BigTable?
● Fast lookups & writes
→ essential for our real-time pipelines!
● Bonus points: store complete history
What is the main difference with BigQuery?
54. Data is looped back through microservices
Diagram:
● Production microservices on Kubernetes: Front-End (Angular), Back-Office (Angular), Back-End (.Net Core), Databases (MongoDB)
● Event sourcing: Cloud Pub/Sub
● Entity builder: Dataflow streaming
● Cloud Bigtable
● Data egress: Python / Go
55. Large data streams are stored in BigQuery
Diagram:
● Production microservices on Kubernetes: Front-End (Angular), Back-Office (Angular), Back-End (.Net Core), Databases (MongoDB)
● Event sourcing: Cloud Pub/Sub
● Entity builder: Dataflow streaming
● Cloud Bigtable
● Data egress: Python / Go
● Data ingress: Python / Go
● External events (Cloud Pub/Sub): SparkPost (e-mail), ADYEN (payments)
● Channel interactions
● Cloud Dataflow
● BigQuery
56. How to get from production data to analyst data?
(1) Production data
● Microservices: MongoDB, Cloud SQL
● External data: SparkPost, ADYEN
● Raw entity data: Cloud Bigtable
(2) Analyst data (BI infrastructure)
● Entities: BigQuery
● Calculations: BigQuery
● Processed data: BigQuery
57. Example: Real-time data enrichment
Diagram: Cloud Pub/Sub, Cloud Dataflow, Cloud Bigtable (entity information), Cloud Bigtable
Example: unique visitors per country
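A minimal Python sketch of this kind of enrichment follows. It is an assumption about how such a step could look, not the pipeline from the talk: the lookup table `MEMBER_COUNTRY` stands in for the entity information held in Cloud Bigtable, and the event fields and function names are invented. Each incoming event is enriched with the member's country, and a running count of unique visitors per country is emitted after every event.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set

# Stand-in for the entity lookup (Cloud Bigtable in the slides); values invented.
MEMBER_COUNTRY: Dict[str, str] = {"m1": "BE", "m2": "NL", "m3": "BE"}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach the member's country to each raw event."""
    for event in events:
        country = MEMBER_COUNTRY.get(event["member_id"], "unknown")
        yield {**event, "country": country}


def unique_visitors_per_country(events: Iterable[dict]) -> Iterator[Dict[str, int]]:
    """Emit an updated count per country after every incoming event."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for event in enrich(events):
        seen[event["country"]].add(event["member_id"])
        yield {country: len(members) for country, members in seen.items()}


visits = [
    {"member_id": "m1"},
    {"member_id": "m2"},
    {"member_id": "m1"},  # repeat visit, must not be counted twice
    {"member_id": "m3"},
]
snapshots = list(unique_visitors_per_country(visits))
```

The last snapshot holds the current KPI value; in a streaming setting the input would be an unbounded source instead of a list.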
59. Key take-aways
Cloud enables us to do a lot in a short amount of time
Microservices have trade-offs.
For us, scaling is worth it.
Good tooling is very important.
Also make your own tools that are business specific.
60. Interesting references
● Inside look at Google BigQuery
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● Comic: CI/CD with Kubernetes
https://cloud.google.com/kubernetes-engine/kubernetes-comic/
● The Children's Illustrated Guide to Kubernetes
https://deis.com/blog/2016/kubernetes-illustrated-guide/
● Netflix microservice architecture
https://www.youtube.com/watch?v=57UK46qfBLY
● Streaming pipelines with Google Dataflow
https://youtu.be/JZPTQrNKsqI