In this talk we present Zalando's microservices architecture, introduce Saiki – our next generation data integration and distribution platform on AWS and show how we employ stream processing for near-real time business intelligence.
Zalando is one of the largest online fashion retailers in Europe. In order to secure our future growth and remain competitive in this dynamic market, we are transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.
We first have a look at how business intelligence processes have been working inside Zalando for the last years and present our current approach - Saiki. It is a scalable, cloud-based data integration and distribution infrastructure that makes data from our many microservices readily available for analytical teams.
We no longer live in a world of static data sets, but are instead confronted with an endless stream of events that constantly inform us about relevant happenings from all over the enterprise. The processing of these event streams enables us to do near-real time business intelligence. In this context we have evaluated Apache Flink vs. Apache Spark in order to choose the right stream processing framework. Given our requirements, we decided to use Flink as part of our technology stack, alongside with Kafka and Elasticsearch.
With these technologies we are currently working on two use cases: a near real-time business process monitoring solution and streaming ETL.
Monitoring our business processes enables us to check if technically the Zalando platform works. It also helps us analyze data streams on the fly, e.g. order velocities, delivery velocities and to control service level agreements.
On the other hand, streaming ETL is used to relinquish resources from our relational data warehouse, as it struggles with increasingly high loads. In addition to that, it also reduces the latency and facilitates the platform scalability.
Finally, we have an outlook on our future use cases, e.g. near-real time sales and price monitoring. Another aspect to be addressed is to lower the entry barrier of stream processing for our colleagues coming from a relational database background.
2. 2
AGENDA
● Zalando’s Microservices Architecture
● Saiki - Data Integration and Distribution at Scale
● Flink in a Microservices World
● Stream Processing Use Cases:
o Business Process Monitoring
o Continuous ETL
● Future Work
3. 3
ABOUT US
Mihail Vieru
Big Data Engineer,
Business Intelligence
Javier López
Big Data Engineer,
Business Intelligence
5. 5
One of Europe's largest online fashion retailers
15 countries
~19 million active customers
~3 billion € revenue 2015
1,500 brands
150,000+ products
11,000+ employees in Europe
8. 8
VINTAGE BUSINESS INTELLIGENCE
Classical ETL process
Business
Logic
Data Warehouse (DWH)
DatabaseDBA
BI
Business
Logic
Database
Business
Logic
Database
Business
Logic
Database
Dev
19. 19
SAIKI — DATA INTEGRATION & DISTRIBUTION
BI
Data Warehouse E.g. Forecast DB
SAIKI
App A App B App DApp C
Exporter
REST API
Stream Processing
via Apache Flink Data Lake .
AWS S3
20. 20
SAIKI — SUMMARY
Δ J
D
B
C
REST
B
E
F
O
R
E
A
F
T
E
R
Data sources
Technologies
Data sources
Connections
Data sources
Extraction
Data
Delivery
22. 22
OPPORTUNITIES FOR NEXT GEN BI
Cloud Computing
- Distributed ETL
- Scale
Access to Real Time Data
- All teams publish data to central event
bus
Hub for Data Teams
- Data Lake provides distributed access
and fine grained security
- Data can be transformed (aggregated,
joined, etc.) before delivering it to data
teams
Semi-Structured Data
“General-purpose data processing engines
like Flink or Spark let you define own data
types and functions.”
- Fabian Hueske, dataArtisans
24. 24
THE RIGHT FIT — STREAM PROCESSING ENGINE
Candidates:
Storm & Samza ruled out because of batch processing
requirement
25. 25
SPARK VS. FLINK DIFFERENCES
Feature Apache Spark 1.5.2 Apache Flink 0.10.1
Processing mode micro-batching tuple at a time
Temporal processing support processing time event time, ingestion time,
processing time
Latency seconds sub-second
Back pressure handling manual configuration implicit, through system
architecture
State access full state scan for each microbatch value lookup by key
Operator library neutral ++ (split, windowByCount..)
Support neutral ++ (mailing list, direct contact &
support from data Artisans)
26. 26
SPARK VS. FLINK PERFORMANCE
source: Benchmarking Streaming Computation Engines at Yahoo!
27. 27
APACHE FLINK
• true stream processing framework
• process events at a consistently high rate with low
latency
• scalable
• great community and on-site support from Berlin/
Europe
• university graduates with Flink skills
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
31. 31
BUSINESS PROCESS
A business process is in its simplest form a chain of
correlated events:
start event completion event
ORDER_CREATED ALL_PARCELS_SHIPPED
Business Events from the whole Zalando platform flow through
Saiki => opportunity to process those streams in near real time
32. 32
REAL-TIME BUSINESS PROCESS MONITORING
• Check if business processes in the Zalando platform work
• Analyze data on the fly:
o Order velocities
o Delivery velocities
o Control SLAs of correlated events, e.g. parcel sent out
after order
33. 33
Saiki BPM
ARCHITECTURE BPM
Cfg Service
App A App B
Nakadi Event Bus
App C
Operational Systems
Kafka2Kafka
Unified Log
PUBLICINTERNET
OAUTH
Alert Svc
UI
Elasticsearch
Stream Processing
34. 34
HOW WE USE FLINK IN BPM
• 1000+ Event Types; 1 Event Type -> 1 Kafka topic
• Analyze processes with correlated event types (Join &
Union)
• Enrich data based on business rules
• Sliding Windows (1min to 48hrs) for Platform Snapshots
• State for alert metadata
• Generation and processing of Complex Events (CEP lib)
36. 36
Extract Transform Load (ETL)
Traditional ETL process:
• Batch processing
• No real time
• ETL tools
• Heavy processing on the storage side
37. 37
WHAT CHANGED WITH RADICAL AGILITY?
• Data comes in a semi-structured format (JSON payload)
• Data is distributed in separate Kafka topics
• There would be peak times, meaning that the data flow
will increase by several factors
• Data sources number increased by several factors
38. 38
`
Saiki Streaming ETL
ARCHITECTURE STREAMING ETL
Stream Processing
App A App B
Nakadi Event Bus
App C
Operational Systems
Kafka2Kafka
Unified Log Exporter
Oracle DWH
Importer
39. 39
HOW WE (WOULD) USE FLINK IN STREAMING ETL
• Transformation of complex payloads into simple ones for
easier consumption in Oracle DWH
• Combine several topics based on Business Rules (Union,
Join)
• Pre-Aggregate data to improve performance in the
generation of reports (Windows, State)
• Data cleansing
• Data validation
41. 41
COMPLEX EVENT PROCESSING FOR BPM
Cont. example business process:
• Multiple PARCEL_SHIPPED events per order
• Generate complex event ALL_PARCELS_SHIPPED,
when all PARCEL_SHIPPED events received
(CEP lib, State)
42. 42
DEPLOYMENTS FROM OTHER BI TEAMS
Flink Jobs from other BI Teams
Requirements:
• manage and control deployments
• isolation of data flows
o prevent different jobs from writing to the same sink
• resource management in Flink
o share cluster resources among concurrently running jobs
StreamSQL would significantly lower the entry barrier
43. 43
OTHER FUTURE TOPICS
• New use cases for Real Time Analytics/ BI
o Sales monitoring
o Price monitoring
• Fraud detection for payments (evaluation)
• Contact customer according to variable event pattern
(evaluation)
44. 44
CONCLUSION
Flink proved to be the right fit for our current stream
processing use cases. It enables us to build Zalando’s Next
Gen BI platform.
https://tech.zalando.de/blog/?tags=Saiki