1. Traveloka’s Journey to No Ops Streaming Analytics
Rendy Bambang Jr., Data Eng Lead - Traveloka
Gaurav Anand, Solutions Engineer - Google
2. How we use the data
● Business Intelligence
● Analytics
● Personalization
● Fraud Detection
● Ads Optimization
● Cross-selling
● A/B Testing
● etc.
3. 6 offices, incl. Singapore · 1,000+ global employees · 400+ engineers
Our technology core has enabled us to scale Traveloka into 6 countries across ASEAN rapidly in less than 2 years.
6. Initial Data Architecture
[Architecture diagram: the Traveloka App (Android, iOS) and Traveloka Services publish events to Kafka. A streaming path feeds an In-Memory Real-Time DW and a NoSQL Realtime DB; a batch path (ETL, Batch Ingest) feeds the S3 Data Lake and the Data Warehouse. Consumers of data query via Hive/Presto and the DOMO Analytics UI.]
7. Key Numbers
● Kafka volume: billions of messages/day
● In-Memory DB: hundreds of GB of in-memory data
● NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases
● S3: hundreds of TB
● Spark: 20+ nodes, 200+ cores
● Redshift DW: 20+ nodes, tens of TB
● Team: 8 developers + 3 SysOps/DevOps
8. Problems with Initial Data Architecture
[Same architecture diagram as slide 6 — Traveloka App and Services into Kafka, streaming and batch paths into the In-Memory Real-Time DW, NoSQL Realtime DB, S3 Data Lake, and Data Warehouse, queried via Hive/Presto and DOMO — annotated with the problem areas listed on the next slide.]
9. Problems with Initial Data Architecture
● Debugging Kafka issues required a dedicated on-call rotation
● Data Warehouse throughput issues under high-frequency loads, due to coupled storage and compute
● Team well-being suffered: engineers were paged for infra issues on holiday, even on a honeymoon!
● Scaling issues with the NoSQL DB and In-Memory DB
● Scaling issues with custom-built Java consumers
11. Ideal Solution
● Fully managed infrastructure that frees engineers to solve business problems
● Autoscaling of storage and compute
● Low end-to-end latency with a guaranteed SLA
● Resilience and end-to-end system availability
12. Solution Components
● Google Cloud Pub/Sub (events data ingestion)
● Google Cloud Dataflow (stream processing)
● Google BigQuery (analytics)
● Cross-cloud environment (AWS–GCP)
● AWS DynamoDB (operational datastore)
Note: Although Cloud Datastore was our preferred operational DB, its unavailability in the Singapore region necessitated the use of DynamoDB.
14. Analytics Architecture: Reimagined
[Architecture diagram: the Traveloka App (Android, iOS) and Traveloka Services now ingest events via Cloud Pub/Sub; Cloud Dataflow pipelines write to Cloud Storage and BigQuery for analytics, with Monitoring and Logging throughout. The legacy path — Kafka, ETL, Batch Ingest into the S3 Data Lake, Data Warehouse, and NoSQL DB, queried via Hive/Presto and the DOMO Analytics UI — remains alongside it for consumers of data during migration.]
15. Developed Two Common Dataflow Engines
● Self-service streaming analytics to BigQuery
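To make the "self-service" idea concrete, here is an illustrative sketch (not Traveloka's actual code) of the pattern: a team declares its event type and field mapping in a config, and a shared pipeline turns raw JSON events into rows destined for a BigQuery table, so no new pipeline code is needed per use case. All names in the config are hypothetical.

```python
# Hypothetical config-driven event-to-row mapping, the core of a
# "self-service" streaming-analytics engine: teams supply the config,
# a shared Dataflow job supplies the plumbing.
import json

# Hypothetical per-team config: target table plus field -> type mapping.
CONFIG = {
    "event_type": "flight_search",
    "table": "analytics.flight_search_events",
    "fields": {"user_id": str, "origin": str, "destination": str, "ts": int},
}

def event_to_row(raw_event: str, config: dict):
    """Parse one JSON event and keep only the declared, type-checked fields."""
    event = json.loads(raw_event)
    if event.get("type") != config["event_type"]:
        return None  # not for this table; a real pipeline routes it elsewhere
    row = {}
    for name, typ in config["fields"].items():
        value = event.get(name)
        row[name] = typ(value) if value is not None else None
    return row

row = event_to_row(
    '{"type": "flight_search", "user_id": "u1", "origin": "CGK",'
    ' "destination": "SIN", "ts": 1528851600, "extra": "dropped"}',
    CONFIG,
)
print(row)  # undeclared fields like "extra" are dropped
```

In a real Beam/Dataflow pipeline this function would be the body of a `DoFn` between the Pub/Sub source and the BigQuery sink.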
16. Developed Two Common Dataflow Engines
● Stream processing to DynamoDB, with common features for developers:
○ Combine by key
○ Optimistic concurrency
○ Local-file-based integration tests
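The optimistic-concurrency feature above can be sketched as a read–modify–write loop with a version check, the pattern DynamoDB supports via a `ConditionExpression` on writes. This is a minimal illustration only; an in-memory dict stands in for the table, and all keys and helper names are hypothetical.

```python
# Sketch of optimistic concurrency for aggregate writes: each item
# carries a version number, and a write succeeds only if the version
# has not changed since the item was read. With DynamoDB this would be
# a conditional PutItem; here a dict simulates the table.

class VersionConflict(Exception):
    """Raised when another writer updated the item first."""

table = {}  # item_key -> {"value": ..., "version": int}

def read(key):
    item = table.get(key, {"value": 0, "version": 0})
    return item["value"], item["version"]

def conditional_write(key, new_value, expected_version):
    current = table.get(key, {"value": 0, "version": 0})
    if current["version"] != expected_version:
        raise VersionConflict(key)
    table[key] = {"value": new_value, "version": expected_version + 1}

def add_with_retry(key, delta, max_retries=5):
    # Read-modify-write loop: on conflict, re-read and retry,
    # as a streaming worker would when two workers race on one key.
    for _ in range(max_retries):
        value, version = read(key)
        try:
            conditional_write(key, value + delta, version)
            return
        except VersionConflict:
            continue
    raise RuntimeError("gave up after repeated conflicts")

add_with_retry("bookings:2018-06-01", 3)
add_with_retry("bookings:2018-06-01", 2)
print(table["bookings:2018-06-01"]["value"])  # 5
```

The retry loop is what makes concurrent workers safe without locks: a lost race costs one extra round trip rather than a corrupted aggregate.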
17. Key Facts/Numbers
● End-to-end pipeline latency: seconds
● Volume: hundreds of GB/day
● Team: 2 developers, 0 Ops
● Agility: POC + pilot in 1 month
● Migrated 50+ different stream processing use cases in 1 month
● BigQuery integration with BI tools: thousands of dashboards, hundreds of users
26. Traveloka Data Team Philosophy
● Managed Service
● NoOps
● Self-Service
Focus more on solving complex business problems rather than focusing on
infrastructure
27. What drove us to change?
● Ever increasing scale
● Ever increasing operations burden
● New business needs: Streaming Analytics
One of the most strategic parts of Traveloka's business is a streaming data processing pipeline that powers a number of use cases, including fraud detection, personalization, ads optimization, cross selling, A/B testing, and promotion eligibility.
In this talk, we’ll describe how Traveloka recently migrated this pipeline from a legacy architecture to a multi-cloud solution that includes the Google Cloud Platform (GCP) data analytics platform.
Highlight data engineer, highlight Singapore office
Traveloka is a travel technology company based in Indonesia, Singapore, and India. Its goal is to revolutionize human mobility.
Traveloka vision
The purpose of my talk today is to give you practical feedback regarding Pub/Sub, Dataflow, BigQuery and, more broadly, our usage of Google Cloud Platform.
I'll start by briefly talking about Traveloka and what we do. Then I'll discuss the architecture we used for the past few years and the reasons why we decided to investigate new solutions. Finally, I'll present what we put in place and the lessons learned along the way.
Track Session: As Traveloka grew over time, several problems emerged, including:
Track Session: We did our homework on technology that could support these requirements for our use case.
4 minutes
Highlight component
Role and mapping
similar function from both sides, like BigQuery
Hybrid, not the end state
Track Session: We did our homework on technology that could support these requirements for our use case.
Basic idea:
Run a low-latency, weakly consistent streaming system
alongside a high-latency, strongly consistent batch system,
and somehow merge their results together at the end.
This provided low-latency, correct results,
but at the cost of building, maintaining, and merging the results from two separate systems.
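The merge step described above can be sketched in a few lines: the batch layer periodically produces exact counts up to some watermark, the streaming layer keeps fresh counts for events after it, and a query combines the two views. This is a generic Lambda-architecture illustration, not code from the talk; all metric names and numbers are made up.

```python
# Minimal sketch of the Lambda-style merge: exact-but-stale batch
# results combined with fresh-but-revisable streaming results at
# query time. Maintaining this merge (and both systems feeding it)
# is the cost the unified model aims to remove.

batch_view = {"searches": 1000, "bookings": 120}    # exact, but hours old
streaming_view = {"searches": 42, "bookings": 3}    # fresh, may be revised

def merged_count(metric: str) -> int:
    """Serve a query by combining the batch and speed layers."""
    return batch_view.get(metric, 0) + streaming_view.get(metric, 0)

print(merged_count("searches"))  # 1042
```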
So what we set out to do with Apache Beam was to provide a unified model...
One which could give you the features of both systems…
But even more than that, one which would allow you to trade off the characteristics of each according to use case.
So after you write your pipeline,
This is an approach we laid out in our 2015 VLDB paper on the Dataflow Model. And if you want to learn more in detail, that’s a good place to start.
...whether that’s Dataflow, Apache Spark, Apache Flink, Apache Apex, or any other runner we support.
And then for our part...