SlideShare una empresa de Scribd logo
1 de 33
Traveloka’s Journey to
No Ops Streaming
Analytics
Rendy Bambang Jr., Data Eng Lead - Traveloka
Gaurav Anand, Solutions Engineer - Google
● Business Intelligence
● Analytics
● Personalization
● Fraud Detection
● Ads optimization
● Cross selling
● AB Test
● etc.
How we use the data
6 offices
Incl. Singapore
1,000+
Global employees
400+
Engineers
Our technology core has enabled
us to scale Traveloka into
6 countries
across ASEAN rapidly in
less than 2 years.
#EnablingMobility
In the beginning...
Consumer of Data
Initial Data Architecture
Streaming
Batch
Traveloka
App
Kafka
ETL
In Memory Real
Time DW
Data Warehouse
S3 Data
Lake
Batch
Ingest
Android,
iOS
NoSQL Realtime DB
Traveloka
Services
Hive, Presto
Query
DOMO
Analytics UI
Key Numbers
● Volume kafka: billions of messages/day
● In-Memory DB: hundreds of GB in-memory data
● NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases
● S3: hundreds of TB
● Spark: 20+ nodes, 200+ core
● Redshift DW: 20+ Nodes, tens of TB
● Team: 8 Developers + 3 SysOps/DevOps
Consumer of Data
Problems with Initial Data Architecture
Streaming
Batch
Traveloka
App
Kafka
ETL
In Memory Real
Time DW
Data Warehouse
S3 Data
Lake
Batch
Ingest
Android,
iOS
NoSQL Realtime DB
Traveloka
Services
Hive, Presto
Query
DOMO
Analytics UI
Problems with Initial Data Architecture
Debugging Kafka Issues - Dedicated
On-call
Data Warehouse throughput issues
for high frequency load, coupling
storage & compute
Team well being, paged on holiday,
even honeymoon for infra issue!
Scaling Issues with NoSQL DB and
In-Memory DB
Scaling Issues with Custom-built
Java Consumers
How do we..?
Ideal Solution
Fully-managed infrastructure to
free engineers to solve business
problems
Autoscaling of Storage and
Compute
Low end-to-end latency with
guaranteed SLA
Resilience, end-to-end system
availability
Solution Components
● Google Cloud PubSub (Events Data Ingestion)
● Google Cloud Dataflow (Stream Processing)
● Google Bigquery (Analytics)
● Cross-Cloud Environment (AWS-GCP)
● AWS DynamoDB (Operational datastore)
Note: Although Cloud Datastore was our prefered operational DB, but its non availability in SG region
necessitated use of Dynamodb.
How did we..?
Analytics Architecture: Reimagined
Consumer of Data
Streaming
Batch
Traveloka
App
Kafka
ETL
Data
Warehouse
S3 Data
Lake
Batch
Ingest
Android,
iOS
DOMO
Analytics UI
NoSQL DB
Traveloka
Services
Ingest
Cloud
Pub/Sub
Storage
Cloud
Storage
Pipelines
Cloud
Dataflow
Analytics
BigQuery
Monitoring
Logging
Hive, Presto
Query
Developed two Common Dataflow Engine
● Self-Service Streaming analytics to BigQuery
Developed two Common Dataflow Engine
● Stream processing to DynamoDB, common features for dev:
○ Combine by key
○ Optimistic Concurrency
○ Local-file based integration test
Key Facts/Numbers
● End to End Pipeline Latency: seconds
● Volume: hundreds of GB/day
● Team: 2 Developers, 0 Ops
● Agility: POC + Pilot in 1 month
● Migrate 50+ different stream processing use case in 1 month
● Bigquery Integration with BI tools: thousands of dashboard, hundreds of
users
Awesome Autoscale
Pubsub & Dataflow could
absorb spiky load just fine!
Our case: promo
PubSub Publish Count DataFlow vcpus Count
Why Cloud Dataflow
(Beam): Tuning Pipeline
vs. Managing Servers
The Lambda Architecture
Unified Model with Apache Beam
Batch
or
Streaming
http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
Unified Model with Apache Beam
Optimize
Schedule
Why Google Cloud..?
Traveloka Data Team Philosophy
● Managed Service
● NoOps
● Self-Service
Focus more on solving complex business problems rather than focusing on
infrastructure
What required us to change?
● Ever increasing scale
● Ever increasing operations burden
● New business needs: Streaming Analytics
What’s Next..?
Next Generation Architecture
Cloud Pub/Sub
Cloud Dataflow
BigQuery Cloud Storage
Kubernetes Cluster Collector
Managed services
Simplify!
BI &
Analytics UI
Conclusion
Our engineering team of
2 produces and
maintains like a team of
8 because of products
like PubSub, Dataflow &
Bigquery
“
”
Q&A
Thank You.

Más contenido relacionado

La actualidad más candente

Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ IndixRajesh Muppalla
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Tin Ho
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Meetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiMeetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiIdo Volff
 
Using Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureUsing Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureOliver Buckley-Salmon
 
Challenges in Building a Data Pipeline
Challenges in Building a Data PipelineChallenges in Building a Data Pipeline
Challenges in Building a Data PipelineManish Kumar
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Zhenxiao Luo
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...Databricks
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Sparkdatamantra
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Jason Flittner
 
Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitZhenxiao Luo
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Zekeriya Besiroglu
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)Eva Tse
 
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...Spark Summit
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidDatabricks
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 

La actualidad más candente (20)

Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture Speed layer : Real time views in LAMBDA architecture
Speed layer : Real time views in LAMBDA architecture
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Meetup Google BigQuery powered by ai
Meetup Google BigQuery powered by aiMeetup Google BigQuery powered by ai
Meetup Google BigQuery powered by ai
 
Using Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architectureUsing Hazelcast in the Kappa architecture
Using Hazelcast in the Kappa architecture
 
Challenges in Building a Data Pipeline
Challenges in Building a Data PipelineChallenges in Building a Data Pipeline
Challenges in Building a Data Pipeline
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...
An End-to-End Spark-Based Machine Learning Stack in the Hybrid Cloud with Far...
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Uber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks SummitUber Geo spatial data platform at DataWorks Summit
Uber Geo spatial data platform at DataWorks Summit
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)The evolution of the big data platform @ Netflix (OSCON 2015)
The evolution of the big data platform @ Netflix (OSCON 2015)
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ...
 
Funnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and DruidFunnel Analysis with Apache Spark and Druid
Funnel Analysis with Apache Spark and Druid
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 

Similar a Traveloka's journey to no ops streaming analytics

Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.gov
Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.govNot Your Father’s Web App: The Cloud-Native Architecture of images.nasa.gov
Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.govChris Shenton
 
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...Chris Shenton
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSpark Summit
 
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar
 
Data Provision API with BigQuery - Google Cloud Summit Jakarta 18
Data Provision API with BigQuery  - Google Cloud Summit Jakarta 18Data Provision API with BigQuery  - Google Cloud Summit Jakarta 18
Data Provision API with BigQuery - Google Cloud Summit Jakarta 18Imre Nagi
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Laure Vergeron
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...Amazon Web Services
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...Amazon Web Services
 
The great migration embracing serverless first
The great migration  embracing serverless first The great migration  embracing serverless first
The great migration embracing serverless first AngelaTimofte1
 
AWS Techniques and lessons writing a minimal cost gitlab runner
AWS Techniques and lessons writing a minimal cost gitlab runnerAWS Techniques and lessons writing a minimal cost gitlab runner
AWS Techniques and lessons writing a minimal cost gitlab runnerAnthony Scata
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...Codecamp Romania
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
How we leveraged Drupal to build a leading SaaS product
How we leveraged Drupal to build a leading SaaS product How we leveraged Drupal to build a leading SaaS product
How we leveraged Drupal to build a leading SaaS product Invotra
 

Similar a Traveloka's journey to no ops streaming analytics (20)

Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.gov
Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.govNot Your Father’s Web App: The Cloud-Native Architecture of images.nasa.gov
Not Your Father’s Web App: The Cloud-Native Architecture of images.nasa.gov
 
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...
[AWS DC Meetup] Not Your Father’s WebApp: The Cloud-Native Architecture of im...
 
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan SharmaSparking up Data Engineering: Spark Summit East talk by Rohan Sharma
Sparking up Data Engineering: Spark Summit East talk by Rohan Sharma
 
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
Openbar Kontich // Google Cloud: past, present and the (oh so sweet) future b...
 
Data Provision API with BigQuery - Google Cloud Summit Jakarta 18
Data Provision API with BigQuery  - Google Cloud Summit Jakarta 18Data Provision API with BigQuery  - Google Cloud Summit Jakarta 18
Data Provision API with BigQuery - Google Cloud Summit Jakarta 18
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018Zenko @Cloud Native Foundation London Meetup March 6th 2018
Zenko @Cloud Native Foundation London Meetup March 6th 2018
 
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
How Netflix Monitors Applications in Near Real-time w Amazon Kinesis - ABD401...
 
The great migration embracing serverless first
The great migration  embracing serverless first The great migration  embracing serverless first
The great migration embracing serverless first
 
AWS Techniques and lessons writing a minimal cost gitlab runner
AWS Techniques and lessons writing a minimal cost gitlab runnerAWS Techniques and lessons writing a minimal cost gitlab runner
AWS Techniques and lessons writing a minimal cost gitlab runner
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
 
CloudKit
CloudKitCloudKit
CloudKit
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
How we leveraged Drupal to build a leading SaaS product
How we leveraged Drupal to build a leading SaaS product How we leveraged Drupal to build a leading SaaS product
How we leveraged Drupal to build a leading SaaS product
 
Data Platform on GCP
Data Platform on GCPData Platform on GCP
Data Platform on GCP
 

Último

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Último (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Traveloka's journey to no ops streaming analytics

  • 1. Traveloka’s Journey to No Ops Streaming Analytics Rendy Bambang Jr., Data Eng Lead - Traveloka Gaurav Anand, Solutions Engineer - Google
  • 2. ● Business Intelligence ● Analytics ● Personalization ● Fraud Detection ● Ads optimization ● Cross selling ● AB Test ● etc. How we use the data
  • 3. 6 offices Incl. Singapore 1,000+ Global employees 400+ Engineers Our technology core has enabled us to scale Traveloka into 6 countries across ASEAN rapidly in less than 2 years.
  • 6. Consumer of Data Initial Data Architecture Streaming Batch Traveloka App Kafka ETL In Memory Real Time DW Data Warehouse S3 Data Lake Batch Ingest Android, iOS NoSQL Realtime DB Traveloka Services Hive, Presto Query DOMO Analytics UI
  • 7. Key Numbers ● Volume kafka: billions of messages/day ● In-Memory DB: hundreds of GB in-memory data ● NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases ● S3: hundreds of TB ● Spark: 20+ nodes, 200+ core ● Redshift DW: 20+ Nodes, tens of TB ● Team: 8 Developers + 3 SysOps/DevOps
  • 8. Consumer of Data Problems with Initial Data Architecture Streaming Batch Traveloka App Kafka ETL In Memory Real Time DW Data Warehouse S3 Data Lake Batch Ingest Android, iOS NoSQL Realtime DB Traveloka Services Hive, Presto Query DOMO Analytics UI
  • 9. Problems with Initial Data Architecture Debugging Kafka Issues - Dedicated On-call Data Warehouse throughput issues for high frequency load, coupling storage & compute Team well being, paged on holiday, even honeymoon for infra issue! Scaling Issues with NoSQL DB and In-Memory DB Scaling Issues with Custom-built Java Consumers
  • 11. Ideal Solution Fully-managed infrastructure to free engineers to solve business problems Autoscaling of Storage and Compute Low end-to-end latency with guaranteed SLA Resilience, end-to-end system availability
  • 12. Solution Components ● Google Cloud PubSub (Events Data Ingestion) ● Google Cloud Dataflow (Stream Processing) ● Google Bigquery (Analytics) ● Cross-Cloud Environment (AWS-GCP) ● AWS DynamoDB (Operational datastore) Note: Although Cloud Datastore was our prefered operational DB, but its non availability in SG region necessitated use of Dynamodb.
  • 14. Analytics Architecture: Reimagined Consumer of Data Streaming Batch Traveloka App Kafka ETL Data Warehouse S3 Data Lake Batch Ingest Android, iOS DOMO Analytics UI NoSQL DB Traveloka Services Ingest Cloud Pub/Sub Storage Cloud Storage Pipelines Cloud Dataflow Analytics BigQuery Monitoring Logging Hive, Presto Query
  • 15. Developed two Common Dataflow Engine ● Self-Service Streaming analytics to BigQuery
  • 16. Developed two Common Dataflow Engine ● Stream processing to DynamoDB, common features for dev: ○ Combine by key ○ Optimistic Concurrency ○ Local-file based integration test
  • 17. Key Facts/Numbers ● End to End Pipeline Latency: seconds ● Volume: hundreds of GB/day ● Team: 2 Developers, 0 Ops ● Agility: POC + Pilot in 1 month ● Migrate 50+ different stream processing use case in 1 month ● Bigquery Integration with BI tools: thousands of dashboard, hundreds of users
  • 18. Awesome Autoscale Pubsub & Dataflow could absorb spiky load just fine! Our case: promo PubSub Publish Count DataFlow vcpus Count
  • 19. Why Cloud Dataflow (Beam): Tuning Pipeline vs. Managing Servers
  • 21. Unified Model with Apache Beam Batch or Streaming
  • 23. Unified Model with Apache Beam
  • 26. Traveloka Data Team Philosophy ● Managed Service ● NoOps ● Self-Service Focus more on solving complex business problems rather than focusing on infrastructure
  • 27. What required us to change? ● Ever increasing scale ● Ever increasing operations burden ● New business needs: Streaming Analytics
  • 29. Next Generation Architecture Cloud Pub/Sub Cloud Dataflow BigQuery Cloud Storage Kubernetes Cluster Collector Managed services Simplify! BI & Analytics UI
  • 30. Conclusion Our engineering team of 2 produces and maintains like a team of 8 because of products like PubSub, Dataflow & Bigquery “ ”
  • 31.
  • 32. Q&A

Notas del editor

  1. One of the most strategic parts of Traveloka's business is a streaming data processing pipeline that powers a number of use cases, including fraud detection, personalization, ads optimization, cross selling, A/B testing, and promotion eligibility. In this talk, we’ll describe how Traveloka recently migrated this pipeline from a legacy architecture to a multi-cloud solution that includes the Google Cloud Platform (GCP) data analytics platform.
  2. Highlight data engineer, highlight singapore office Traveloka is a travel technology company based in Indonesia, Singapore, and India its goal is to revolutionize human mobility
  3. Traveloka vision
  4. The purpose of my talk today is to give you a practical feedback regarding PubSub, DataFlow, BigQuery and more broadly our usage of Google Cloud Platform. I'll start by briefly talking about Traveloka and what we do. Then I'll discuss the architecture we used for the past few years and the reasons why we decided to investigate new solutions. Finally, I'll present you what we put in place and the lessons learned along the way.
  5. Volume kafka: billions of messages/day In-Memory DB: hundreds of GB in-memory data NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases S3: hundreds of TB Spark: 20+ nodes, 200+ core Redshift DW: 20+ Nodes, tens of TB Team: 8 Developers + 3 SysOps/DevOps
  6. Volume kafka: billions of messages/day In-Memory DB: hundreds of GB in-memory data NoSQL DB: 50+ nodes, 20+ TB storage, 50+ use cases S3: hundreds of TB Spark: 20+ nodes, 200+ core Redshift DW: 20+ Nodes, tens of TB Team: 8 Developers + 3 SysOps/DevOps
  7. Track Session: As Traveloka grew over time, several problems emerged, including:
  8. Track Session: We did our homework on technology that could support these requirements for our use case.
  9. 4minutes Highlight component Role and mapping similar function from both sides, like bigquery Hybrid, not the end state
  10. Track Session: We did our homework on technology that could support these requirements for our use case.
  11. Track Session: We did our homework on technology that could support these requirements for our use case.
  12. Touch team, 0 ops Agility migration Bigquery integration
  13. How big is the spike? 10x?
  14. 20th Minute slide.
  15. Basic idea: Run low-latency, weakly consistent streaming system Alongside high-latency, strongly consistent batch system And somehow merge their results together at the end. This provided low-latency, correct results. But at the cost of building, maintaining, and merging the results from two separate systems. So what we set out to do with Apache Beam, was to provide a unified model...
  16. One which could give you the features of both systems… But even more than that, one which would allow you to tradeoff the characteristics of each according to use case So after you write your pipeline,
  17. This is an approach we laid out in our 2013 VLDB paper on the Dataflow Model. And if you want to learn more in detail, that’s a good place to start.
  18. ...whether that’s Dataflow, Apache Spark, Apache Flink, Apache Apex, or any other runner we support. And then for our part...
  19. Dynamic Load balancing & Autoscaling Worker VMs, Optimize Pipelines,
  20. Remove AWS Rectangle, we will just add for BI Analytics
  21. Thank you for them!