Coupang Confidential and Proprietary
This document is confidential and the intellectual property of Coupang
Journey to the Continuous and
Scalable Big Data Platform
Matthew (정재화), Coupang
About me
• Software Development Manager of BigData & DW Platform team
• 8+ years Hadoop experience
• Apache Tajo Committer and PMC
• blrunner78@gmail.com
• Blog : https://blrunner.tistory.com
• Author of a Hadoop technical handbook
Agenda
1. On-Premise
2. Cloud 1.0
3. Cloud 2.0
4. Airflow as a Service
5. Zeppelin as a Service
Motivation
The purpose of a business is to
create and keep a customer
- Peter Drucker -
1. On-Premise
Architecture
Logs
• Client Logs
• Server Logs
RDBMS
External Data
ETL Cluster
• Aggregations and Joins
• MapReduce
• Hive/Pig/Spark
• Oozie
Read-Only Cluster
• Ad-hoc Query
• Hive
Team's Responsibility
• Architect, build and operate our data infrastructure and tools
• Create and maintain company-wide data pipelines
• Troubleshoot and resolve all issues as they arise for users
Areas for Improvement
• Pros
  • A wide variety of workloads
  • Continuous increase in users
• Cons
  • Multiple copies of data
  • Lack of elasticity
  • Operation overhead
2. Cloud 1.0
Architecture: Decouple compute and storage
Centralized Resources
• Hive Metastore
• Cloud Storage
Batch Cluster (HiveServer2)
• Batch jobs
• High throughput
• Fault tolerant, ETL
Ad-hoc Cluster (HiveServer2)
• Ad-hoc queries
• Low latency
• Interactive analysis
• In-memory
Domain Cluster #1 (HiveServer2), Domain Cluster #2, ... Domain Cluster #N
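The split above can be read as a routing rule: each workload type goes to the cluster tuned for it, while every cluster points at the same Hive Metastore and cloud storage. The following is a minimal sketch of that idea, not the platform's actual code. The cluster names, the metastore URI, the warehouse path, and the `route` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shared resources; the slides only say they are centralized.
METASTORE_URI = "thrift://hive-metastore.example.internal:9083"
WAREHOUSE_PATH = "cloud-storage://example-warehouse/"


@dataclass(frozen=True)
class Cluster:
    name: str
    purpose: str
    # Every cluster uses the same metastore and storage location, so
    # compute can be added or removed without copying data.
    metastore_uri: str = METASTORE_URI
    warehouse_path: str = WAREHOUSE_PATH


CLUSTERS = {
    "batch": Cluster("batch-cluster", "high-throughput, fault-tolerant ETL"),
    "adhoc": Cluster("adhoc-cluster", "low-latency interactive analysis"),
}


def route(workload: str, domain: Optional[str] = None) -> Cluster:
    """Pick a cluster for a workload; a domain team gets its own cluster."""
    if domain is not None:
        return Cluster(f"domain-cluster-{domain}", f"workloads owned by {domain}")
    if workload not in CLUSTERS:
        raise ValueError(f"unknown workload type: {workload!r}")
    return CLUSTERS[workload]
```

Because storage and metadata live outside the clusters, adding a new domain cluster in this sketch is only a matter of creating another `Cluster` value.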
Team's Responsibility
• Architect, build and operate our data infrastructure and tools
• Troubleshoot and resolve all issues as they arise for users
• Implement company-wide data pipelines
Areas for Improvement
• Pros
  • Allows parsing and enriching data for custom needs
  • Independent scaling of CPU and storage capacity
• Cons
  • Learning curve for cloud infrastructure
  • Operation overhead
  • Users want the latest tools and more features
3. Cloud 2.0
High Level Architecture
Data Platform Portal
Data Processing Tools
• Zeppelin
Scheduler Tools
• Airflow
Security
• LDAP Authentication
• Apache Ranger ACL & Audit
Monitoring
Computing Clusters
Storage
• Cloud Storage
Various types of Computing Clusters
Centralized Resource
• Hive Metastore
• Cloud Storage
Transient Cluster
• Batch Jobs
Persistent Cluster
• Interactive Queries
Workload Specific Cluster
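A transient cluster exists only for the duration of a batch job. One way to picture this is a context manager that provisions a cluster and always tears it down afterwards. This is a sketch under stated assumptions: `create_cluster`, `terminate_cluster`, and the in-memory `RUNNING` set are hypothetical stand-ins for a cloud provider API, which the slides do not name.

```python
from contextlib import contextmanager

# In-memory stand-in for a cloud provider's cluster API (hypothetical).
RUNNING = set()


def create_cluster(name):
    RUNNING.add(name)
    return name


def terminate_cluster(name):
    RUNNING.discard(name)


@contextmanager
def transient_cluster(name):
    """Provision a cluster for one batch job and always tear it down."""
    cluster = create_cluster(name)
    try:
        yield cluster
    finally:
        # Runs on success and on failure, so no idle cluster is left behind.
        terminate_cluster(cluster)


def run_batch_job(job_name):
    """Run a (simulated) batch job on its own short-lived cluster."""
    with transient_cluster(f"transient-{job_name}") as cluster:
        return f"{job_name} ran on {cluster}"
```

A persistent cluster, by contrast, would be created once and kept running so that interactive queries do not pay the provisioning delay.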
Team's Responsibility
• Architect, build and operate our data infrastructure and tools
• Create data APIs and data services
• Support users using SLA policies
• Maintain security and data privacy
• Application Knowledge Support Artifacts, etc.
Areas for Improvement
• Pros
  • Onboarded many users and a wide variety of jobs
  • Easier management and added features
• Cons
  • Unintended infrastructure costs have increased
  • A wide variety of client tools and dev environments
  • Various types of users
Lessons Learned
• Distribute traffic instead of concentrating it in one place
• Optimize all types of system resources in clusters
• Enforce a lifecycle for Hadoop clusters
• Monitor clusters and send alerts from an efficiency perspective
• Train users continuously and build a community culture
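Monitoring "from an efficiency perspective" could mean alerting on wasted capacity rather than only on failures. The sketch below flags clusters that are idle or under-utilized. The `ClusterStats` fields and both thresholds are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    name: str
    allocated_vcores: int
    used_vcores: float   # average over the observation window
    idle_hours: float    # hours since the last job finished


def efficiency_alerts(stats, min_utilization=0.3, max_idle_hours=6.0):
    """Return one alert message per cluster that looks wasteful."""
    alerts = []
    for s in stats:
        if s.allocated_vcores:
            utilization = s.used_vcores / s.allocated_vcores
        else:
            utilization = 0.0
        if s.idle_hours > max_idle_hours:
            # Lifecycle rule: long-idle clusters are termination candidates.
            alerts.append(
                f"{s.name}: idle for {s.idle_hours:.0f}h, candidate for termination"
            )
        elif utilization < min_utilization:
            alerts.append(
                f"{s.name}: utilization {utilization:.0%}, consider downsizing"
            )
    return alerts
```

The same check can feed both alerting and lifecycle enforcement: the idle rule corresponds to terminating clusters, and the utilization rule to resizing them.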
4. Airflow as a Service
Why we love Airflow
• Define workflows as code
  • Makes workflows more maintainable, versionable, and testable
  • More flexible execution and workflow generation
• Lots of features
  • Sensor
  • Workflow Profiling
  • SLA alert
  • Rich Web Interface
  • Scalable Worker Processes
• In-house Airflow
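"Workflows as code" means the dependency graph is ordinary data in a source file, so it can be reviewed, versioned, and unit-tested like any other code. The sketch below shows that idea using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical pipeline).
WORKFLOW = {
    "extract_logs": set(),
    "extract_rdbms": set(),
    "join_and_aggregate": {"extract_logs", "extract_rdbms"},
    "publish_to_hive": {"join_and_aggregate"},
}


def execution_order(workflow):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


def run(workflow, tasks):
    """Call each task in dependency order and collect the results."""
    results = {}
    for name in execution_order(workflow):
        results[name] = tasks[name]()
    return results
```

Since the graph is plain data, a test can assert that `publish_to_hive` never runs before `join_and_aggregate`, which is the kind of testability the slide refers to.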
Airflow: deployment process
[Figure: deployment process diagram; the only surviving label is "Cloud Storage"]
5. Zeppelin as a Service
Why we love Zeppelin
• Easy Spark development on a personal computer
• Customized Presto interpreter
  • Run Presto queries easily without complex JDBC configuration
  • Export large data files to a local machine without errors
• Persistent storage for notebooks
Zeppelin Architecture
Areas for Improvement
• Users
  • Loading all notebooks on the main page is too slow
  • A big notebook can consume most resources, leaving Zeppelin pending
• Platform team
  • Spark interpreter doesn’t support YARN cluster mode
  • No lifecycle management for notebooks
  • Difficult to upgrade and improve existing Zeppelin instances gracefully
Resolution
• Upgrade Zeppelin to 0.8.1
  • Main page improvements
  • YARN cluster mode for Spark interpreter
  • Interpreter lifecycle manager
  • Interpreter recovery
• Containerized Zeppelin on Kubernetes
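The slide lists an interpreter lifecycle manager among the upgrade benefits. The general idea behind such a manager is an idle timeout: record when each interpreter was last used and stop those that have been idle too long, so one forgotten notebook cannot hold resources indefinitely. The sketch below illustrates that idea only. It is not Zeppelin's implementation, and the class and method names are hypothetical.

```python
class IdleTimeoutManager:
    """Tracks last-use time per interpreter and reports which ones to stop."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_used = {}

    def touch(self, interpreter_id, now):
        """Record that an interpreter ran a paragraph at time `now` (seconds)."""
        self._last_used[interpreter_id] = now

    def expire(self, now):
        """Return and forget the interpreters idle for at least the timeout."""
        expired = sorted(
            interpreter_id
            for interpreter_id, last in self._last_used.items()
            if now - last >= self.timeout_seconds
        )
        for interpreter_id in expired:
            del self._last_used[interpreter_id]
        return expired
```

In use, a periodic job would call `expire` with the current time and shut down the returned interpreters, and each notebook run would call `touch`.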
Summary
• Understand who is the immediate customer
• Focus on the truly important things
• Detect and solve problems immediately
• Leverage the identity of infrastructure
• A best practice is not always the best for you
SELECT question FROM you
https://boards.greenhouse.io/coupang/
Thank you
