Serverless Big Data Architecture on Google Cloud Platform was presented by Kriangkrai Chaonithi. The presentation covered Credit OK's use of serverless architecture on GCP for their big data analytics platform. Credit OK processes large amounts of customer data from over 400 sites to perform credit scoring. They use Google Cloud Functions to ingest data from sites, as well as Compute Engine and Google Cloud Storage. This serverless architecture allows them to automatically scale infrastructure as needed, reducing costs since they only pay for resources used. While serverless architectures don't require managing servers, there are still resource limits that must be considered to avoid issues like exhausted worker pools during peak loads.
Serverless Big Data Architecture on Google Cloud Platform at Credit OK
1. Serverless Big Data Architecture on Google Cloud Platform at Credit OK
Presented by Kriangkrai Chaonithi @spicydog
On 25/11/2018, at Barcamp Bangkhen 9
2. Hello! My name is Gap
Education
● BS Applied Computer Science (KMUTT)
● MS Applied Computer Engineering (KMUTT)
Work Experience
● Former Android, iOS & PHP Developer at Longdo.COM
● Former R&D Manager at Insightera
● CTO & co-founder at Credit OK
Fields of Interest
● Software Engineering
● Computer Security
● Servers & Cloud & Distributed Computing
● Machine Learning & NLP
https://spicydog.me
3. Agenda
● Server & application deployment history
● Introduction to Google Cloud Platform products
○ Computing
○ Storage & databases
○ Data analytics
● Big data architecture at Credit OK
○ About Credit OK
○ Why we use serverless
○ Our requirements
○ Our solutions
○ The summary
5. Bare Metal Server
● Pre-cloud era (probably..)
● Install OS and dependencies on a machine
● One machine - one server
● Expose the network to the internet
● Colocation/on-premise
● SSH/FTP/Git to the server
6. Virtualization
● One machine - many servers
● One machine - multiple customers
● VPS / Cloud
● SSH/FTP/Git to the server
IaaS
7. Containers & Micro Services
● Docker / Kubernetes
● Auto deployment
● Auto scale (automatically spawns new nodes)
● Pay based on the number of nodes
● Infrastructure as code! (new concept!)
PaaS
22. Why use serverless on big data?
● Scalable & super high performance
● No more server maintenance :)
● Easier to optimize
● Only pay per use
23. Requirements
● Have a HUGE data warehouse for batch processing
● Our customers have on-premise data at >400 sites
● A data ingestor app needs to be installed at every site
● Data ingestor app must be able to run on
● Data ingestor app must be super robust and easy to install
● Must run automatically every day (task scheduler)
24. When >400 sites upload large files to your server at the same time...
This is an unintentional DDoS!
25. So we mainly use Cloud Functions
● Auto scale
● But it only accepts a <10 MB body size
and also use Compute/App Engine for >10 MB files
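The split described on this slide can be sketched as a size check in the ingestor app. This is only an illustration, not Credit OK's actual code: both endpoint URLs and the function names are hypothetical, and treating "10 MB" as 10 * 1024 * 1024 bytes of file size (ignoring any request overhead) is an assumption.

```python
import os

# Limit quoted on the slide: Cloud Functions only accept a body under 10 MB.
# Interpreting "10 MB" as 10 * 1024 * 1024 bytes is an assumption.
CLOUD_FUNCTION_LIMIT_BYTES = 10 * 1024 * 1024

# Hypothetical endpoints, invented for this sketch (not from the talk).
SMALL_UPLOAD_URL = "https://example.invalid/ingest-function"  # Cloud Function
LARGE_UPLOAD_URL = "https://example.invalid/ingest-server"    # Compute/App Engine


def choose_upload_url(size_bytes: int) -> str:
    """Pick the upload endpoint for a payload of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    if size_bytes < CLOUD_FUNCTION_LIMIT_BYTES:
        return SMALL_UPLOAD_URL
    return LARGE_UPLOAD_URL


def choose_upload_url_for_file(path: str) -> str:
    """Same decision, based on the size of a file on disk."""
    return choose_upload_url(os.path.getsize(path))
```

A real client would probably leave some headroom below the limit, since the request body can be larger than the file itself once it is encoded.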
27. Serverless Big Data Architecture: In Summary
● Focus on design & coding
● A few people can achieve a huge task
● No cost for idle servers, pay as you use (GCS storage ~$0.02 per GB)
● Processing cost is surprisingly low when optimized (Beware of BigQuery cost!)
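To get a feel for the storage figure above, here is a rough back-of-the-envelope sketch. It assumes the ~$0.02 per GB quoted on the slide is a flat monthly rate, which the slide does not state; real pricing depends on storage class and region, and the function name is made up for this example.

```python
# Approximate rate quoted on the slide. Treating it as a flat
# per-GB-per-month price is an assumption made for this sketch.
GCS_PRICE_PER_GB_USD = 0.02


def monthly_storage_cost_usd(stored_gb: float,
                             price_per_gb: float = GCS_PRICE_PER_GB_USD) -> float:
    """Estimate the monthly storage bill in USD, rounded to cents."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    return round(stored_gb * price_per_gb, 2)


if __name__ == "__main__":
    # Print the estimate for a few warehouse sizes.
    for gb in (100, 1000, 5000):
        print(f"{gb} GB -> ${monthly_storage_cost_usd(gb):.2f} per month")
```

The estimate covers storage only; processing cost (for example BigQuery queries) is billed separately and is not modeled here.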
28. Beware of ZONE_RESOURCE_POOL_EXHAUSTED
● Serverless doesn’t mean no servers; you just do not need to spawn servers/workers yourself
● Worker pools have limits, so do not run your app at peak time (but when is that?!)
● Hopefully Google will solve the problem soon :)
29. We Are Hiring!
● PHP Laravel/Lumen Developer
● Data Engineer
● Credit Risk Analyst
hr@creditok.co
https://jobs.blognone.com/company/creditok