Serverless Big Data Architecture on Google Cloud Platform at Credit OK

Presented by Kriangkrai Chaonithi @spicydog
On 25/11/2018, at Barcamp Bangkhen 9

Hello! My name is Gap
Education
● BS Applied Computer Science (KMUTT)
● MS Applied Computer Engineering (KMUTT)
Work Experience
● Former Android, iOS & PHP Developer at Longdo.COM
● Former R&D Manager at Insightera
● CTO & co-founder at Credit OK
Fields of Interest
● Software Engineering
● Computer Security
● Servers & Cloud & Distributed Computing
● Machine Learning & NLP
https://spicydog.me

Agenda
● Server & application deployment history
● Introduction to Google Cloud Platform products
○ Computing
○ Storage & databases
○ Data analytics
● Big data architecture at Credit OK
○ About Credit OK
○ Why we use serverless
○ Our requirements
○ Our solutions
○ The summary

Server & Application
Deployment History

Bare Metal Server
● Pre-cloud era (probably..)
● Install OS and dependencies on a machine
● One machine - one server
● Expose the network to the internet
● Colocation/on-premise
● SSH/FTP/Git to the server

Virtualization
● One machine - many servers
● One machine multiple customers
● VPS / Cloud
● SSH/FTP/Git to the server
IaaS

Containers & Micro Services
● Docker / Kubernetes
● Auto deployment
● Auto scale (automatically spawns new nodes)
● Pay based on the number of nodes
● Infrastructure as code! (new concept!)
PaaS

Why Container Orchestration?
https://blog.risingstack.com/what-is-kubernetes-how-to-get-started/

Serverless
● Write code and deploy!
● Auto deploy
● Auto scale
● Pay per request
● No infrastructure!!
SaaS
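As a rough illustration of "write code and deploy", an HTTP-triggered function can be as small as the sketch below. It assumes a Python runtime in which the platform passes a Flask-style request object to the entry point. The function name `hello`, the `name` query parameter, and the `FakeRequest` helper are hypothetical and only serve the example.

```python
def hello(request):
    """HTTP entry point: the platform is assumed to call this once per request.

    `request` is assumed to be a Flask-style object exposing an `args` mapping
    with the query-string parameters.
    """
    name = request.args.get("name", "world")
    return f"Hello, {name}!"


class FakeRequest:
    """Stand-in request for local experiments (not part of any GCP API)."""

    def __init__(self, args=None):
        self.args = args or {}


if __name__ == "__main__":
    # Local smoke run; no server or deployment is involved here.
    print(hello(FakeRequest({"name": "Barcamp"})))
```

There is no server setup in the code itself: routing, scaling, and billing per request are left to the platform.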

GCP Computing
Virtual Machine
Containers
Serverless

Let’s Review Types of Databases
SQL NoSQL

GCP Storages & Databases
Non-serverless
Serverless

GCP Data Analytics
Pipeline Analytics Visualization

Credit Scoring Platform on Big Data Analytics
creditok.co

Why use serverless on big data?
● Scalable & super high performance
● No more server maintenance :)
● Easier to optimize
● Only pay per use

Requirements
● Have a HUGE data warehouse for batch processing
● Our customers have on-premise data at >400 sites
● A data ingestor app needs to be installed at every site
● Data ingestor app must be able to run on
● Data ingestor app must be super robust and easy to install
● Must run automatically every day (task scheduler)

When >400 sites upload large files
to your server at the same time...
This is an unintentional DDoS!

So we mainly use Cloud Functions
● Auto scale
● But they only accept a <10 MB body size
and we also use Compute/App Engine for >10 MB files
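The size-based split above can be sketched on the ingestor side as follows. This is a minimal sketch, not the actual Credit OK ingestor: the endpoint URLs are placeholders, the 10 MB threshold is read as 10 * 1024 * 1024 bytes, and a file of exactly 10 MB is sent to the large-file path as a conservative assumption.

```python
import os

# Assumed limit, taken from the slide: request bodies must stay under 10 MB.
MAX_FUNCTION_BODY_BYTES = 10 * 1024 * 1024

# Hypothetical endpoints, for illustration only.
FUNCTION_ENDPOINT = "https://example.invalid/upload-small"
LARGE_FILE_ENDPOINT = "https://example.invalid/upload-large"


def choose_endpoint(size_bytes: int) -> str:
    """Return the upload endpoint for a payload of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < MAX_FUNCTION_BODY_BYTES:
        # Small payloads go to the auto-scaling function.
        return FUNCTION_ENDPOINT
    # Everything else goes to the Compute/App Engine service.
    return LARGE_FILE_ENDPOINT


def endpoint_for_file(path: str) -> str:
    """Pick the endpoint for a file on disk based on its size."""
    return choose_endpoint(os.path.getsize(path))
```

Keeping the decision in one small function makes it easy to adjust the threshold if the platform limit changes.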

Data Flow Architecture
(Diagram: multiple raw data sources feeding the data flow)

Serverless Big Data Architecture
In Summary
● Focus on design & coding
● A few people can accomplish a huge task
● No cost for idle servers; pay as you go
(GCS storage ~$0.02 per GB)
● Processing cost is surprisingly low when optimized
(Beware of BigQuery cost!)
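To put the storage figure in perspective, here is a back-of-the-envelope estimate. It assumes the ~$0.02 quoted above is charged per GB per month; the function name and the 5,000 GB example volume are made up for illustration, and real pricing varies by storage class and region.

```python
# Assumed unit price from the slide, interpreted as USD per GB per month.
GCS_PRICE_PER_GB_MONTH = 0.02


def monthly_storage_cost(stored_gb: float,
                         price_per_gb: float = GCS_PRICE_PER_GB_MONTH) -> float:
    """Estimate the monthly storage bill in USD for `stored_gb` gigabytes."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return round(stored_gb * price_per_gb, 2)


if __name__ == "__main__":
    # Hypothetical volume: 5,000 GB of raw files kept in the bucket.
    print(monthly_storage_cost(5000))
```

Query cost is a separate line item and is not covered by this estimate.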

Beware of ZONE_RESOURCE_POOL_EXHAUSTED
● Serverless doesn’t mean there are no servers; you just do not need to spawn servers/workers yourself
● Worker pools have limits, so do not run your app at peak time (but when is that?!)
● Hopefully Google will solve the problem soon :)
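One possible mitigation, not something the talk prescribes, is to retry a failed job after a randomized, growing delay so that retries drift away from the peak. The sketch below assumes the failure surfaces as an exception; `ResourcePoolExhausted` and `run_with_backoff` are hypothetical names, not part of any Google client library.

```python
import random
import time


class ResourcePoolExhausted(Exception):
    """Hypothetical stand-in for a ZONE_RESOURCE_POOL_EXHAUSTED failure."""


def run_with_backoff(job, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `job()` and retry with exponential backoff plus jitter.

    Re-raises the last error once `max_attempts` attempts have failed.
    """
    for attempt in range(max_attempts):
        try:
            return job()
        except ResourcePoolExhausted:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.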

We Are Hiring!
● PHP Laravel/Lumen Developer
● Data Engineer
● Credit Risk Analyst
hr@creditok.co
https://jobs.blognone.com/company/creditok

Time is short, so let’s make the most of networking.
Feel free to connect with me via spicydog.me
