An overview of modern scalable web development

Septeni Technology
tung_nt
&
Let’s take a tour of modern tech trends

Agenda
• Motivation and Challenges
• The evolution of software architecture
• Big Data
• AI - Machine Learning
• Cloud Computing
• Septeni Techstack

Motivation & Challenges
• 90 percent of the data in the world today was created in the last two years alone; 2.5 quintillion (10^18) bytes of data are created every day (*)
• Speed and concurrency (real-time or near real-time processing)
• Resilience (~100% uptime)
• Large scale
(*) According to a report from IBM Marketing Cloud (2016)

Reactive system design principles
• Responsive: replies in a timely manner, even in the face of failure
• Elastic: stays responsive under varying load
• Resilient: expects failure, both programmatic and systemic, and recovers from it
• Message-driven: relies on asynchronous message passing, the natural way for components to communicate in a distributed environment
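A minimal sketch of the message-driven principle, assuming Python's `asyncio.Queue` as a stand-in for a mailbox: senders put messages on a queue and move on, and workers consume them independently, so no component calls another directly. The worker names and the `order-N` messages are hypothetical.

```python
import asyncio


async def worker(name: str, inbox: asyncio.Queue, results: list) -> None:
    """Consume messages from a shared inbox until a None sentinel arrives."""
    while True:
        msg = await inbox.get()
        if msg is None:  # sentinel: this worker should stop
            inbox.task_done()
            break
        results.append(f"{name} handled {msg}")
        inbox.task_done()


async def main() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    results: list = []
    # Two workers share one inbox; the sender never addresses a worker directly.
    workers = [asyncio.create_task(worker(f"w{i}", inbox, results)) for i in range(2)]
    for order_id in range(5):
        await inbox.put(f"order-{order_id}")  # fire-and-forget send
    for _ in workers:
        await inbox.put(None)  # one stop sentinel per worker
    await asyncio.gather(*workers)
    return results


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Because workers only share the queue, more workers can be added under load without changing the sender, which is how message passing supports elasticity.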

Problems with monolithic architecture
• Pros:
✓ Simple to develop
✓ Simple to test
✓ Simple to deploy
• Cons:
- Hard to scale (too large and complex)
- Tends to become a “Big Ball of Mud”
- A barrier to adopting new technologies

The evolution of software architecture
• Scale-up vs. scale-out (vertical vs. horizontal scaling)
• MVC monolith → Distributed → Service-oriented (SOA, microservices)

Service-Oriented Architecture
• Pros:
✓ Tackles complexity in large-scale systems
✓ Easy to scale out (scalability)
✓ Distributed and container-friendly
✓ Services can be developed, tested, and deployed independently
✓ …
• Cons:
- System testing is much more complex
- Not suitable for small applications

The Traditional Microservices Architecture
Components:
• Load balancer
• API Gateway
• Service Discovery
• Independent, self-contained services with communication endpoints (REST API, messaging)
• …
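A minimal in-memory sketch of how the first three components cooperate, assuming round-robin balancing and prefix-based routing (real systems typically use dedicated tools for each role). All class names, service names, and addresses below are hypothetical.

```python
from collections import defaultdict


class ServiceRegistry:
    """Service discovery: maps a service name to its instance addresses."""

    def __init__(self) -> None:
        self._instances: dict[str, list[str]] = defaultdict(list)

    def register(self, service: str, address: str) -> None:
        self._instances[service].append(address)

    def lookup(self, service: str) -> list[str]:
        return list(self._instances.get(service, []))


class RoundRobinBalancer:
    """Load balancer: cycles through the instances of each service in turn."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self._registry = registry
        self._next: dict[str, int] = defaultdict(int)

    def pick(self, service: str) -> str:
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        address = instances[self._next[service] % len(instances)]
        self._next[service] += 1
        return address


class ApiGateway:
    """API gateway: a single entry point that routes requests by path prefix."""

    def __init__(self, routes: dict[str, str], balancer: RoundRobinBalancer) -> None:
        self._routes = routes  # path prefix -> service name
        self._balancer = balancer

    def route(self, path: str) -> str:
        for prefix, service in self._routes.items():
            if path.startswith(prefix):
                return self._balancer.pick(service) + path
        raise LookupError(f"no route for {path!r}")


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("orders", "http://10.0.0.1:8080")
    registry.register("orders", "http://10.0.0.2:8080")
    registry.register("users", "http://10.0.0.3:8080")
    gateway = ApiGateway({"/orders": "orders", "/users": "users"}, RoundRobinBalancer(registry))
    for path in ["/orders/1", "/orders/2", "/users/7"]:
        print(gateway.route(path))
```

Scaling out a service then amounts to registering another instance; clients and the gateway configuration stay unchanged.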

Concepts
• Data Warehouse & Data Mart
• OLTP vs OLAP
• HDFS, MapReduce
• Big Data architecture
✓ Batch processing
✓ Real-time processing

Data Warehouse
• A database designed for querying and analyzing data
• Characteristics:
‣ Subject-oriented
‣ Integrated
‣ Time-variant
‣ Non-volatile
‣ Separate from operational databases
• Schemas:
‣ Star
‣ Snowflake
‣ Galaxy
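A minimal sketch of the star schema, assuming SQLite (via Python's built-in `sqlite3`) as a stand-in for a real warehouse engine: one central fact table holds the measures, and each foreign key points to a dimension table. The table names, columns, and rows are hypothetical.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny star schema: a fact table surrounded by dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        -- Fact table: numeric measures plus one foreign key per dimension
        CREATE TABLE fact_sales (
            date_id    INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount     REAL
        );
        INSERT INTO dim_date    VALUES (1, 2018, 1), (2, 2018, 2);
        INSERT INTO dim_product VALUES (10, 'ads'), (11, 'games');
        INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 70.0);
        """
    )
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection) -> list[tuple]:
    # Typical analytical query: join the fact table to its dimensions, then aggregate
    return conn.execute(
        """
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON f.date_id = d.date_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
        """
    ).fetchall()


if __name__ == "__main__":
    print(sales_by_month_and_category(build_star_schema()))
```

A snowflake schema would further normalize the dimensions (for example, moving `category` into its own table referenced by `dim_product`); a galaxy schema has several fact tables sharing dimensions.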

Data Mart
• A data mart is a subset of the data warehouse
• Usually oriented to a specific business line or team
• Improves end-user response time
• Types:
1. Dependent: created from an existing data warehouse.
2. Independent: data is extracted from internal or external data sources (or both).
3. Hybrid: combines data from an existing data warehouse with other operational source systems.

Why a Data Warehouse?
• Make better business decisions:
✓ Develop data-driven strategies
✓ Base decisions on facts
• Quick access to the organization's historical activities:
✓ Evaluate which past initiatives were successful, or unsuccessful

OLAP vs OLTP
OLAP (online analytical processing):
• Data warehouse
• Historical processing
• Used to analyze the business
• Schemas: star, snowflake, galaxy
• Contains historical data
• Highly flexible
OLTP (online transaction processing):
• Operational database
• Day-to-day processing
• Used to run the business
• Schema: entity-relationship model
• Contains current data
• High performance
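A minimal sketch of the difference in workload, assuming a single SQLite table purely for illustration (in practice OLTP runs on the operational database and OLAP on the warehouse). The `orders` table and its columns are hypothetical.

```python
import sqlite3


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
    )
    return conn


def place_order(conn: sqlite3.Connection, customer: str, year: int, amount: float) -> int:
    """OLTP: a short, atomic write that touches a single current record."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, year, amount) VALUES (?, ?, ?)",
            (customer, year, amount),
        )
    return cur.lastrowid


def revenue_per_year(conn: sqlite3.Connection) -> list[tuple]:
    """OLAP: a read-only query that scans and aggregates many historical rows."""
    return conn.execute(
        "SELECT year, COUNT(*), SUM(amount) FROM orders GROUP BY year ORDER BY year"
    ).fetchall()


if __name__ == "__main__":
    db = create_db()
    place_order(db, "alice", 2017, 20.0)
    place_order(db, "bob", 2018, 30.0)
    print(revenue_per_year(db))
```

OLTP systems optimize for many small concurrent writes like `place_order`; OLAP systems optimize for large scans and aggregations like `revenue_per_year`, which is why the two are usually kept separate.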

Building a Data Warehouse (aka Data Warehousing)
Building a data warehouse involves the following steps:
1. Extract the data from different data sources.
2. Transform the data.
3. Load the data into the dimensional database.
Together, these steps form the Extract-Transform-Load (ETL) task.
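The three ETL steps can be sketched as three small functions, assuming an in-memory CSV export as the source and SQLite as the target warehouse. The CSV content, the validation rule, and the `fact_orders` table are hypothetical; production pipelines would use dedicated tools (the deck later lists Embulk and Fluentd, for example).

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (the "source")
RAW_CSV = """order_id,country,amount
1, jp ,1200
2,VN,
3,jp,800
"""


def extract(raw: str) -> list[dict]:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: clean and normalize values, dropping rows that fail validation."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # reject incomplete records
        cleaned.append(
            (int(row["order_id"]), row["country"].strip().upper(), float(row["amount"]))
        )
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, country TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


def run_etl(raw: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw)))
    return conn


if __name__ == "__main__":
    warehouse = run_etl(RAW_CSV)
    print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```

Note that the transform step runs before loading, so only data that already matches the target schema reaches the warehouse.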

Problems with traditional data warehousing
• Only handles structured data (relational or non-relational)
• Processing is based on schema-on-write: data must conform to a predefined schema before it is loaded
• Top-down approach (data is extracted according to predefined requirements)
• Suitable for small data volumes, but too expensive for very large ones

Big Data Characteristics
➡ Volume
➡ Variety
➡ Velocity
➡ Veracity

What are HDFS and MapReduce?
• Hadoop Distributed File System (HDFS):
The file system Hadoop uses to store data across the machines of a cluster
• MapReduce:
A processing technique and programming model for distributed computing
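The MapReduce model can be illustrated with the classic word-count example, sketched here in a single process. In Hadoop the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; the function names below are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from collections.abc import Iterable
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases parallelize naturally across machines.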

Why Hadoop and a Data Lake?
• Handles semi-structured (JSON, XML, Avro) and unstructured data (plain text)
• Schema-on-read: structure is applied when data is read, not when it is stored
• Uses an analytics engine (Hadoop)
• Bottom-up approach
• Data hoarding
✓ All data has potential value
• Handles large data volumes
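A minimal sketch contrasting schema-on-read with the schema-on-write approach of traditional warehouses, assuming JSON event records stored as-is. The event fields (`user`, `event`, `ts`, `campaign`) and both function names are hypothetical.

```python
import json

# Raw events kept as-is in the lake; records do not share a fixed schema
RAW_EVENTS = [
    '{"user": "u1", "event": "click", "ts": 1}',
    '{"user": "u2", "event": "view"}',
    '{"user": "u1", "event": "click", "ts": 3, "campaign": "spring"}',
]


def read_with_schema(lines, fields):
    """Schema-on-read: project each raw record onto the fields this query needs."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}  # missing -> None


def load_with_schema(line, required=("user", "event", "ts")):
    """Schema-on-write, for contrast: reject records that do not fit at load time."""
    record = json.loads(line)
    missing = [field for field in required if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


if __name__ == "__main__":
    for row in read_with_schema(RAW_EVENTS, ["user", "campaign"]):
        print(row)
```

With schema-on-read, every raw record is kept and each query decides which fields it needs; with schema-on-write, the second event would have been rejected before storage.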

Big Data Architecture
• Lambda Architecture
➡ 3 Layers: Batch, Speed, Serving
• Kappa Architecture
➡ 2 Layers: Streaming, Serving
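A toy sketch of the Lambda Architecture's three layers, counting page views per page. It assumes a single process and a batch job that finishes instantly; real deployments run the batch layer on a framework such as Hadoop and the speed layer on a stream processor. The class and method names are hypothetical.

```python
from collections import Counter


class LambdaPipeline:
    """Toy Lambda Architecture: page-view counts per page."""

    def __init__(self) -> None:
        self.master_log: list[str] = []          # immutable, append-only raw events
        self.batch_view: Counter = Counter()     # recomputed from scratch by the batch layer
        self.realtime_view: Counter = Counter()  # incremental; covers events since the last batch

    def ingest(self, page: str) -> None:
        self.master_log.append(page)   # every event goes to the master dataset...
        self.realtime_view[page] += 1  # ...and to the speed layer

    def run_batch(self) -> None:
        """Batch layer: recompute the view over all data, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, page: str) -> int:
        """Serving layer: merge the batch view with the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    pipeline = LambdaPipeline()
    pipeline.ingest("/home")
    pipeline.run_batch()
    pipeline.ingest("/home")
    print(pipeline.query("/home"))
```

A Kappa Architecture drops the separate batch layer: it keeps only the streaming path and recomputes views by replaying the event log through the stream processor.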

Data warehouse + data lake = better together
• Data warehouse
➡ What happened?
➡ Why did it happen?
• Data lake
➡ What will happen?

An example of a real-life ML system
Flow:
1. Manage data
2. Train models
3. Evaluate models
4. Deploy models
5. Make predictions
6. Monitor predictions
Uber Michelangelo: an end-to-end ML platform
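The six-step flow can be illustrated end to end with a deliberately tiny model, a one-variable linear regression fitted by ordinary least squares. This is a hypothetical sketch of the steps only; it is not Michelangelo's API, and the data, threshold, and function names are assumptions.

```python
import json
from statistics import mean


def train(xs, ys):
    """Train: fit y = a*x + b by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {"a": a, "b": my - a * mx}


def predict(model, x):
    """Make a prediction with a (possibly deserialized) model."""
    return model["a"] * x + model["b"]


def evaluate(model, xs, ys):
    """Evaluate: mean absolute error on held-out data."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


def deploy(model):
    """Deploy: serialize the model so a serving process can load it."""
    return json.dumps(model)


def monitor(model, live_xs, live_ys, threshold):
    """Monitor: flag the model when its error on live traffic exceeds a threshold."""
    return evaluate(model, live_xs, live_ys) > threshold


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 13]       # manage data
    model = train(xs[:4], ys[:4])                          # train on the first part
    print("test MAE:", evaluate(model, xs[4:], ys[4:]))    # evaluate on the rest
    served = json.loads(deploy(model))                     # deploy
    print("prediction:", predict(served, 10))              # predict
    print("drift alert:", monitor(served, [7, 8], [20, 22], threshold=1.0))  # monitor
```

On a real platform each step is a separate, scalable service (feature stores, distributed training, model registry, online serving), but the data flow between them follows the same order.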

Roles and skills in an ML project
• Software Engineer:
✓ Build systems that collect data, avoid bottlenecks, and let ML algorithms scale with growing data volumes
✓ Deploy and integrate ML models into the system
• Applied ML Engineer:
✓ Strong knowledge of ML frameworks (TensorFlow, scikit-learn, PyTorch, Caffe…) and ML algorithms, to tune hyperparameters and train new models
• Core ML Engineer:
✓ Model, visualize, and evaluate data, and monitor it
• Data scientist:
✓ Analyze data in order to tell a story

Cloud Computing Types
• Infrastructure as a Service (IaaS):
✓ Virtualized hardware resources as a service
• Platform as a Service (PaaS):
✓ Virtualized OS, runtime, middleware, etc. as a service
• Software as a Service (SaaS):
✓ Complete, ready-to-use applications as a service

How do they differ from on-premises deployment?
On-premises vs Cloud

Why Cloud Computing?
• Easy to scale
• Reliability
• Pay-as-you-go, on-demand cost
• Security
• Focus on the application
Cloud computing economies of scale.

Most popular cloud providers
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)
• Microsoft Azure
• IBM Cloud
• Oracle Cloud
• …

Server Side
• Scala, Java, Python, Node.js, PHP
• Play Framework, Akka, Redis, Memcached, Nginx, Apache,
MySQL, PostgreSQL, Kafka, Cassandra,…

Client Side
• Web:
AngularJS, Vue.js, React,…
• Game - Mobile:
Objective-C, Swift, Java,…

Data warehouse & data processing frameworks
• Treasure Data, Tableau, Embulk, Fluentd, Spark Streaming, Hadoop, Google BigQuery, Elasticsearch, Amazon S3, Amazon RDS

Infrastructure
• Amazon Web Services, Google Cloud Platform
• Docker, Kubernetes, Ansible

Development Tools
• GitLab, GitLab CI, Jira, Confluence, IntelliJ IDEA
• Slack, Google Suite

References
Our Website: http://septeni-technology.jp/
Engineer Blog: http://labs.septeni-technology.jp/
