6. 6
When to choose Microservices?
Benefits
- Independent deployment
- Fault isolation
- Diverse technology
- Small focused team
- Separate scalability
Challenges
- Complexity
- Network congestion
- Data integrity/consistency
- Testing
- Reliability
Business domain
- Complex domain
- Frequent update
- Many independent teams
Prerequisites
- Skill set for distributed system
- Domain knowledge
- DevOps culture
- Monitoring capability
7. 7
Choosing architecture styles
Style | Dependency management | Domain type/complexity
N-Tier | Horizontal layers (open/close) | Majority of business logic is CRUD
Web-Queue-Worker | Front/backend jobs; decoupled by async messaging | Relatively simple domain with some resource-intensive tasks
Microservices | Vertical (functional) decoupling; service calls via API | Complicated domain logic that requires each service to encapsulate domain knowledge
CQRS | R/W segregation; schema/scale are optimized separately | Collaborative domain where lots of users access the same data
EDA (IoT) | Data ingested into streaming; independent view per sub-system | Internet of things
Big data | Divide huge dataset into small chunks; parallel processing on local dataset | Batch and real-time data analysis; predictive analysis using ML
Big compute | Data allocation to thousands of cores | Compute-intensive domain such as simulation, number crunching
12. 12
Failover / Failback
[Diagram: Traffic Manager, using the priority routing method, sits in front of a primary region and a secondary region (regional pair). Each region has web, application, and data tiers. Failover is automated, failback is manual, and Traffic Manager relies on health endpoint monitoring.]
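The priority routing idea on this slide can be sketched in a few lines. This is a minimal illustration, not how Traffic Manager is implemented: the Endpoint class, the pick_endpoint function, and the enabled flag (used here to model manual failback) are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str
    priority: int          # lower number = preferred region
    healthy: bool = True   # result of the latest health endpoint probe
    enabled: bool = True   # hypothetical operator switch for manual failback


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy, enabled endpoint with the lowest priority number."""
    candidates = [e for e in endpoints if e.healthy and e.enabled]
    return min(candidates, key=lambda e: e.priority, default=None)


primary = Endpoint("primary-region", priority=1)
secondary = Endpoint("secondary-region", priority=2)
regions = [primary, secondary]

print(pick_endpoint(regions).name)   # primary-region

# The probe fails, so traffic fails over automatically. The operator also
# disables the primary so it does not take traffic back on its own.
primary.healthy = False
primary.enabled = False
print(pick_endpoint(regions).name)   # secondary-region

# The primary recovers, but failback waits until the operator re-enables it.
primary.healthy = True
print(pick_endpoint(regions).name)   # secondary-region
primary.enabled = True
print(pick_endpoint(regions).name)   # primary-region
```

Keeping failback behind an explicit switch gives the operator time to check data integrity before traffic returns to the primary region.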
13. 13
Designing for resiliency
Reading data from SQL Server fails
A web server goes down
An NVA goes down
1. Identify possible failures
2. Rate risk of each failure
(impact x likelihood)
3. Design resiliency strategy
- Detection
- Recovery
- Diagnostics
‘Azure resiliency guidance’
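Step 2 (risk = impact x likelihood) can be sketched as a small ranking exercise. The 1 to 5 scores below are invented for illustration only; the failure names come from the slide.

```python
failures = [
    # (failure mode, impact 1-5, likelihood 1-5); the scores are made up
    ("Reading data from SQL Server fails", 5, 2),
    ("A web server goes down", 2, 4),
    ("An NVA goes down", 4, 2),
]


def rank_by_risk(items):
    """Score each failure as impact x likelihood and sort highest risk first."""
    scored = [(name, impact * likelihood) for name, impact, likelihood in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, risk in rank_by_risk(failures):
    print(f"{risk:>2}  {name}")
```

The highest-scoring failures are the ones to design detection, recovery, and diagnostics for first.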
19. 20
Choosing right storage
• Evaluate managed services first, then OSS equivalents
• Portability, cost, scalability, version are common show stoppers
• Test performance with production load
• Throughput/Latency heavily depends on query type, payload size etc.
• Choose best storage for the job
• DB, Cache, Search Index, Streaming, Batch, Log, Archive
• Other factors
• Data type
• Query functions
• Consistency model
• Cost
‘Azure data store comparison’
23. 24
Inter service communication
[Diagram: requests enter through a gateway (GW) as north-south traffic; services A, B, D, and E call each other as east-west traffic.]
Challenges
- Endpoint proliferation (Routing)
- East – West chattiness (LB)
- Resiliency (Retry, FI)
- Versioning (SxS, B/G)
- Monitoring (Distributed tracing)
- Security (Encryption, Authentication)
25. 26
API gateway Primer
• Routing
• Aggregation
• Offloading
[Diagram: a gateway (GW) in front of services A to E.]
Routing example: Contoso.com/api/serviceA
Aggregation example: Contoso.com/api/GetRecommendation?userid=N
Offloaded functions: logging, caching, retry, circuit breaker, throttling, SSL termination, authentication
26. 27
Process of designing microservices using DDD
[Diagram: process of designing microservices using DDD. Labels recovered from the figure:]
Domain model: Shipping (core), Drone management, Drone sharing, 3rd party transportation, Call center, Video surveillance, Accounts
Domain building blocks: bounded context, aggregate, domain service, application service, event
Service mapping: Shipping (Drone, Package, Delivery, DeliveryScheduler, DeliverySupervisor), Account, 3rd party transportation, Authentication
Further refinement / breakdown per BC: services in the BC and services outside it
Service interaction design: Mobile app, GW, RequestHandler, RequestEvents, DeliveryScheduler, Package, Drone, Delivery, DeliveryEvents, Delivery History, Query, Status, Supervisor, Failed ops, Drone events, Analysis, Archive, Account Service, Auth Service, AAD, 3rd party Service, DroneMgmt Service
29. 30
Big data – service mapping
- Data source: Weblogs, Click stream, Sensors
- Data streaming: IoT Hub, Event Hub, Kafka
- Stream analysis: ASA, Spark, Storm, Functions
- Data storage: ADLS, Blob, CosmosDB, HBase
- Batch processing: ADLA, HDI, Custom
- Analytics store: HBase, SQL DB, CosmosDB
- Business intelligence: Power BI, Notebooks, Jupyter, Zeppelin, Custom
- Orchestration: Data factory, Oozie, SSIS
Choosing an architecture is not straightforward.
You have to consider many factors.
Windows-DNA, COM/COM+
RIA, Silverlight
SOA, Web service
Cloud, Azure
Microservices, Containers/SF
How can we make decisions?
We should keep these 4 dimensions in mind
- Goals
Prerequisites ("You must be this tall to use XXX"): if you don’t have the skill set, don’t choose it.
Do the benefits justify taking on the challenges?
Purist vs. pragmatist: I’d rather be a pragmatist, meaning you adjust the degree of conformity to fit reality.
Messaging, concurrency control, eventual consistency
DevOps culture: CI/CD, automation, self-provisioning/management
Monitoring (correlation) is critical for root cause analysis (RCA).
Each service gets simpler, but the complexity moves to the integration part, that is, the networking among services.
How can you do E2E/integration testing?
More services mean more surface area for failure.
Is this the goal you’re aiming for?
Do you meet the prerequisites?
Do the benefits justify taking on these challenges?
Many services mean many points of failure.
Figure out whether MSA is the right choice based on these four dimensions.
Decompose an app into 3 tiers: web, business, data.
Create an availability set (AS) or VM scale set (VMSS) for each tier for high availability.
Create a separate subnet for each tier.
Use NSGs to restrict network traffic.
The jumpbox allows RDP from a particular client (admin).
Use a redundant DB such as SQL Server Always On availability groups for HA.
An NVA can become a SPOF.
NVAs should be deployed behind an LB with HA ports (preview).
Deploy the N-tier architecture to more than one data center for HA with Azure Traffic Manager (ATM).
Unfortunately, things are not that simple.
There’s a risk of data loss during failover (FO): take a snapshot and verify data integrity.
SQL DB auto-failover groups support automated FO.
To design your app to be resilient, you first need to identify all possible failures.
Then implement resiliency strategies against them.
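One common recovery strategy for transient failures is retry with exponential backoff. The sketch below is a generic illustration, not code from the Azure guidance: call_with_retry, its parameters, and the choice of which exceptions count as transient are all assumptions.

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.1,
                    transient=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run `operation`; on a transient error wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except transient:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the failure
            sleep(base_delay * 2 ** attempt)   # back off: 0.1s, 0.2s, 0.4s, ...


# Hypothetical dependency that fails twice before it succeeds.
state = {"calls": 0}


def read_from_database():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network error")
    return "row data"


print(call_with_retry(read_from_database, base_delay=0.01))   # row data
```

Only errors listed in transient are retried; anything else propagates immediately so that real bugs are not hidden behind retries.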
All apps associated with a plan run on the same VM instances.
Use WebJobs for background tasks.
Use an async queue between the front end and back end (by default).
Relational data in SQL DB and non-relational data in Cosmos DB is the primary choice.
Use Azure Search for the search index.
Use a CDN for static content such as CSS, scripts, images, and static HTML.
Use a different storage account for logs.
A resource group (RG) is a boundary for management, billing, and security.
Put services with the same lifecycle into the same RG.
This is how it works in App Service. You can have up to 15 deployment slots.
Deploy the current and new versions into two identical environments (blue, green).
Run a smoke test on the new version, then switch traffic to it.
A canary release incrementally switches traffic from the current version to the new one using an LB.
Use Akamai or an equivalent to do canary releases.
The unique name for this environment comes from a tactic used by coal miners: they’d bring canaries with them into the coal mines to monitor the levels of carbon monoxide in the air; if the canary died, they knew that the level of toxic gas in the air was high, and they’d leave the mines.
In either case, you should be able to roll back if the new version doesn’t work.
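The incremental switch in a canary release can be sketched as percentage-based routing. This is a minimal sketch under assumptions: the route function and the hashing scheme are illustrative, not how Akamai or any particular LB does it.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Send a stable `canary_percent` share of users to the new version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # 0..99, stable per user
    return "new" if bucket < canary_percent else "current"


users = [f"user-{i}" for i in range(1000)]
for percent in (0, 10, 50, 100):
    on_new = sum(route(u, percent) == "new" for u in users)
    print(f"{percent:>3}% canary -> {on_new} of {len(users)} users on the new version")
```

Hashing the user ID keeps each user on the same version as the percentage grows, and rolling back is just setting the percentage to 0.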
Graceful shutdown and switching the DB/storage are the challenges.
GitHub routes requests to both blue and green, compares the results, and makes sure they are identical.
Dark launch: deploy new features without enabling them for users. Make sure they don’t cause any issues in production, then enable them.
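A dark launch is often implemented with a feature flag: the new code ships to production but stays off until the flag is flipped. The FeatureFlags class, the flag name, and the recommendations function below are hypothetical names for illustration.

```python
class FeatureFlags:
    """In-memory flag store; a real system would read flags from configuration."""

    def __init__(self):
        self._enabled = set()

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def is_enabled(self, name):
        return name in self._enabled


def recommendations(user_id, flags):
    # The new code path is deployed but stays dark until the flag is enabled.
    if flags.is_enabled("new-ranking"):
        return f"new ranking for {user_id}"
    return f"classic ranking for {user_id}"


flags = FeatureFlags()
print(recommendations("u1", flags))   # classic ranking for u1
flags.enable("new-ranking")
print(recommendations("u1", flags))   # new ranking for u1
```

Disabling the flag again is the rollback path, with no redeployment needed.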
A key-value (KV) store is a hash table: it uses a unique key to store each value.
A document store enables indexing on any field in the document.
A column-family store divides columns into groups known as column families and is optimized for high throughput.
A graph store uses nodes and edges to represent relationships between entities.
Search is optimized for indexing large volumes of data.
Time series is optimized for data organized by time, such as telemetry.
A data lake combines storage and processing.
An object store is optimized for large blobs such as images and files.
Shared files support an SMB interface and are used for migration scenarios.
Setting up Elasticsearch is not a trivial task. Doing tutorials is different from setting up a production cluster at scale.
Using managed services could save you weeks or months of your time.
Be careful from this point of view
SQL DB has the concept of DTUs. P15 has 4000 DTUs, but how much throughput do you get? It depends on the query and payload.
Data type: relational? schema flexibility?
Query: Aggregation (Group-By etc.), Index, Full-text search
Cost, scalability, sampling resolution are key criteria for monitoring
Multi-repo with each service deployed independently to the production cluster
20% of users are deploying multiple times a day. 40% multiple times a week. 50% implemented CI/CD
This slide has a list of challenges rather than practices. I want to emphasize how important networking is in MSA.
If there are hundreds of services each exposing an endpoint, it’s hard to discover, load balance, and protect them.
If you rotate this picture 90 degrees clockwise, it’ll be clear.
In particular, each N-S request becomes lots of E-W calls, so we’ll see a lot of east-west chattiness.
Serialization and deserialization become a performance overhead (Protobuf, Avro, JSON, etc.).
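To get a feel for why the wire format matters, the sketch below encodes the same reading as JSON and as a fixed binary layout. It uses Python's standard struct module as a stand-in for a compact binary format; it is not Protobuf or Avro, and the reading and its field layout are assumptions for the example.

```python
import json
import struct

# Hypothetical telemetry reading exchanged between two services.
reading = {"device_id": 42, "temperature": 21.5, "ok": True}

as_json = json.dumps(reading).encode("utf-8")

# Assumed fixed layout: little-endian unsigned int, double, bool, no padding.
LAYOUT = "<Id?"
as_binary = struct.pack(LAYOUT, reading["device_id"],
                        reading["temperature"], reading["ok"])

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")

# Round trip: the receiver must know the layout to decode the payload.
device_id, temperature, ok = struct.unpack(LAYOUT, as_binary)
print(device_id, temperature, ok)
```

The binary form is smaller because field names are not sent with every message, at the cost of both sides having to agree on a schema.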
Centralized LB vs. decentralized LB (Service Fabric): a centralized one has better knowledge of state, while a decentralized one handles distribution.
Hundreds of services have different lifecycles, so the destination of a service call may not be up and running. That’s why using a message broker makes a lot of sense: it keeps requests as messages while the destination is down, and they are processed afterwards.
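The buffering role of a message broker can be sketched with an in-memory queue. The Broker class and the service name below are hypothetical stand-ins; a real broker would also persist messages and handle acknowledgements.

```python
from collections import deque


class Broker:
    """In-memory stand-in for a message broker with one queue per destination."""

    def __init__(self):
        self._queues = {}

    def send(self, destination, message):
        # The sender does not need the destination to be up right now.
        self._queues.setdefault(destination, deque()).append(message)

    def drain(self, destination, handler):
        """Deliver queued messages in order; return how many were handled."""
        queue = self._queues.get(destination, deque())
        handled = 0
        while queue:
            handler(queue.popleft())
            handled += 1
        return handled


broker = Broker()

# The delivery service is down, but requests are kept as messages.
broker.send("delivery-service", {"order": 1})
broker.send("delivery-service", {"order": 2})

# The delivery service comes back and processes the backlog.
processed = []
count = broker.drain("delivery-service", processed.append)
print(count, processed)
```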
When you update an API, make sure it’s backward compatible. Or you can run two versions side by side (SxS) and gradually migrate from the old version to the new one. There are a few API versioning techniques, such as using the URL, the query string, or a header. Choose one and use it consistently across all services.
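The three versioning techniques can be compared side by side in a small sketch. The resolve_version function, the api-version query parameter, and the Api-Version header are illustrative names, not a standard; a real service would pick one technique rather than support all three.

```python
from urllib.parse import urlsplit, parse_qs


def resolve_version(url, headers, default="1"):
    """Find the requested API version in the URL path, query string, or header."""
    parts = urlsplit(url)

    # 1. URL path segment, e.g. /api/v2/orders
    for segment in parts.path.split("/"):
        if len(segment) > 1 and segment[0] == "v" and segment[1:].isdigit():
            return segment[1:]

    # 2. Query string, e.g. /api/orders?api-version=3 (parameter name assumed)
    query = parse_qs(parts.query)
    if "api-version" in query:
        return query["api-version"][0]

    # 3. Request header (header name assumed)
    if "Api-Version" in headers:
        return headers["Api-Version"]

    return default


print(resolve_version("https://contoso.com/api/v2/orders", {}))
print(resolve_version("https://contoso.com/api/orders?api-version=3", {}))
print(resolve_version("https://contoso.com/api/orders", {"Api-Version": "4"}))
print(resolve_version("https://contoso.com/api/orders", {}))
```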
Since each container’s IP address is masqueraded by default, an NVA can’t protect them.
Rest.li: framework for RESTful APIs used by LinkedIn
Thrift: Framework for cross-language RPC
This is where a service mesh comes into play.
They are very well integrated with Kubernetes (k8s).
A few lines in a YAML file and they get deployed to the cluster.
Routing based on IP+port#. It also has to consider node state.
If the majority of services are responsible for the same things, such as logging, caching, and authentication, it makes sense to offload them to the GW.
There are commercial and OSS products that support this scenario.
Azure Application Gateway, NGINX, HAProxy, and Traefik (https://docs.traefik.io/) are good examples.
OpenID Connect for consumer services, LDAP for enterprise
A fat gateway is an anti-pattern. Too much domain knowledge in the GW becomes a blocker for fast deployment. That’s the mistake we made in SOA.
The gateway can be a SPOF or a performance bottleneck. That’s what happened at the Pokémon GO launch event.
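Routing, aggregation, and offloading can be sketched together in a thin gateway. Everything below is hypothetical (the Gateway class, the backend functions, and the token check); the two paths are taken from the API gateway slide.

```python
def service_a(request):
    return {"service": "A"}


def profile_service(request):
    return {"profile": f"profile of user {request['userid']}"}


def history_service(request):
    return {"history": f"orders of user {request['userid']}"}


class Gateway:
    """Thin gateway: it routes, aggregates, and offloads, with no domain logic."""

    def __init__(self):
        self._routes = {}

    def register(self, path, *backends):
        self._routes[path] = backends

    def handle(self, path, request):
        if not request.get("token"):             # offloaded authentication check
            return {"status": 401, "body": {}}
        backends = self._routes.get(path)
        if backends is None:
            return {"status": 404, "body": {}}
        body = {}
        for backend in backends:                 # one backend: routing
            body.update(backend(request))        # several backends: aggregation
        return {"status": 200, "body": body}


gw = Gateway()
gw.register("/api/serviceA", service_a)
gw.register("/api/GetRecommendation", profile_service, history_service)

print(gw.handle("/api/serviceA", {"token": "t"}))
print(gw.handle("/api/GetRecommendation", {"token": "t", "userid": 7}))
print(gw.handle("/api/GetRecommendation", {"userid": 7}))
```

The gateway only knows paths and cross-cutting checks; what a recommendation means stays inside the services, which avoids the fat gateway problem.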
These are the logical components of a big data solution. They are optional.
In the hot path, data comes into the ingestion pipeline and is processed in real time using stream analysis, then projected and visualized.
In the cold path, data is stored in cold storage and processed in batches using Hadoop, then projected and visualized.
The orchestrator manages the whole workflow.
In the Kappa architecture, there is no batch processing; recomputation is done through the hot path.
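The difference between the two paths can be sketched on a handful of events. The sample events, the alert threshold, and the function names are made up for illustration; real pipelines would use the stream and batch engines named on the slide.

```python
from collections import defaultdict

# Hypothetical telemetry events; in practice they arrive via an ingestion pipeline.
events = [
    {"device": "d1", "temp": 20.0},
    {"device": "d1", "temp": 95.0},
    {"device": "d2", "temp": 22.0},
]


def hot_path(stream, threshold=90.0):
    """Handle events one at a time as they arrive and yield alerts right away."""
    for event in stream:
        if event["temp"] > threshold:
            yield f"ALERT {event['device']} temp={event['temp']}"


def cold_path(stored):
    """Batch job over everything in storage: average temperature per device."""
    totals = defaultdict(lambda: [0.0, 0])
    for event in stored:
        totals[event["device"]][0] += event["temp"]
        totals[event["device"]][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


print(list(hot_path(events)))
print(cold_path(events))

# Kappa-style recomputation: replay the stored events through the hot path
# (here with a new threshold) instead of running a separate batch job.
print(list(hot_path(events, threshold=21.0)))
```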
These are not the only options, but they are popular ones.
Some devices have enough resources, others are restricted
For restricted devices, we may have a device GW to augment their capabilities.
The IoT device SDK enables devices to connect to the back end.
IoT Edge can run on devices or on a device GW to do aggregation, AI, etc. Azure ML, Functions, and ASA are supported.
Event Hub for non-device telemetry
A custom protocol GW handles protocols other than AMQP, MQTT, and HTTP.
DPS manages registration and load balancing of devices across IoT Hubs.
The device twin stores device metadata (firmware version, supported protocols, etc.); we need another store for the device registry.
The state store is for the last known state (on/off, normal/error, telemetry data such as temperature).
Stream processor for hot path analysis (e.g. alert) and also telemetry
It will be sent to cold storage or advanced analytics
Actors may be used for device lifecycle management or command/control.
The solution UI visualizes devices, analytics, etc.