Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture

Jordi Puigsegur Figueras
https://www.linkedin.com/in/jordipuigsegur/
▪ Head of Solution Architecture
Hotelbeds
▪ Course Instructor
High Scale Distributed Systems
Universitat Oberta de Catalunya

1. Who we are
2. Hotelbeds Journey
3. Survival Principles

▪ Hotelbeds Group is a leading bedbank and a business-to-business
(B2B) provider to the global travel industry.
▪ In September 2016 Hotelbeds Group was acquired by Cinven and
the Canada Pension Plan Investment Board.
▪ In June 2017 Tourico Holidays became part of Hotelbeds Group
and in October GTA also became part of the Group.
▪ Hotelbeds Group offers travel providers access to a network of over
60,000 travel sellers from around 185 source markets globally.
▪ Travel sellers have access to over 170,000 hotels, +22,000 transfer
routes, +16,000 activities and 142,000 car rental products.
▪ The technology platform handles around 2 billion data requests
per day – with peaks of up to 2.5 billion and 40,000 requests
per second – from users worldwide.
▪ +5000 employees worldwide. 210 offices globally. Biggest single
site is the head office in Palma de Mallorca where over
1500 people work.
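
As a rough sanity check on the traffic figures above, the daily volume can be converted into an average per-second rate and compared with the quoted peak. The sketch below assumes a uniform 86,400-second day; the class and method names are illustrative and not part of the talk.

```java
// Illustrative arithmetic only; TrafficMath and its methods are hypothetical names.
final class TrafficMath {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average requests per second for a given daily volume.
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    // How many times higher the peak rate is than the daily average.
    static double peakToAverage(long peakPerSecond, long requestsPerDay) {
        return peakPerSecond / averagePerSecond(requestsPerDay);
    }
}
```

With 2 billion requests per day the average is roughly 23,000 requests per second, so a 40,000 requests-per-second peak is about 1.7 times the average load.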

[Timeline figure. Platform labels: XML1, XML2, AIF2, APItude. Years: 2002, 2008, 2010, 2015, 2017, 2018. Milestones are listed below by increasing request volume.]
▪ Technical breakthrough in the market; accommodation product only; thousands of requests per day
▪ Transfers and Activities product; product update for content (FTP); 1M requests per day
▪ Cache pull service; proprietary format and rules based; allows customers to scan inventory; 10M requests per day
▪ API-driven product distribution strategy (suite of APIs); focus on scalable and high-performance platforms; ease of integration, development-driven experience; 200M requests per day
▪ Apitude Cloud: 1,200M requests per day; 20k requests per second
▪ Distributed Availability: multi-region deployments; 2,000M requests per day; 40k requests per second

Three main initiatives will shape Hotelbeds' technical evolution (2016, 2017, 2018):
▪ Distributed Availability: Going Global
▪ ATLAS+ Project: Breaking the monolith
▪ Apitude migration to cloud

▪ APItude is a redesign of our APIs
▪ Live by end of 2015
▪ 30% faster than the existing API
▪ DX approach: simpler and based on new technologies
▪ Sets the ground for a modern microservice cloud-native architecture:
○ Cloud-ready services based on Spring Boot
○ Immutable deployments (rpm) and external configuration
○ New technology components: Redis, Spring Config Server
○ Focus on enabling automation
○ New “cloud-friendly” architectural patterns

▪ Big monolithic Oracle Database with most of
the company’s business logic inside.
▪ Some satellite Java services…
○ ... but all logic still in PL/SQL code
○ ~ 2500 tables, ~ 1.1 M LoC
○ Monthly releases with a full stop
of up to 1 h.
▪ The new Apitude API platform is already live,
but it is still hosted on premise.
▪ Most hotel availability requests are solved by
the legacy XML 2 Platform (on premise).
▪ Apitude rollout is just beginning.

▪ We are beginning to face serious scalability
issues.
▪ Our Oracle based platform is not going to
cope with the expected business growth.
▪ Vertical scalability is not an option …
we are already running on powerful Oracle
Exadata hardware.
▪ Some estimates only leave 18 months until
platform saturation.

▪ The main driver is to reduce latencies for
our globally distributed customer base.
▪ Availability requests are increasing
exponentially, therefore:
○ We need more flexibility to grow
and evolve new Apitude services
○ We need autoscaling to adapt to
varying loads
(day / week / seasonal)
▪ Cloud can also be a cost-saving driver

▪ Cloud migration strategy is mixed:
○ Migrating new cloud native components
○ Plus lift and shift of older ones
▪ Deployed in AWS - 1 region: Europe
▪ Based on IaaS deployments of binary immutable
rpms with external configuration
▪ Some managed services (ElastiCache & ELBs)
▪ Adjusting autoscaling and fine tuning takes time
▪ Good monitoring is crucial

▪ Project focused on extracting the core business logic inside our
big monolithic Oracle Database
▪ Migration of business logic to cloud-native microservices:
○ Full reengineering of backend services
○ No business involvement … transparent migration
4 Teams
50 Developers
9 Months
20 new Spring Boot services
70 % on Cloud

▪ Hybrid on-premise - cloud approach:
○ Madrid datacenter + 1 AWS region
○ Logic is moved to cloud microservices
○ Data is kept on Oracle DB (on-premise)
○ Prioritize use of cloud data
▪ Data replication on-premise - cloud becomes crucial
○ Use of Kafka (own deployment)
▪ PostgreSQL is the choice for microservices database
○ Managed RDS instances
○ Sometimes with a NoSQL approach

[Diagram: booking flow split between on-premise and cloud. Components: AtlasDB (on-premise), Booking BL, Booking API, PostgreSQL.]
Example booking operations:
▪ Booking List
▪ Booking Detail

▪ Hotel Availability across three regions:
○ Europe
○ North America
○ Asia
▪ Global data replication using Kafka
▪ Customers are geographically
redirected by dynamic DNS.
(check Eric Janz talk this afternoon)
▪ Better latencies across the globe
▪ New options for growth
▪ Good monitoring becomes crucial

… for the microservices jungle!
▪ Standardization
▪ Decoupling
▪ Data replication
▪ Resilient designs
▪ Automation
▪ Microservice support ecosystem
▪ Governance

▪ We all know microservices are cool
▪ We all want to do microservices!!
▪ We all know the advantages of microservices
▪ But Microservice architectures
○ are complex
○ carry many hidden overheads
▪ In fact ...
You are going to build a distributed
system and distributed systems are hard!

▪ Programming Language: Java
▪ Parent poms with most relevant dependencies
▪ Some (not many) libraries, e.g. metrics
▪ Standardized service archetype (maven)
○ Ready to run in Hotelbeds ecosystem
○ Produces binary rpms (docker images soon!)
▪ REST APIs designed following similar principles
▪ Carefully chosen set of technology components
▪ Reference architectures on when and how to use
components and libraries
▪ Technology radar

▪ Decoupling is essential to achieve
the microservices goals
▪ Good decoupled architecture ...
○ Helps scale dev teams
○ High scalability & efficiency enabler
○ Supports future features naturally
▪ Independent deployments and life cycles for each Service
▪ The API is the only point of access to the service (REST endpoint, Kafka, …)
▪ Data is private: no database access from external components

▪ Importance of clean service boundaries
▪ One rule of thumb: changes shouldn’t involve several microservices
▪ Domain-Driven Design as a very useful set of tools:
○ Focuses on domain knowledge and its representation in code
○ Focus on Strategic patterns
○ Bounded contexts as the basis for microservice boundaries
▪ Beware!
○ Service boundaries are hard to define!
○ Easy to end up with a distributed monolith / microservice spaghetti / ...

▪ Data replication between services is crucial in
a hybrid cloud / multi-[region|cloud] environment
▪ Each entity is owned by a service
▪ All the other services access the owner service
via REST API or consuming its Kafka messages
▪ Kafka messages contain exactly the same entities
as the service REST API
▪ Kafka is our message broker and basic tool to replicate data between services
○ Scalability
○ Partial order guarantees
○ Kafka “mirror maker” for moving data across locations
(check Kafka talk by Isa and Alicia tomorrow!)
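
The "partial order" guarantee above refers to ordering within a partition: messages that share a key land in the same partition and are therefore read in the order they were written. The sketch below models this with in-memory lists. It is a simplification: the partitioner uses `String.hashCode()`, whereas Kafka's default partitioner hashes the serialized key differently, and `KeyedLog` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory model of a partitioned log; not a Kafka client.
final class KeyedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    KeyedLog(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key always maps to the same partition (simplified hash).
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends to the end of the key's partition, preserving per-key order.
    void append(String key, String value) {
        partitions.get(partitionFor(key)).add(key + "=" + value);
    }

    // Returns a copy of one partition's entries in write order.
    List<String> read(int partition) {
        return new ArrayList<>(partitions.get(partition));
    }
}
```

Updates to one entity stay ordered as long as the entity identifier is used as the message key; no order is implied between entries in different partitions.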

Resilience: "the ability of a system
to withstand changes in its
environment and still function"
Wikipedia
▪ We need to design with resilience
in mind
▪ Favor self-healing architectures
▪ Remember! We are dealing with
distributed systems
▪ Resilience patterns: check Uwe
Friedrichsen talks (slideshare)

▪ Protect your services from the unexpected
○ Overloads
○ Timeouts
○ Downstream errors
○ Datacenter failures
○ etc.
▪ Protect your services even if they are only internally
exposed
▪ Focus on protecting each service individually
▪ Let good system behaviour emerge from good service
level practices

▪ Hystrix library provides several very useful resilience patterns:
○ Circuit breaker
○ Load shedding
○ Timeouts
○ Fallbacks
○ Retries
[Diagram labels: Distribution, 3rd party, Internal, Product, Suppliers]
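
To illustrate the circuit breaker and fallback patterns listed above, here is a minimal, library-free sketch. It is not Hystrix's implementation or API: the class name, the consecutive-failure threshold and the injected clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker; not the Hystrix API.
final class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast once open
    private final LongSupplier clock;   // injected so tests can control time
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // synchronized keeps the sketch simple; a production breaker
    // would not hold a lock while the downstream call is running.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (open && clock.getAsLong() - openedAt < openMillis) {
            return fallback.get(); // fail fast, do not touch the dependency
        }
        try {
            T result = action.get(); // normal call, or trial call after the wait
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true;
                openedAt = clock.getAsLong(); // (re)start the open window
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}
```

While the breaker is open, callers get the fallback immediately and the failing dependency receives no traffic; after the wait elapses, a single successful trial call closes it again.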

“Minimizing coordination, or blocking communication between
concurrently executing operations, is key to maximizing
scalability, availability, and high performance.” *
* from: Coordination Avoidance in Database Systems
Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
▪ Allow services / instances / threads to keep
working independently of their peers,
their dependencies and even their clients
▪ Favor push vs pull strategies
▪ Favor asynchronous vs synchronous
▪ Favor local caches vs complicated grid /
replicated caches
▪ The Reactive Manifesto

▪ Services publish entity changes to Kafka
▪ Client services can consume these
streams and keep a local memory replica
▪ Important for high-transaction-volume / low
latency services
▪ Kafka compaction guarantees that at
least the latest message per key is kept
▪ Every time an instance of the client
service spins up, it can load the caches in
memory and keep listening for changes
▪ Each service instance is independent. No
communication between peers.
[Diagram labels: Product Master Data, Distribution]
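
The local-replica idea above can be sketched without a Kafka client: replay a keyed change stream into an in-memory map, where the latest value per key wins and a null value (a tombstone, in log compaction terms) removes the key. The `Change` and `LocalReplica` names are hypothetical; a real consumer would read these records from a compacted topic.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// One entity change as it would appear on the stream (hypothetical shape).
final class Change {
    final String key;
    final String value; // null means the entity was deleted (tombstone)

    Change(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// In-memory replica owned by a single service instance; no peer communication.
final class LocalReplica {
    private final Map<String, String> entities = new ConcurrentHashMap<>();

    // Applies one change; used for the startup replay and for live updates alike.
    void apply(Change change) {
        if (change.value == null) {
            entities.remove(change.key);
        } else {
            entities.put(change.key, change.value); // latest value wins
        }
    }

    // Startup: consume the log from the beginning to rebuild the cache.
    void replay(Iterable<Change> log) {
        for (Change change : log) {
            apply(change);
        }
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(entities.get(key));
    }

    int size() {
        return entities.size();
    }
}
```

Because replaying the full log and replaying its compacted form produce the same final map, a new instance can start from the compacted topic and reach the same state as instances that have been running for a long time.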

▪ Dedicated Delivery & Automation team
▪ Devops roles inside scrum teams
▪ Automated CI/CD pipelines based on GitHub Flow
▪ Infrastructure automation
▪ Infrastructure as code
▪ Automated testing is key for continuous
delivery:
○ Unit testing
○ Integration testing
○ Smoke tests & end-to-end testing using
a framework based on TestNG

▪ Service discovery
▪ External configuration: Config Server
▪ Load balancing: AWS Elastic Load Balancer
▪ Logging
▪ Metrics

▪ Service catalogue based on Enterprise
Architect and own tools:
○ Architecture baselines
○ Ownership of services
○ Dependencies
○ Targets & Transitions
▪ Dedicated Information Architecture Team
▪ Clear Process for new Services provisioning
▪ Automation Integration: No new deployments
of deprecated components
▪ IT Cost Model Tool

▪ Focused on integration of the
three companies
▪ Reorganization into a product
based company
▪ Moving more business logic
into microservices
▪ Multi-cloud
▪ Containers
▪ Keep improving our platform
○ More resilient
○ More agile
○ Better TTM
