Nationwide Building Society has invested £4.1 billion in technology and is creating 750 new digital roles. They need a "Speed Layer" to support increased mobile and digital activation, open banking, and enhanced customer propositions. The Speed Layer uses Kafka as an event hub, MongoDB as an operational data store, and stream processing to aggregate and enrich data. It provides pre-populated caches and introduces an event-based architecture. To ensure high resilience across two data centres, Nationwide uses independent Kafka, stream processing and MongoDB clusters in each, rather than a stretched MongoDB cluster. Nationwide loaded 15 billion transactions into MongoDB, bucketing documents by account and month to improve read performance. They conducted proof-of-concepts to validate the new technology and patterns, including a comparison of Kafka Streams and Spark and an evaluation of change data capture from DB2 for Z.
2. Agenda
• 5 mins: Nationwide and its Big Investment in Technology (Rob Jackson)
• 5 mins: Why we need a Speed Layer
• 5 mins: What is the Speed Layer?
• 5 mins: Challenge: Multi-Data Centre
• 5 mins: Challenge: Data modelling for existing data (Pete Cracknell)
• 5 mins: Challenge: Introducing new technology and patterns
• 10 mins: Demo
• Q&A: Rob and Pete
3. Nationwide and its Big Investment in Technology
• Nationwide is the world's largest building society, as well as one of the largest savings providers and the second largest mortgage provider in the UK, with around 15 million members.
• We were founded 135 years ago to help people save and buy homes of their own, and our focus on building society is as important today as it was then.
• With an investment in technology of £4.1bn, we're building our technology and digital capabilities for the future so we're primed for the next generation of digital innovation, and creating around 750 new digital, data and technology roles in London.
• The Speed Layer plays an important role in that investment.
4. Why do we need a Speed Layer?
Digital disruption drivers:
• Increased mobile and digital activation
• Increased competition
• Open Banking
• Enhanced member propositions
• IT Strategy
What does it provide?
• Pre-populated cache of SOR (system of record) data
• Introduces an event-based architecture
• Domain-specific caches
• Availability and scaling
5. What is the Speed Layer?
How does it work?
1. Applications push data (events) into Kafka
2. Stream processing to aggregate and enrich
3. Materialise into MongoDB for consumption
How will we use it?
1. Event Driven Pattern
2. Domain Specific data sets
3. Enterprise level data stores
Meaning…
1. Kafka becomes our Enterprise Event Hub
2. MongoDB becomes our Operational Data Store
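The three steps above can be sketched in plain Python. This is an illustrative stand-in only: an in-memory list plays the part of a Kafka topic and a dict plays the part of MongoDB, and field names such as account_id and amount are assumptions, not Nationwide's actual schema.

```python
from collections import defaultdict

# 1. Applications push data (events) into a stream (stand-in for a Kafka topic)
events = [
    {"account_id": "A1", "amount": -25.00, "merchant": "m-001"},
    {"account_id": "A1", "amount": -10.50, "merchant": "m-002"},
    {"account_id": "A2", "amount": 300.00, "merchant": "m-003"},
]

# 2. Stream processing: aggregate a running balance per account and
#    enrich each event with reference data
merchant_names = {"m-001": "Coffee Shop", "m-002": "Bookstore", "m-003": "Employer"}

balances = defaultdict(float)
enriched = []
for event in events:
    balances[event["account_id"]] += event["amount"]
    enriched.append({**event, "merchant_name": merchant_names[event["merchant"]]})

# 3. Materialise the aggregates into a document store (stand-in for MongoDB)
operational_data_store = {
    acct: {"account_id": acct, "balance": round(total, 2)}
    for acct, total in balances.items()
}

print(operational_data_store["A1"])  # {'account_id': 'A1', 'balance': -35.5}
```

Consumers then read pre-computed documents from the operational data store rather than querying the system of record directly.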
6. Challenge: Multi-Data Centre
Objective / Challenges
• We want our applications to run active/active in our 2 data centres
• Existing databases/mainframes often run active/passive
• High resilience requirements
• Ease of DR testing
• A stretched MongoDB cluster requires a 3rd voting node
Option 1 – A Stretched MongoDB Cluster
• MongoDB takes care of replicating data across our 2 data centres
• Applications are automatically routed to active nodes, but…
• Results in active/passive on Kafka
• We didn't have anywhere for the 3rd voting node
• Cross-centre capability tied to materialisation layer
Option 2 – Independent MongoDB Clusters in each Data Centre
• Kafka topics asynchronously mirrored across data centres
• Independent Kafka, stream processing and MongoDB clusters in each data centre
• Keeps both data centres as similar as possible, easing DR testing and processes
Summary
• Option 2 was chosen to keep DR testing as simple as possible and to avoid the need for a 3rd voting node
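The asynchronous topic mirroring in Option 2 could, for example, be expressed as a Kafka MirrorMaker 2 configuration along these lines. This is a minimal sketch, assuming two-way mirroring of all topics; the cluster aliases and bootstrap addresses are illustrative, not Nationwide's actual setup.

```properties
# Two independent Kafka clusters, one per data centre (aliases are illustrative)
clusters = dc_a, dc_b
dc_a.bootstrap.servers = kafka-dca:9092
dc_b.bootstrap.servers = kafka-dcb:9092

# Mirror topics asynchronously in both directions so each data centre
# holds a full copy of the event stream and can serve traffic active/active
dc_a->dc_b.enabled = true
dc_a->dc_b.topics = .*
dc_b->dc_a.enabled = true
dc_b->dc_a.topics = .*
```

Because each data centre runs its own stream processing and MongoDB cluster fed from its local mirror, either centre can operate alone, which is what keeps DR testing simple.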
7. Challenge: Data modelling for existing data
Challenge
• 15 billion transactions to go in MongoDB
• Significant insert/update volume
• Performance priority is reads by AccountId and Month
Flat collection
• Document for every transaction
• Simple to insert and update
• Queries are simple
• Indexes for AccountId and Month
• Predicted index size 280 GB
Bucketing
• Document for every AccountId and Month
• Inserts and updates are more complex
• Queries are simple
• Indexes for AccountId and Month
• Predicted index size 30 GB
[Diagram: data models. Flat collection: an Account document (AccountId, Balance, …) with a separate Transaction document per transaction (TransactionId, Month, AccountId, …). Bucketed: one document per AccountId and Month (AccountId, Balance, Month) containing an embedded Transactions[] array (TransactionId, …).]
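The bucketing pattern can be sketched in plain Python. A dict keyed on (account_id, month) stands in for a MongoDB collection, and the insert function mimics an upsert that pushes onto an embedded array; the field names are illustrative assumptions.

```python
# Bucketed store: one document per (account_id, month), a stand-in for a
# MongoDB collection with a compound index on AccountId and Month
buckets = {}

def insert_transaction(account_id, month, txn):
    """Upsert the (account_id, month) bucket and append the transaction,
    mimicking a MongoDB update with upsert=True and a $push on Transactions[]."""
    key = (account_id, month)
    bucket = buckets.setdefault(
        key, {"account_id": account_id, "month": month, "transactions": []}
    )
    bucket["transactions"].append(txn)

insert_transaction("A1", "2019-06", {"txn_id": "t1", "amount": -25.00})
insert_transaction("A1", "2019-06", {"txn_id": "t2", "amount": -10.50})
insert_transaction("A1", "2019-07", {"txn_id": "t3", "amount": 99.00})

# The priority read path (all transactions for an account in a month) is a
# single-document lookup instead of an index scan over billions of
# per-transaction documents
june = buckets[("A1", "2019-06")]
print(len(june["transactions"]))  # 2
```

Because the index only needs one entry per account-month bucket rather than one per transaction, the index shrinks dramatically, which is the 280 GB vs 30 GB difference on the slide.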
8. Challenge: Introducing new technology and patterns
General approach
Themes:
• High bar for resilience
• Understand potential pain points
How we did it:
• Proof of Concept in Architecture Proving Team
• Collaborated with MongoDB, Confluent and IBM
• Used 2 approaches
Technology and pattern proof of concept
• Cloud based with independent equipment
• Fake data
• Descoped SOR
• Kafka Streams and Spark comparison
• UI to illustrate key messages
We learnt:
• Stream processing is non-trivial
• Stream processing is suited to EDA only
• Unhappy paths are time consuming and complex
DB2 for Z CDC evaluation
• 2 weeks on IBM Z mainframes in Montpellier
• Simulated SOR with CDC and Kafka
• Workload generator with dummy schemas and data
• Contributed to CDC product selection
We learnt:
• Full load: peaked at 91k TPS with 0.26% CPU
• Change processing: 10 minutes of 45k TPS created a 10-minute backlog
• A 280-column table caused issues
• Single partition only