How we broke Apache Ignite by adding persistence

Stephen Darlington
16 December, 2019
2018 © GridGain Systems

(spoiler: already fixed)

What is Ignite?
• Distributed memory-centric storage: combines the performance and scale of in-memory computing with disk durability and strong consistency in one system
• Co-located computations: brings the computations to the servers where the data actually resides, eliminating the need to move data over the network
• Distributed key-value: read, write and transact with fast key-value APIs
• Distributed SQL: horizontally scalable, fault-tolerant distributed SQL database that treats memory and disk as active storage tiers
• ACID transactions: supports distributed ACID transactions for key-value as well as SQL operations
• Machine and deep learning: a set of simple, scalable and efficient tools for building predictive machine learning models without costly data transfers (ETL)

Apache Ignite In-Memory Computing Platform
[Architecture diagram, top to bottom]
• Application Layer: Web, SaaS, Social, Mobile, IoT
• In-Memory Data Store: Key-Value, SQL, Transactions, Compute Grid, Service Grid, Messaging, Streaming, Events, Machine and Deep Learning
• Persistent Layer: Ignite Persistence, RDBMS, NoSQL, Hadoop, Mainframe

Apache Ignite’s History
[Timeline diagram: Data Grid → Local Store → Transactional Persistence → ?]

Circa 2011
[Timeline diagram: Data Grid → Local Store → Transactional Persistence → ?]

Circa 2014
[Timeline diagram: Data Grid → Local Store → Transactional Persistence → ?]

Circa 2017
• Start time does not depend on the data volume
• Can store more data than memory
• Crash recovery
• Single in-memory & native persistence architecture

Ignite 2.0: What we wanted
“We will just save everything to disk”

Circa 2017
[Timeline diagram: Data Grid → Local Store → Transactional Persistence → ?]

Beginning: Durable Memory

• ARIES Architecture
• Page-based
• Write-ahead log (when persistence is enabled)
• Everything is off heap
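
The "log first, then change the page" idea behind these bullets can be illustrated with a toy model. This is only a sketch under strong simplifications, not Ignite's implementation: it keeps redo records only (real ARIES also has undo and log sequence numbers), and the class name `TinyPageStore` and its methods are hypothetical.

```python
class TinyPageStore:
    """Toy page store: every update is logged before the page is changed."""

    def __init__(self):
        self.wal = []       # append-only log of (page_id, offset, value)
        self.memory = {}    # page_id -> {offset: value}, the in-memory pages
        self.disk = {}      # pages as they were at the last checkpoint
        self.dirty = set()  # pages changed since the last checkpoint

    def write(self, page_id, offset, value):
        self.wal.append((page_id, offset, value))            # 1. log first
        self.memory.setdefault(page_id, {})[offset] = value  # 2. then modify
        self.dirty.add(page_id)

    def checkpoint(self):
        # Flush dirty pages; older log records are no longer needed.
        for page_id in self.dirty:
            self.disk[page_id] = dict(self.memory[page_id])
        self.dirty.clear()
        self.wal.clear()

    def crash_and_recover(self):
        # Memory is lost: reload checkpointed pages, then replay the log.
        self.memory = {pid: dict(page) for pid, page in self.disk.items()}
        self.dirty.clear()
        for page_id, offset, value in self.wal:
            self.memory.setdefault(page_id, {})[offset] = value
            self.dirty.add(page_id)


store = TinyPageStore()
store.write(1, 0, "a")
store.checkpoint()
store.write(1, 1, "b")   # logged, but not yet checkpointed
store.write(2, 0, "c")
store.crash_and_recover()
```

After recovery, `store.memory` again contains the two writes made after the checkpoint, because they were replayed from the log.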

• PK Index: how to replace a HashMap
• Concurrent B+ Tree: a well-known data structure
• A separate PK index for each partition
• Compare key hash first
• Bonus: guaranteed iteration order in a hash map
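
To make "compare key hash first" concrete, here is a minimal sketch that orders entries by a stable hash and only then by the key bytes. A sorted list with binary search stands in for the concurrent B+ tree, and `zlib.crc32` stands in for whatever hash function Ignite actually uses; both are assumptions for illustration, as are the names `index_key` and `SortedPkIndex`.

```python
import bisect
import zlib


def index_key(key: str) -> tuple:
    """Order by a stable 32-bit hash first, then by raw bytes to break ties."""
    data = key.encode("utf-8")
    return (zlib.crc32(data), data)


class SortedPkIndex:
    """Sorted-list stand-in for a per-partition B+ tree primary-key index."""

    def __init__(self):
        self._keys = []    # sorted (hash, key_bytes) tuples
        self._values = []  # values, parallel to _keys

    def put(self, key, value):
        k = index_key(key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, k)
            self._values.insert(i, value)

    def get(self, key):
        k = index_key(key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return self._values[i]
        return None

    def scan(self):
        """Iterate in index order, which is independent of insertion order."""
        return [(k[1].decode("utf-8"), v)
                for k, v in zip(self._keys, self._values)]


first = SortedPkIndex()
for name in ["x", "y", "z"]:
    first.put(name, name.upper())

second = SortedPkIndex()
for name in ["z", "x", "y"]:
    second.put(name, name.upper())
```

Most comparisons are settled by the cheap integer hash, and both indexes scan in the same order even though their keys were inserted differently, which is the "bonus" the slide mentions.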

Baseline Topology
• [16:21:01] Ignite node started OK (id=326bab44)
• [16:21:01] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
• [16:21:01] Topology snapshot [ver=1, locNode=326bab44, servers=1, clients=0, state=INACTIVE, CPUs=8, offheap=3.2GB, heap=3.6GB]
• [16:21:01] ^-- Baseline [id=11, size=3, online=1, offline=2]
• [16:21:01] ^-- 2 nodes left for auto-activation [6213b7af-23bb-4c8d-a045-157d7f2d7718, db969788-fc01-41f4-a91c-c03f2d201f76]
• [16:21:19] Joining node doesn't have encryption data [node=89b6ef6c-1055-4678-bcfa-00fb222208ce]
• [16:21:19] ^-- 1 nodes left for auto-activation [6213b7af-23bb-4c8d-a045-157d7f2d7718]
• [16:21:37] Joining node doesn't have encryption data [node=dd55ff24-da61-42cd-bbaf-c7940fab07d3]
• [16:21:37] ^-- All baseline nodes are online, will start auto-activation
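
The behaviour visible in this log, where the cluster stays inactive until every baseline node is back online, can be modelled in a few lines. This is a hypothetical sketch of the rule only, not Ignite's code; the class `BaselineTracker` and the short node ids are made up for illustration.

```python
class BaselineTracker:
    """Toy model: activate only once all baseline nodes are online."""

    def __init__(self, baseline_ids):
        self.baseline = frozenset(baseline_ids)
        self.online = set()
        self.active = False

    def node_joined(self, node_id):
        """Record a join and return the baseline nodes still missing."""
        self.online.add(node_id)
        missing = self.baseline - self.online
        if not missing:
            self.active = True  # all baseline nodes present: auto-activate
        return sorted(missing)


cluster = BaselineTracker(["node-a", "node-b", "node-c"])
after_first = cluster.node_joined("node-a")    # two nodes still missing
active_after_first = cluster.active
after_second = cluster.node_joined("node-b")   # one node still missing
after_third = cluster.node_joined("node-c")    # baseline complete
```

Tying activation to a fixed set of expected nodes means a restarting cluster does not serve or rebalance data while part of the persisted data set is still offline.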

Disk. Predictable access speed
• Disks are slow (even NVMe)
• At peak load, a naïve implementation easily steps on its own tail
• Throughput suddenly drops to zero

• So, we need to… make Ignite slower
• Throttle the input load depending on:
• How fast we produce “dirty” pages
• How fast we write to disk
• How much free space is left in the copy-on-write buffer
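
One way to combine the three signals above into a single back-pressure decision is sketched below. The formula, the 25% buffer threshold and the 100 ms cap are illustrative assumptions and not Ignite's actual throttling algorithm; `throttle_delay_ms` is a hypothetical name.

```python
def throttle_delay_ms(dirty_pages_per_s, written_pages_per_s,
                      cow_buffer_free_ratio, max_delay_ms=100.0,
                      buffer_threshold=0.25):
    """Return how long a writer thread should pause before its next update."""
    if dirty_pages_per_s < 0 or written_pages_per_s < 0:
        raise ValueError("rates must be non-negative")
    if not 0.0 <= cow_buffer_free_ratio <= 1.0:
        raise ValueError("cow_buffer_free_ratio must be within [0, 1]")

    # Pressure from dirtying pages faster than the disk can absorb them.
    if dirty_pages_per_s <= written_pages_per_s:
        rate_pressure = 0.0
    else:
        rate_pressure = 1.0 - written_pages_per_s / dirty_pages_per_s

    # Pressure from the copy-on-write buffer running out of free space.
    if cow_buffer_free_ratio >= buffer_threshold:
        buffer_pressure = 0.0
    else:
        buffer_pressure = (buffer_threshold - cow_buffer_free_ratio) / buffer_threshold

    # The scarcer resource decides the delay.
    return max_delay_ms * max(rate_pressure, buffer_pressure)


disk_keeps_up = throttle_delay_ms(1000, 2000, 0.9)
disk_falls_behind = throttle_delay_ms(2000, 1000, 0.9)
buffer_exhausted = throttle_delay_ms(1000, 2000, 0.0)
```

Slowing writers gradually as pressure builds trades a little peak throughput for the absence of sudden stalls, which is the point of "making Ignite slower".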

• Page cache: what can go wrong
• We already have one page cache in Ignite (durable memory)
• The OS adds its own page cache on top
• Together they effectively double the memory consumption

• Page cache: solution is Direct IO
• Available in Java 10, but we build on Java 8
• Requires native, platform-specific calls or a Java-dependent module

The future?
[Timeline diagram: Data Grid → Local Store → Transactional Persistence → ?]

Questions?

More information
• Main landing page: https://ignite.apache.org
• Documentation: https://apacheignite.readme.io/docs
• Please complete our survey on how Apache Ignite should evolve: https://docs.google.com/forms/d/e/1FAIpQLSdUveEVXer3lpkyiqfFw4175TvZzGHUOS4snPfnkO0NDku0eQ/viewform
• Real-time data loading: https://www.imcsummit.org/2019/us/session/best-practices-loading-real-time-data-distributed-systems-change-data-capture

Stephen Darlington
Senior Consultant, GridGain Systems
@sdarlington
