Time to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey Zagrebin, Ververica
1. © 2019 Ververica
Automatic State Cleanup in Apache Flink
Deep Dive into State Time-To-Live (TTL)
Andrey Zagrebin, Software Engineer and Apache Flink Committer
Flink Forward Europe 2019
2. © 2019 Ververica
Agenda for the talk
• Assumptions about the audience
• State TTL feature
─ Why?
─ What is it?
─ How to use it?
• Tech deep dive
─ General idea: State Wrappers
─ Background Cleanup
• Concurrent background process
• Incremental Heap Backend cleanup
• TTL Compaction filter for RocksDB
• Future roadmap & Useful links
3. © 2019 Ververica
Assumptions about the audience
• Familiar with Apache Flink and its Keyed Stateful Processing
• OR think of the Flink State as a local KV store with single-threaded access
• Two types of Flink State storage backed by
─ in-memory Java object map (heap, non-serialized)
─ RocksDB embedded KV store (native memory + local drive, serialized)
5. © 2019 Ververica
State TTL feature: Motivation
• Save space: do not store what is not used
• Data privacy: access for a limited amount of time
Clean up?
─ Implement some cleanup on the application level, e.g. using Flink Timers, which are separate state. Trade-off: storage size implications!
─ OR … make Flink take care of the cleanup under the hood
6. © 2019 Ververica
State TTL feature: Workflow
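[Diagram: timeline of one KEY/VALUE entry in Flink State. Creating or updating the value starts the TTL timer. While the entry is unexpired, reads return the value (… Use State ...). Once the TTL expires, the entry is expired (… Forget State ...) and Flink purges the state. A later create/update re-creates the entry.]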
7. © 2019 Ververica
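State TTL feature: Example
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.StateTtlConfig.StateVisibility;
import org.apache.flink.api.common.state.StateTtlConfig.UpdateType;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Configure state TTL
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(1)) // Set TTL
    .setUpdateType(UpdateType.OnCreateAndWrite) // When to restart TTL? OnCreateAndWrite or OnReadAndWrite
    .setStateVisibility(StateVisibility.NeverReturnExpired) // Get expired state if still there? YES for cached, NO for GDPR
    .cleanupIncrementally(10, false) // Activate automatic background cleanup
    .build();
// Create state as usual
ValueStateDescriptor<Long> lastLogin =
    new ValueStateDescriptor<>("login", Long.class);
lastLogin.enableTimeToLive(ttlConfig); // Enable TTL
// Obtain the state handle from the descriptor (inside a rich function)
ValueState<Long> lastUserLogin = getRuntimeContext().getState(lastLogin);
lastUserLogin.update(value); // Create/update for key
// do something during the day
lastUserLogin.get(); // -> value (use unexpired)
// do nothing for more than a day
lastUserLogin.get(); // -> null (oops.. expired, not there anymore)
* For collections (List or Map), TTL applies per entry level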
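The two configuration choices in the example (when the TTL restarts, and whether an expired value may still be returned) can be modeled in plain Java. This is a minimal sketch, not Flink code: the class `TtlStateSketch`, its enums, and the injected clock are hypothetical names used only for illustration, and expired entries are simply dropped on read when they must never be returned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Plain-Java model of TTL semantics. Hypothetical; not part of Flink. */
class TtlStateSketch<K, V> {
    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }
    enum Visibility { NEVER_RETURN_EXPIRED, RETURN_EXPIRED_IF_NOT_CLEANED_UP }

    private static final class Entry<V> {
        final V value;
        long lastAccessTs;
        Entry(V value, long ts) { this.value = value; this.lastAccessTs = ts; }
    }

    private final Map<K, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;
    private final UpdateType updateType;
    private final Visibility visibility;
    private final LongSupplier clock; // injected so the sketch is testable

    TtlStateSketch(long ttlMillis, UpdateType updateType,
                   Visibility visibility, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.updateType = updateType;
        this.visibility = visibility;
        this.clock = clock;
    }

    /** Creating or writing a value always (re)starts the TTL. */
    void update(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - e.lastAccessTs >= ttlMillis) {
            if (visibility == Visibility.NEVER_RETURN_EXPIRED) {
                store.remove(key); // drop lazily on access
                return null;
            }
            return e.value; // expired, but not cleaned up yet
        }
        if (updateType == UpdateType.ON_READ_AND_WRITE) {
            e.lastAccessTs = now; // a read also restarts the TTL
        }
        return e.value;
    }
}
```

With `ON_CREATE_AND_WRITE`, reading a value does not extend its lifetime; with `ON_READ_AND_WRITE`, every successful read does.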
9. © 2019 Ververica
General idea: State Wrappers
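[Diagram: the user works with a TTL State Wrapper, which in turn works with normal Flink State. From the user's side an entry is KEY → USER VALUE. The wrapper stores it as KEY → TTL STATE, i.e. the USER VALUE together with a timestamp (TS). The TTL State Serializer wraps the User State Serializer and adds the TS.]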
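The wrapping of the user serializer can be sketched in plain Java. This is an illustration only, not Flink's implementation: `SimpleSerializer`, `TimestampedValue`, `TtlSerializerSketch`, and `Utf8StringSerializer` are hypothetical names, and the byte layout (an 8-byte timestamp followed by the user bytes) is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal serializer interface for this sketch. */
interface SimpleSerializer<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

/** User value plus the timestamp (TS) of its last relevant access. */
final class TimestampedValue<T> {
    final T userValue;
    final long timestamp;
    TimestampedValue(T userValue, long timestamp) {
        this.userValue = userValue;
        this.timestamp = timestamp;
    }
}

/** Wraps a user serializer and prepends the timestamp (assumed layout). */
final class TtlSerializerSketch<T> implements SimpleSerializer<TimestampedValue<T>> {
    private final SimpleSerializer<T> userSerializer;

    TtlSerializerSketch(SimpleSerializer<T> userSerializer) {
        this.userSerializer = userSerializer;
    }

    @Override
    public byte[] serialize(TimestampedValue<T> value) {
        byte[] user = userSerializer.serialize(value.userValue);
        return ByteBuffer.allocate(Long.BYTES + user.length)
                .putLong(value.timestamp)
                .put(user)
                .array();
    }

    @Override
    public TimestampedValue<T> deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long ts = buf.getLong();
        byte[] user = new byte[buf.remaining()];
        buf.get(user);
        return new TimestampedValue<>(userSerializer.deserialize(user), ts);
    }
}

/** Example user serializer for strings. */
final class Utf8StringSerializer implements SimpleSerializer<String> {
    @Override
    public byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
    @Override
    public String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The user code keeps its own serializer unchanged; only the wrapper knows about the extra timestamp, which is also the storage overhead per entry.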
10. © 2019 Ververica
Background Cleanup
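[Diagram: TTL State is accessed single-threaded. A concurrent clean-up process would check each entry (Expired?) and drop the expired ones.]
Clean up in a concurrent background process? Needs synchronization!
● Complicated
● Performance implications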
11. © 2019 Ververica
Incremental Heap Backend cleanup (since Flink 1.8)
[Diagram: Heap State holds many KEY/VALUE entries. On the single-threaded access path, a Global State Iterator advances a few entries at a time, checks each one (Expired?) and drops the expired ones.]
All state entries are periodically cleaned up IF the state is being accessed
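The incremental idea can be sketched in plain Java: a lazily advancing global iterator checks a bounded number of entries on every state access. This is a simplified model, not Flink's heap backend. The class `IncrementalCleanupSketch` and its parameters are hypothetical, and a `ConcurrentHashMap` is used only because its iterators tolerate modification of the map while iterating.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Plain-Java model of incremental cleanup. Hypothetical; not Flink code. */
class IncrementalCleanupSketch<K, V> {
    private static final class Stamped<V> {
        final V value;
        final long ts;
        Stamped(V value, long ts) { this.value = value; this.ts = ts; }
    }

    private final Map<K, Stamped<V>> entries = new ConcurrentHashMap<>();
    private Iterator<Map.Entry<K, Stamped<V>>> globalIterator =
            entries.entrySet().iterator();
    private final long ttlMillis;
    private final int cleanupSize; // entries checked per state access
    private final LongSupplier clock;

    IncrementalCleanupSketch(long ttlMillis, int cleanupSize, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.cleanupSize = cleanupSize;
        this.clock = clock;
    }

    void update(K key, V value) {
        entries.put(key, new Stamped<>(value, clock.getAsLong()));
        cleanupStep();
    }

    V get(K key) {
        Stamped<V> s = entries.get(key);
        V result = (s == null || isExpired(s)) ? null : s.value;
        cleanupStep();
        return result;
    }

    int size() {
        return entries.size();
    }

    private boolean isExpired(Stamped<V> s) {
        return clock.getAsLong() - s.ts >= ttlMillis;
    }

    /** Advance the global iterator by at most cleanupSize entries. */
    private void cleanupStep() {
        for (int i = 0; i < cleanupSize; i++) {
            if (!globalIterator.hasNext()) {
                // End of a pass: start over on the next access.
                globalIterator = entries.entrySet().iterator();
                return;
            }
            if (isExpired(globalIterator.next().getValue())) {
                globalIterator.remove();
            }
        }
    }
}
```

Because cleanup only runs inside `update` and `get`, expired entries stay in memory for as long as nothing touches the state, which matches the caveat on the slide.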
12. © 2019 Ververica
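TTL Compaction filter for RocksDB (since Flink 1.8)
[Diagram: in RocksDB, updates go to the Memtable, which is flushed to disk as immutable SSTables (SSTable 1, SSTable 2). Compaction merges them into a compacted SSTable. The Flink TTL Compaction Filter runs in native C++ inside FRocksDB: during compaction it iterates over the state entries, is applied per entry, checks Expired? and drops expired entries. The JVM side configures the filter, and the filter asks the JVM for the current time via JNI.]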
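The effect of a TTL filter applied during compaction can be modeled in plain Java. This sketch does not use RocksDB or the Flink filter: `TtlCompactionSketch` and its methods are hypothetical, sorted maps stand in for SSTables, the value layout (8-byte timestamp first) is an assumption, and the current time is read once per compaction as a simplification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Plain-Java model of a TTL compaction filter. Hypothetical; not RocksDB. */
class TtlCompactionSketch {

    /** Builds a value with the assumed layout: timestamp, then user bytes. */
    static byte[] stamped(long ts, String userValue) {
        byte[] user = userValue.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + user.length)
                .putLong(ts)
                .put(user)
                .array();
    }

    /** The per-entry filter decision: true means "drop this entry". */
    static boolean isExpired(byte[] value, long ttlMillis, long now) {
        long ts = ByteBuffer.wrap(value).getLong();
        return now - ts >= ttlMillis;
    }

    /**
     * Merges two sorted tables. Entries in the newer table shadow the older
     * ones, and the filter drops whatever has expired.
     */
    static TreeMap<String, byte[]> compact(TreeMap<String, byte[]> older,
                                           TreeMap<String, byte[]> newer,
                                           long ttlMillis,
                                           LongSupplier clock) {
        TreeMap<String, byte[]> merged = new TreeMap<>(older);
        merged.putAll(newer);
        long now = clock.getAsLong(); // simplification: queried once
        merged.values().removeIf(v -> isExpired(v, ttlMillis, now));
        return merged;
    }
}
```

Cleanup here is a side effect of compaction, so an expired entry is only removed when the files that contain it are actually compacted.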
13. © 2019 Ververica
Future roadmap
• Event time support: FLINK-12005
• Support queryable state with TTL
• Flink Timer-based cleanup strategy
• FRocksDB -> RocksDB + Flink extensions (WIP)
14. © 2019 Ververica
Useful links
• The latest blogpost about the feature up to Flink 1.8
• FLIP-25: TTL design discussion
• JIRA Issue FLINK-3089
• User documentation for the State TTL