In the world of big data we need to build services that collect massive amounts of data, save them, and pass them on for processing and analysis. However, building manageable, reliable services that are scalable and cost effective is not an easy task. The choice of ecosystem, frameworks and programming language, as well as applying solid engineering principles, is crucial for achieving this goal.
I will share our journey and insights from rebuilding a cloud service on the Linux ecosystem using Scala, Akka Actors and Aerospike DB, at the end of which we gained a tenfold improvement in server utilization with a much lighter, more stable and reliable system that handles tens of millions of requests per hour.
2. Agenda
• A little about Clicktale Core
• Challenges
• Guidelines
• Akka
• When to Act(or)
• Pool it together?
• Throughput vs. Latency
• Blocking I/O Execution Context
• Aerospike
• Data
• Server
• Client
6. When to Act(or)
Read User State
Decide What to Do
Write New User State
Write Session Metadata
for {
  state <- db.readUserState(userId)
  if state == NotRecording
  result <- Future.sequence(Seq(
              db.writeSessionMetadata(metadata),
              db.writeUserState(Recording)
            ))
} yield Response
Send Response to Client
When to Act(or)
7. When to Act(or)
Read Configuration from Zookeeper
Connect to Rabbit
Connect to Aerospike
Initialize Data Structures
Bind Web Service
Read Rules from DB
Futures
Actors (FSM)
Akka Streams (Graph)
When to Act(or)
8. When to Act(or)
• Mutable State with concurrent access
• State Machines
• Connections (Reconnections)
• Queues (Rabbit/Kafka)
• Reading and Spreading Configuration
• Throttling
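Several of these use cases combine in the connection-wrapper idea: a state machine that buffers requests while disconnected. Below is a hand-rolled, single-threaded sketch in plain Scala rather than an Akka actor, to keep it self-contained; in the actor version the buffer would be stash()/unstashAll() and the state change a behavior switch. All names here are illustrative.

```scala
import scala.collection.mutable

// States of the connection wrapper: either connected or buffering.
sealed trait ConnState
case object Connected    extends ConnState
case object Disconnected extends ConnState

// A hand-rolled sketch of "wrap a connection in an actor": while
// disconnected, incoming requests are buffered (the actor version would
// use stash()); on reconnect, the buffer is flushed in order.
final class ConnectionWrapper(send: String => Unit) {
  private var state: ConnState = Disconnected
  private val buffer = mutable.Queue.empty[String]

  def handle(msg: String): Unit = state match {
    case Connected    => send(msg)
    case Disconnected => buffer.enqueue(msg)   // "stash" until reconnected
  }

  def onConnect(): Unit = {
    state = Connected
    while (buffer.nonEmpty) send(buffer.dequeue()) // "unstashAll"
  }

  def onDisconnect(): Unit = state = Disconnected
}
```

In Akka the same shape falls out naturally: stash incoming messages in the disconnected behavior, and unstash them all when switching to the connected behavior.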
When to Act(or)
13. Know Your Data
Max message size -> Max Aerospike record size
• Multiple messages per record
• Record updates -> Fragmentation
• Message per record -> Batch read
• Is it possible to reduce data size?
• Compression
• Protobuf/Avro
Know Your Data
14. Know Your Server
• write-block-size
• Larger -> more fragmentation if write size is small
• max-write-cache (pending write blocks)
• post-write-queue
• Defragmentation
• Update of record causes fragmentation
• One defrag thread per device
• Hardware - Are all instances created equal?
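The server-side knobs above live in the namespace's storage-engine section of aerospike.conf. A hedged sketch follows - the namespace name and device path are illustrative, and the values shown are defaults or starting points, not recommendations; they must be tuned against your own record sizes under stress tests:

```
namespace sessions {                     # namespace name is illustrative
    storage-engine device {
        device /dev/nvme0n1              # direct-attached SSD
        write-block-size 128K            # smaller blocks suited our small records
        max-write-cache 64M              # pending write blocks before queue-full errors
        post-write-queue 256             # recently written blocks kept around for reads
        defrag-lwm-pct 50                # block occupancy below which defrag kicks in
        defrag-sleep 1000                # throttle for the per-device defrag thread (microseconds)
    }
}
```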
Know Your Server
15. Know Your Client
• Threading Model
• Does it use Async I/O?
• Completion handlers – Thread Pool? Event Loop?
• Settings
• Retries
• Timeout
• Max Async Commands
• Max/Min thread pool size
• Batch Support
• Keep track of New Features
Know Your Client
Challenges:
High Availability - any HTTP 4xx or 5xx is visible to our customers' monitoring tools running in the browser
Low Latency
A response from the server is required in order to start the recording process without delays.
In addition, any latency in server responses to browser requests is perceived by our customers' IT as hurting user experience
Peaks - we have customers launching web campaigns, and all of a sudden we get traffic peaks, sometimes doubling the current traffic rate.
Data Variance - different web pages, different users, different events, different page view duration
Cost - reduce cost where possible - machines and storage
Guidelines we adhere to in order to answer the challenges:
Data
Aggregate data before sending to server (but not too much…)
Reduce data size on the client as much as possible, e.g. compression
Chunking data (max request size)
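A minimal sketch of the client-side data guidelines - compress, then chunk against a max request size - using only the JDK's gzip support. The function name and the chunk-size parameter are illustrative, not the service's actual API:

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Compress a payload and split it into chunks no larger than maxChunkSize,
// mirroring the "reduce data size, then chunk to the max request size" guideline.
def compressAndChunk(payload: Array[Byte], maxChunkSize: Int): Seq[Array[Byte]] = {
  val bos  = new ByteArrayOutputStream()
  val gzip = new GZIPOutputStream(bos)
  gzip.write(payload)
  gzip.close()                                  // flush the gzip trailer
  bos.toByteArray.grouped(maxChunkSize).toSeq   // chunk the compressed bytes
}
```

The server side then reassembles the chunks in order and decompresses the concatenated stream.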
Protect Client
Release the client as soon as possible - even before saving data to the DB
Client timeout - both at http level (Spray) and actor level (ask pattern timeout)
Try…Catch and Actor Supervision all the way
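The "release the client as soon as possible" and timeout guidelines can be sketched with plain Scala Futures (no Spray or Akka here, to keep the sketch self-contained); the helper name and the time budgets below are arbitrary illustrations:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._

implicit val ec: ExecutionContext = ExecutionContext.global
val scheduler = Executors.newSingleThreadScheduledExecutor()

// Race `work` against a deadline: the client always gets an answer within
// `timeout`, even if the underlying I/O is still in flight.
def withTimeout[A](work: Future[A], timeout: FiniteDuration, fallback: A): Future[A] = {
  val p = Promise[A]()
  scheduler.schedule(new Runnable {
    def run(): Unit = p.trySuccess(fallback) // deadline hit: release the client now
  }, timeout.toMillis, TimeUnit.MILLISECONDS)
  work.onComplete(p.tryComplete)             // work finished first: use its result
  p.future
}
```

In the real service this role is split between the HTTP-level timeout (Spray) and the actor-level ask timeout; the fallback response is what lets us release the client even before the DB write completes.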
IO
IO must never interfere with client requests
Do everything async and as much as possible in parallel
Different execution context than the actor system's
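The "different execution context" guideline in plain Scala - a dedicated fixed pool for blocking and I/O-completion work, so it cannot starve the actor system's dispatcher. The pool size of 16 is an arbitrary starting point and must be tuned under load:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

// A dedicated pool for I/O work, separate from the actor system's
// fork-join dispatcher that serves client requests.
val ioEc: ExecutionContextExecutorService =
  ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(16))

// Blocking or completion-heavy I/O runs on the dedicated pool, so it can
// never starve the request-handling threads.
def saveToDb(data: String): Future[Unit] =
  Future {
    // the blocking database call would go here
    ()
  }(ioEc)
```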
Error Handling
Supervision on all actors
Try catch on request handling flow
Actors are very good for working with mutable state and concurrent access, but what about the flow shown here?
A very common flow that consists of several async I/O operations (marked in green)
Is an Actor based solution suitable here?
Actors are not ideal when you need to perform a sequence of I/O operations with decision logic in between.
It requires a lot of boilerplate code to pass and handle messages to and from the actor.
In this case, using Futures gives a simpler, more elegant solution. A possible simplified solution with Futures is presented here.
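For completeness, here is a self-contained, runnable version of that Futures solution, with the DB calls stubbed out (the db object, the state values and the response string are stand-ins for the real service's types):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

sealed trait UserState
case object Recording    extends UserState
case object NotRecording extends UserState

// Stubbed async DB operations standing in for the real Aerospike calls.
object db {
  def readUserState(userId: String): Future[UserState]     = Future.successful(NotRecording)
  def writeSessionMetadata(metadata: String): Future[Unit] = Future.successful(())
  def writeUserState(s: UserState): Future[Unit]           = Future.successful(())
}

// The slide's flow: read state, decide, then perform both writes in parallel.
def handle(userId: String, metadata: String): Future[String] =
  for {
    state <- db.readUserState(userId)
    if state == NotRecording               // guard: only start a new recording
    _ <- Future.sequence(Seq(
           db.writeSessionMetadata(metadata),
           db.writeUserState(Recording)    // both writes run in parallel
         ))
  } yield "Response"
```

Placing the guard right after the read means the writes only fire when the user is not already being recorded, while Future.sequence runs the two independent writes concurrently.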
Another example.
We first read configuration from Zookeeper, which stores the connection strings needed to connect to the SQL DB, Rabbit and Aerospike.
Only then can we initialize everything and bind the service to listen for incoming requests.
Again, I/O operations are async and are marked in green
What is the most suitable solution here?
This time the answer is not so clear. It is possible to do this with Futures, but Actors with finite state machines might also work here, since this is a state machine.
Another interesting approach is to use Akka Streams with the Graph API – we could model this flow as a stream that handles a single “init” message.
So when is it clear that an Actor is our go-to solution?
We saw that state machines are a good fit, especially with Akka FSM.
Wrapping a connection in an Actor is also a good idea, because you can manage reconnection (again, a state machine) and also buffer requests during disconnection (using stash).
Queues also map naturally to actors, since an Actor is essentially a queue, and you can easily combine this with the connection handling mentioned above.
Keeping process configuration and handling online configuration updates is another fit: an Actor encapsulates the mutable state (the configuration), uses the actor system's event stream to notify subscribers of configuration changes, and exposes the current configuration as an immutable object.
Does using a pool of actors actually improve throughput?
As always, it depends on your process and on what you are doing.
In our case we have an I/O-intensive service, but profiling the service under load showed a direct relation between I/O and CPU consumption.
CPU time was mostly spent on:
Processing and parsing http requests (Spray)
Aerospike Client Selector threads
IO Completion
So if the CPU is already working hard, and we are running on a quad-core machine – is there any benefit to a pool of actors?
Our performance tests showed no significant difference between the two options.
The higher the throughput value (default is 5) and the throughput-deadline-time (default is negative, i.e. no deadline), the better the throughput - you also reduce context switching of the dispatcher threads between actors - but it can starve other actors.
It depends on your actors: if you have one main actor that handles all the traffic, plus other actors that run periodically, then it is best to set higher values to increase throughput at the expense of those actors.
But setting throughput too high may add latency to responses - so if we need to cap the maximum response time, we must balance it with throughput-deadline-time.
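These knobs live in the dispatcher section of application.conf. A sketch of the trade-off (the dispatcher name and values are illustrative, not recommendations):

```hocon
my-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  # Messages an actor may process before the thread moves on.
  # Higher = better throughput and less context switching, but risks
  # starving other actors and adding response latency.
  throughput = 100
  # Caps how long one actor may keep the thread, regardless of throughput.
  throughput-deadline-time = 200ms
}
```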
Thread Pool settings
There is no right or wrong here. You need to take into account the number of cores, but also the threads outside the fork-join-executor that also use CPU. For example, in our service there is also the Aerospike thread pool used for async command completion, and the async selector threads (usually one per core).
These threads also compete with the dispatcher, so you must test this.
Dedicated IO Thread Pool Execution Context
Always a good idea. It also neutralizes the effect of blocking IO on CPU threading – which helps when testing for the best CPU thread pool settings.
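In Akka, a dedicated blocking-IO dispatcher is configured the same way, typically as a thread-pool-executor sized independently of the CPU-bound fork-join pool (the name and sizes below are illustrative):

```hocon
blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 16
  }
  # fairness over throughput for blocking work
  throughput = 1
}
```

Code then runs blocking calls on it explicitly, e.g. `Future { blockingCall() }(system.dispatchers.lookup("blocking-io-dispatcher"))`, keeping the default dispatcher free for request handling.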
The max size of your data will impact the required configuration for Aerospike.
It will also affect the way you write data to Aerospike – would you keep multiple messages in a list/map bin, or split a message across several records?
There is a tradeoff here – splitting a message across multiple records means multiple writes and multiple reads, while writing to the same record several times may cause fragmentation.
Try to reduce data size as much as possible for higher throughput in both server handling requests and Aerospike
If data might be resent, you need duplicate detection – you could rely on a key-already-exists error, or, if using CDTs, use a map instead of a list.
Things you find out doing stress tests on the DB:
Block size - affects write size
It depends on your data: for small records, a smaller block size is better
We saw a tenfold difference in scale
Write Queues – do you need to increase the default threshold?
Defragmentation
Updating existing records increases fragmentation (a new block is allocated and the record is copied on each update) and overloads the defrag thread
How to improve defrag throughput:
A cluster of more, smaller instances is better than a cluster of fewer, stronger instances.
Machine Scope: Partitions - thread per partition
Direct-attached SSD disks - not all instances are created equal
You need to run the Aerospike Certification Tool (ACT), which measures SSD write and read times, and select the instances with the best SSDs