Sergey Zhemzhitsky, CTO of CleverDATA (a division of LANIT, a leading system integrator in Russia), presented in a live webinar how his team developed their in-house tech stack, which includes Hadoop, NoSQL, and a variety of data mining and big data analytics tools.
Watch the video! http://pages.aerospike.com/32515----Webinar-Compare-RealTime-NoSQL_32515----Webinar-Compare-RealTime-NoSQL-Watch.html
Learn about:
• Architecture of the CleverDATA Platform including DMP capabilities
• Results of evaluations and performance testing on Aerospike, MongoDB, and Redis
• Critical requirements for operating DSP, SSP and DMP for high performance, low-latency scaling
• Do’s and Don’ts of applying this design pattern to your real-time application
About the Speaker:
Sergey Zhemzhitsky started out in IT 10 years ago as a developer and rose through the ranks, joining CleverDATA in 2014 as Chief Technical Officer.
About CleverDATA:
CleverDATA specializes in developing and providing its in-house platform 1DMP.RU as a cloud service. The platform enables customers to exchange, process, and accumulate large amounts of varied data. The company also provides IT consulting services, develops private data management platforms, and implements solutions involving AdTech, unstructured big data mining, digital intelligence and operational analytics, machine learning for marketing automation, risk management, real-time web analytics, RTB, and customer engagement and retention.
About Aerospike
Aerospike is the world’s fastest database – the system of engagement – powering a new class of real-time, context-driven applications that personalize the consumer experience across the Internet. Developers use Aerospike, an open-source, flash-optimized, in-memory NoSQL key-value store for caching, as a user profile or context store, and to simplify scaling with smaller clusters and the price/performance of flash. Recognized by industry analysts as a visionary and leader, Aerospike powers 13 of the top 26 real-time bidding platforms including AppNexus and developers are rapidly taking advantage of the Startup Special to scale their business. Download the open source Aerospike Community Edition at www.aerospike.com and follow @aerospikedb.
2. cleverdata.ru | info@cleverdata.ru
International market business development since 2012
One of the three leading IT companies in Russia
43 branches in Russia and abroad
5,500+ employees
100K projects for 10K customers
Innovative data management platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customers Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing
9. TRACKING DATA
Data Management Platform (DMP)
[Diagram: publishers connected through SSPs, advertisers through DSPs; the DMP gathers COOKIE SYNCs, ACCESS LOGS, PARTNERS’ DATA, 3rd PARTY DATA, CLICK STREAMS, and whatever else data]
11. TRACKING DATA
DMP in the RTB ecosystem
[Diagram: the same publisher–SSP–DSP–advertiser chain, with the DMP shown feeding the DSPs from COOKIE SYNCs, ACCESS LOGS, PARTNERS’ DATA, 3rd PARTY DATA, and CLICK STREAMS]
37. Do’s & Don’ts
• Do use non-blocking I/O;
• Do computations locally;
• Be lazy: don’t do more than needed;
• Don’t be afraid of using others’ good ideas;
• Don’t trust anyone.
40. To get started now with Aerospike, visit:
aerospike.com/get-started
Tweet us at @Aerospikedb
Editor’s notes
Just a few words about what CleverData is. It is a fairly young company, a joint venture of CleverLeaf, headquartered in London, and LANIT, one of the biggest IT solutions companies in Russia. CleverLeaf brought the big data, data analysis, data mining, and data processing development directions to LANIT.
It has been three years since the founding of CleverLeaf, and the CleverLeaf legal entity is free to be contacted by anyone who prefers to deal with European legal entities instead of Russian ones.
Moreover, we are partners of Aerospike in Russia and the EMEA region, and we provide professional services for setting up, tuning, and developing solutions based on the Aerospike database as well as other technologies like the Hadoop stack, Spark, Elasticsearch, etc.
Among our customers are the biggest telecom, media, and retail banking companies, which are happy with the data management platforms we built for them, with Aerospike as the core of the real-time data access layer.
Recently we launched the 1DMP cloud platform to give online ad market players, and anyone else, the ability to exchange and process data, so that companies which own, for example, web logs, user profiles, tracking pixel logs, or any other data can transform and monetize their data by means of our platform. The service is open and free to test right now via the links at the end of the presentation.
The agenda for today is the following…
First of all there will be a brief introduction into what the RTB is…
Then we’ll talk about the challenges RTB comes with and the service level agreements every RTB participant should follow.
Then I will try to describe the difficulties of choosing appropriate software to become part of the final solution.
Some of our testing results will be shown, and we’ll also discuss the importance of testing all the components in the target environment and on the target hardware and software.
And finally there will be some do’s and don’ts we learned during the development of what we call the “User Profile Enrichment Service”, which uses 3rd party data to build a complete and full profile of the online user visiting web pages, in order to allow advertisers to buy impressions efficiently.
Well, when anybody starts dealing with RTB, they will definitely be discouraged by the number of three-letter abbreviations RTB introduces.
There are so many of them, that the most difficult thing is to remember what they mean.
The most important abbreviations are: RTB, DSP, SSP, DMP. So let’s try to decipher these acronyms…
RTB – is the real time bidding.
When dealing with online ad there are a number of roles or parties.
Some of them create content and try to make money on it, for instance by placing advertisements within this content. Such a party is called a publisher.
There are also those who would like to advertise some goods or services. Such participants are called advertisers.
And finally, there are systems where publishers and advertisers meet, such as ad networks and ad exchanges; the latter are often responsible for the real-time auctions that sell the current impression of the given visitor.
Real-time bidding works in the following way: when any of us visits a web page, a special JavaScript snippet invoked by that page starts an auction, and advertisers bid on the current visitor. The bid with the highest price wins; the corresponding advertiser is notified that it is the winner, and its price is lowered to the second price, the minimum still needed to win.
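The second-price rule described above can be sketched in a few lines of C (a simplified illustration: the struct layout, price units, and function name are ours, not from the talk):

```c
#include <stddef.h>

/* A bid: who bids and how much (e.g. in micro-units of currency). */
struct bid { int bidder_id; long price; };

/* Second-price auction: the highest bid wins, but the winner pays the
 * second-highest price (the minimum needed to still win). Returns the
 * winner's index and stores the clearing price in *pay. */
size_t second_price_winner(const struct bid *bids, size_t n, long *pay)
{
    size_t best = 0;
    long second = 0; /* acts as a floor price when there is one bid */
    for (size_t i = 1; i < n; i++) {
        if (bids[i].price > bids[best].price) {
            second = bids[best].price; /* old leader becomes runner-up */
            best = i;
        } else if (bids[i].price > second) {
            second = bids[i].price;
        }
    }
    *pay = second;
    return best;
}
```

Real exchanges typically add a minimal increment to the clearing price; that detail is omitted here.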
Notice that interests of publishers and advertisers are opposite.
Publishers are interested in making more money by selling less inventory (inventory in this case is a number of advertisements, or amount of ad space, a publisher has available to sell).
Advertisers want to buy more inventory (that means more ad places, actions, impressions) spending less money.
So let’s see how this conflict of interests is solved…
There are special software systems which work on the side of advertisers. These systems are called “Demand Side Platforms” and are responsible for spending advertisers’ marketing budget in the most efficient way.
On the opposite corner there are systems which are called “Supply Side Platforms”. These are working on the side of publishers and are responsible for selling the inventory for as high prices as possible.
So, how should an advertiser know which impressions to buy and which not to?
Well, here is where the “Data Management Platforms” come into play. These platforms are responsible for gathering information about our interests in different ways, using data sources like 3rd party data, partners’ data, web logs, click streams, and so on.
You may notice that nowadays web pages are rendered pretty fast, and the process of displaying ads should harm neither the rendering speed of the page nor the user’s experience with it.
In order to understand the challenges developers of RTB systems are faced with, here is the slide that demonstrates performance requirements for these systems.
There is only 100 milliseconds to complete the ad auction for our visit.
Supply Side Platforms have something between 30-50 milliseconds to ask all the Demand Side Platforms for how much they bid for the current visitor.
Demand Side Platform should decide whether it is interested in the given user or not.
There must be some criteria to decide to bid on the current visitor regarding his interests, demands or wishes, or not to bid.
As you already know such information is located in the Data Management Platform, which has only 10-20 milliseconds to find the current web page visitor profile.
The load the RTB systems should be able to deal with is tens and hundreds of thousands of requests per second.
Well, what is necessary to make the DMP so fast that a reasonably complete user profile can be obtained while the web page is still rendering?
First of all, we should make the DMP respond with the user profile with pretty low latency.
Then, it is necessary to make the DMP respond within 10-20 milliseconds even if the DMP is slow and not capable of serving all the incoming requests.
And finally, we have to obtain user profiles from every party and system that knows anything about the current user (for example, interests, geolocation, gender, age, and so on). It may be partner systems, 3rd party data, tracking data, or any other data.
Something like the desired behavior is shown on the slide.
If any request to an external DMP is not served within the required 20 milliseconds, we just abort it and stop processing it.
How do we achieve this kind of behavior? Well, we probably have to set up, next to the DMP, something fast enough and probably in-memory, so the DMP can respond with user profiles within the required time. I suppose it may be one of the NoSQL databases available on the market.
Next,…, we have to implement an application, that queries this database for user profiles.
And finally, it is necessary to make the mentioned application get user profiles from the external systems as fast as possible.
The first issue we faced was choosing a NoSQL database.
The choice criteria for scalable and highly available systems we decided to follow are:
1. Linear scalability, because the amount of data tends to grow constantly, and user tracking is usually done with the help of cookies, which may live a rather short time.
2. Out-of-the-box sharding, required to achieve the previously mentioned linear scalability.
3. The storage must be distributed by design; we don’t want to scale it manually.
4. Data redundancy and replication, required to provide high availability.
5. And the low latency we discussed earlier, necessary to respond within the 10-20 milliseconds we have to decide whether we are interested in the current user or not.
When we started choosing the NoSQL database, we were a little puzzled and looked like the beast on the slide.
There are a lot of solutions; nosql-databases.org alone lists about 150 of them. Which one should we choose to be able to meet the SLAs? It would hardly be possible to test all of them in near-production environments.
So, in the first place we decided to pay attention to the functional requirements of the databases as well as to their users, community and customers, and then to give these databases a try to understand their performance.
So, let’s write out the desired functional requirements the NoSQL database should have and look at what fits our needs and what does not.
All of the solutions are able to scale one way or another, except for Redis, which introduced Redis Cluster not so long ago; unfortunately, by the time of our testing, Redis Cluster had not been declared stable yet.
All of the solutions support sharding, except for Redis: we had to shard it manually or find a 3rd-party proxy application to do the sharding for us. However, we decided to try Redis because, from our experience, it promised to be very fast.
Regarding replication and data redundancy, all of the solutions have it; Redis has master-slave replication.
Regarding a single point of failure, Redis got a plus-minus: to make master-slave replication work as desired, it is necessary either to install 3rd-party software like Sentinel, which had quite a lot of bugs during our testing and is deprecated now, or to install a load balancer. Either way, the client application will detect that the master node disappeared, but we would like this process to be transparent.
Maintainability is all about software installation, configuration, and maintenance during its life cycle, as well as the amount of effort required to support the software.
Cassandra got a plus-minus because it is a little harder to install and maintain than the others.
Mongo got a plus-minus because it is difficult to install correctly, or at least not as easy as some of the others. It also requires more hardware to set up, because it runs a lot of processes with different roles: config servers, replica nodes, routing services, etc.
Redis got a minus because at the time we had to scale it manually.
And regarding monitoring tools, pluses went to the solutions that have graphical user interfaces. Honestly speaking, it is a dubious advantage, although it is good to have when starting to use a database for the first time.
Before doing any tests we decided to learn whether there are any official and publicly available results of performance tests of chosen NoSQL databases.
So, we found the following paper with the graphs on the slide.
Let’s look at what we have: graphs that show the dependency between latency and throughput.
The only thing we can take from these graphs is that Cassandra is mostly a write-heavy database and MongoDB a read-heavy one.
If we look at the right part of the graph, we may notice that for Cassandra increased read throughput means increased latency, while for Mongo the write latency degrades as throughput increases.
It is necessary to say that we didn’t aim to measure the databases themselves; we wanted to understand how the whole solution would work.
Because of that we didn’t test the databases separately. If anyone wants to do it, there is the Yahoo! Cloud Serving Benchmark (YCSB), which already has drivers for the most popular NoSQL databases.
Let’s decide what we are going to measure and how we will do it.
Each of the tests was run in the following way: we did 5 trials of 3 minutes each, then took the trial with the best results.
The size of every message was 512 bytes, the average size of a typical user profile in our case.
We generated 20 million records and loaded them into the databases.
After the messages had been loaded, we waited for all the database replication processes to complete, and only then started testing.
I forgot to mention why nginx is here: the Enrichment Service we developed uses the OpenRTB protocol to communicate with the outside world.
We were measuring nginx, aerospike, redis, mongodb.
The load was generated by means of wrk utility.
To monitor the memory, cpu and network utilization we were using nmon and nmon-analyzer utilities.
The hardware setup was the following: Intel Core i7, Quad Core, 48 gigs of RAM, 1 gigabit NICs.
We also have to mention that it is always necessary to read the fine print: in our case the hosting provider guaranteed a network speed of only 200 megabits per second, although all the boxes had 1 Gbit NICs.
First of all, let’s measure nginx to see what it can show in our environment.
Here is how the components were deployed. The first box has wrk installed, another box has nginx without any plugins or logging enabled.
Wrk sends requests to nginx as fast as possible.
The results of this test will be our base line to compare other results with.
Here are the numbers shown by nginx.
The percentiles are the following:
50th percentile of latency is 1.35 ms
75th percentile of latency is 1.46 ms
99th percentile of latency is 1.90 ms
The maximum throughput achieved was about 69,000 requests per second.
The CPU utilization of the nginx host was near 30%
CPU utilization of the WRK-hosts during the tests was no more than 35%
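For reference, percentiles like the ones reported above can be computed from raw latency samples with the nearest-rank method (a common convention; the exact interpolation wrk uses may differ):

```c
#include <stdlib.h>
#include <stddef.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: sort the samples, then take the value at
 * rank ceil(p/100 * n). Sorts the input array in place. */
double percentile(double *samples, size_t n, double p)
{
    qsort(samples, n, sizeof *samples, cmp_double);
    size_t rank = (size_t)((p * (double)n + 99.0) / 100.0); /* ~ceil(p*n/100) */
    if (rank == 0) rank = 1;
    if (rank > n) rank = n;
    return samples[rank - 1];
}
```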
Next we tested Redis.
It was deployed in the configuration displayed on the slide.
On every server there were 2 master and 2 slave processes, which were bound to the corresponding core of the quad-core CPUs.
Slave nodes of the corresponding masters lived on the different boxes.
The records were sharded by the key.
Nginx used 6 upstream configurations to route requests to the appropriate Redis instances.
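Conceptually, routing a profile key to one of the 6 Redis instances is just a stable hash modulo the shard count. A sketch in C (FNV-1a is our choice for illustration; the actual hash used in the nginx upstream configuration is not stated in the talk):

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a hash of the profile key. The loader and the reader must agree
 * on this mapping, so a profile is always read from the shard it was
 * written to. */
static uint32_t fnv1a(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;
    }
    return h;
}

size_t shard_for_key(const char *key, size_t nshards)
{
    return fnv1a(key) % nshards;
}
```

Note that a plain modulo remaps most keys when the shard count changes; with a fixed 6-instance layout, as here, that is not an issue.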
Here are the Redis results.
The number of handled requests decreased because an additional network hop appeared between nginx and Redis.
The network interface on the box that hosted nginx seems to have been the bottleneck.
The number of network packets it was able to handle was near 70-75 thousand per second.
MongoDB was deployed in the following way: each of our servers hosted one config server plus three mongod services forming a replica set, joined into the corresponding shards.
Generally speaking, a correct installation of a Mongo cluster is pretty unfriendly: we have to manage a lot of mongod processes, which have to be maintained, monitored, and so on.
The mongos process was installed on the node that hosted nginx, and communication between nginx and mongos went over the local (loopback) interface. We used the local interface because we noticed that the main interface has limitations on the number of network packets it can handle.
We used lua-resty-mongol nginx plugin to connect nginx to mongo.
Here are the results.
They are not as impressive as we expected them to be.
We suppose such results occurred because we were using Lua, and also because communication with Mongo went through the master node of the replica set. We also tried the slaveOk option, but it didn’t lead to any performance improvement.
It should be noted that the CPU was mostly used by the mongos daemon, not nginx.
The CPU was utilized by mongos at about 45% and by nginx at about 25%.
We show the local network interface here because all the traffic between nginx and mongos went through it.
Let’s start testing Aerospike.
We developed a custom nginx plugin to connect nginx to aerospike.
Aerospike was deployed on the three nodes, with the replication factor equal to two.
And when we started testing, we were very upset by the performance shown.
We were surprised by the results and decided to puzzle out the source of this issue.
There was nothing special with the network (with such a throughput).
We kept running more tests, increasing the number of nginx worker processes with each test.
Although throughput was increasing, there was no effect on the 99th percentile of latency.
In general, the latency results were not very stable while increasing the number of nginx workers.
So, we realized that we were trying to make nginx, which uses non-blocking I/O, work perfectly with the blocking Aerospike client library.
This means that our blocking source code, based on the blocking Aerospike C client, killed nginx performance completely.
OK, there is some good news: Aerospike has a lot of client libraries for all occasions.
Among them there is a libevent-based non-blocking client.
We implemented a quite simple HTTP server based on libevent and libevhtp, a library for implementing multithreaded HTTP servers on top of libevent.
The final results were not long in coming, and the response time returned to the expected range.
The network is the bottleneck once again.
And here are the results we’ve got, taking into account all the constraints and limitations we had.
The nginx column is the baseline we oriented to. The performance of the non-blocking Aerospike client is comparable to the performance of Redis.
Mongo performance is quite good, but we suppose we simply did not learn to cook it well enough to achieve results similar to Redis and Aerospike.
It is also not as easy to maintain as the other databases.
It is worth noting that we were not able to get more than 70-75K network packets per second from our hosting provider, so we were theoretically limited to 70-75K 512-byte messages.
Anyway the results shown by non-blocking aerospike client and redis were pretty good.
What else can we add regarding Aerospike?
Probably the main thing is that Aerospike was open-sourced not so long ago. It uses SSDs as raw devices by means of the Linux direct I/O API to achieve its incredible performance while keeping the costs of platform maintenance and ownership quite low.
Its client libraries know about the cluster topology, so only one network hop is necessary to obtain the required data.
And last but not least, Aerospike supports secondary indexes, which work really fast compared with other databases’ secondary indexes that are maintained by means of slow map-reduce procedures.
So, what components is our enrichment service built of?
The application is pretty simple. Based on libevent and libevhtp, it consists of core modules that bootstrap, initialize, and configure the main internal data structures; service modules that periodically provide the application with external data (like prices); and handler modules that access externally and internally provided “small” data management platforms, aggregate the corresponding replies, and send them back to the client.
The ultimate performance and hardware utilization were achieved by choosing, as you may remember, an asynchronous design rather than the traditional “thread-per-connection” model. All external calls ran in parallel, while the application used one thread per CPU core to handle incoming HTTP requests and one additional thread for internal purposes. The request-handling threads were completely independent of each other, so there was no need for them to communicate, and the application was not only asynchronous but also completely lock-free.
To be even more efficient, we used the memory pools provided by the excellent talloc library. It speeds up memory management by allocating memory chunks of an appropriate size once and then reusing them, instead of worrying about memory deallocations and memory leaks, which are quite common issues in C applications.
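The reuse idea behind such pools can be illustrated with a minimal bump-pointer arena (this is not talloc's API, just the underlying pattern: one allocation up front, per-request allocations by pointer bumping, one reset when the request is done):

```c
#include <stdlib.h>
#include <stddef.h>

/* A per-request arena: one malloc up front, pointer-bump allocations,
 * and a single reset when the request is done -- no per-object frees. */
struct arena {
    char  *buf;
    size_t cap;
    size_t used;
};

int arena_init(struct arena *a, size_t cap)
{
    a->buf = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->buf ? 0 : -1;
}

void *arena_alloc(struct arena *a, size_t size)
{
    size = (size + 7) & ~(size_t)7; /* keep 8-byte alignment */
    if (a->used + size > a->cap)
        return NULL;                /* pool exhausted */
    void *p = a->buf + a->used;
    a->used += size;
    return p;
}

/* Reuse the same chunk for the next request. */
void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

talloc additionally gives hierarchical ownership (freeing a parent frees its children), which this sketch does not attempt to show.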
We also used:
• the zlog library for logging;
• protobuf as the wire protocol between internal services and our enrichment service;
• GLib for data structures and configuration of the application;
• the json-glib extension to parse OpenRTB JSON requests;
• Graphite to monitor and visualize the performance of the application.
To avoid thread synchronization primitives, we made the master and worker threads communicate by means of a socketpair over Unix domain sockets.
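A minimal sketch of such a channel (the function names and the one-byte command protocol are ours, for illustration):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Create a bidirectional channel between master and worker threads.
 * Each side keeps one fd; the worker's end is added to its event loop,
 * so commands arrive as ordinary readable events -- no shared state,
 * no locks. */
int make_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* One-byte commands keep the protocol trivial and the writes atomic. */
int channel_send(int fd, char cmd)
{
    return write(fd, &cmd, 1) == 1 ? 0 : -1;
}

int channel_recv(int fd, char *cmd)
{
    return read(fd, cmd, 1) == 1 ? 0 : -1;
}
```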
And here is what we learned during the development of our service.
Do not use blocking I/O if possible. If you have to, isolate that functionality into an external service and use non-blocking I/O to communicate with it.
Everything that can be done locally, without network communication, should be done locally. When we implemented our Graphite module for the first time, we sent every metric we obtained straight to Graphite. Imagine now that we have 10,000 requests per second, and every request involves gathering 15 to 20 metrics: that would be about 150,000 to 200,000 sends per second. So we started aggregating metrics locally and flushing them to Graphite periodically.
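The local aggregation idea can be sketched like this (the metric names and the fixed-size table are illustrative assumptions, not the service's actual Graphite module):

```c
#include <string.h>
#include <stddef.h>

/* Instead of one network send per observation, accumulate each metric
 * locally and flush one aggregated value per flush interval. */
#define MAX_METRICS 32

struct metric { char name[64]; long sum; long count; };

static struct metric metrics[MAX_METRICS];
static size_t nmetrics;

void metric_add(const char *name, long value)
{
    for (size_t i = 0; i < nmetrics; i++) {
        if (strcmp(metrics[i].name, name) == 0) {
            metrics[i].sum += value;
            metrics[i].count++;
            return;
        }
    }
    if (nmetrics < MAX_METRICS) {
        strncpy(metrics[nmetrics].name, name, sizeof metrics[0].name - 1);
        metrics[nmetrics].sum = value;
        metrics[nmetrics].count = 1;
        nmetrics++;
    }
}

/* Called from a timer every flush interval: one aggregated value per
 * metric goes out instead of thousands, then the counter restarts. */
long metric_flush_sum(const char *name)
{
    for (size_t i = 0; i < nmetrics; i++) {
        if (strcmp(metrics[i].name, name) == 0) {
            long s = metrics[i].sum;
            metrics[i].sum = 0;
            metrics[i].count = 0;
            return s;
        }
    }
    return 0;
}
```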
Always be lazy; it is the only way to progress. Don’t do more than needed: almost 40% of the source code is never used.
Use others’ good ideas. While implementing the enrichment service we used a module and plugin system very similar to the one in nginx.
And never, ever trust anyone. If you have to benchmark something, use your own environment, requirements, software, and technologies, the ones you plan to have in the final solution.
So, that is all.
I will be glad to answer all your questions.