WEBINAR:
Comparing NoSQL Databases
for Real-time Bidding
Presented by
Sergey Zhemzhitsky,
CTO of CleverDATA
cleverdata.ru | info@cleverdata.ru
International market business development since 2012
One of three leading IT companies in Russia
43 branches in Russia and abroad
5500+ employees
100K projects for 10K customers
Innovative data management platform (Data Exchange Service)
Cloud Service
In-house development
Internet advertising solutions
Data Management Platforms
Customers Base Management
Web Analytics
Marketing automation
Big Data
Data Mining
Digital Intelligence
Operational Intelligence
Low Latency and NoSQL
Cloud Computing
Agenda
• RTB intro;
• Challenges;
• Choice difficulties;
• Results;
• Do’s and Don’ts.
RTB intro
Real-Time Bidding (RTB)
[Diagram: publishers and advertisers connected through multiple ad networks]
How RTB works
[Diagram: publishers, ad networks and advertisers]
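The auction described in speaker note 5 (highest bid wins, winner pays the second price) is a second-price auction. A minimal sketch, assuming all bids are in one currency; the `Bid` class, the `second_price_auction` function and its `floor` parameter are hypothetical illustrations, not any exchange's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    advertiser: str
    price: float  # e.g. price per thousand impressions; the unit is an assumption


def second_price_auction(bids: list[Bid], floor: float = 0.0) -> tuple[str, float] | None:
    """Return (winner, clearing price), or None if no bid reaches the floor price."""
    eligible = sorted((b for b in bids if b.price >= floor), key=lambda b: b.price, reverse=True)
    if not eligible:
        return None
    winner = eligible[0]
    # The winner pays the runner-up's bid, or the floor if nobody else bid.
    clearing_price = eligible[1].price if len(eligible) > 1 else floor
    return winner.advertiser, clearing_price


if __name__ == "__main__":
    bids = [Bid("A", 2.5), Bid("B", 1.8), Bid("C", 3.1)]
    print(second_price_auction(bids))  # ('C', 2.5)
```

Some exchanges add a small increment on top of the second price; the sketch leaves that out.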
Demand Side Platform (DSP)
[Diagram: publishers, ad networks, a DSP and advertisers]
Supply Side Platform (SSP)
[Diagram: publishers, an SSP, ad networks and advertisers; tracking data]
Data Management Platform (DMP)
[Diagram: publishers, SSP, DSP and advertisers, with data sources: cookie syncs, access logs, partners' data, 3rd-party data, click streams and any other data]
RTB challenges
[Diagram: visitors, publishers and advertisers; the auction must complete within 100 ms and the DMP lookup within 20 ms, at 10,000+ requests per second; tracking data]
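The slide's figures translate into a quick sizing estimate. A minimal sketch, assuming steady load: by Little's law the average number of requests in flight equals the arrival rate times the time each request spends in the system, so a DMP answering 10,000 req/s within 20 ms is holding about 200 requests at any moment. The function name is hypothetical.

```python
def in_flight_requests(rps: float, latency_ms: float) -> float:
    """Little's law, L = lambda * W: mean concurrency = arrival rate x time in system."""
    return rps * latency_ms / 1000.0  # convert ms to s


if __name__ == "__main__":
    # Figures from the slide and speaker notes: 20 ms DMP budget, 10,000 to 100,000 req/s.
    for rps in (10_000, 100_000):
        print(f"{rps} req/s at 20 ms -> ~{in_flight_requests(rps, 20):.0f} requests in flight")
```

The estimate says nothing about tail latency; it only shows how much concurrency the storage client and front end must sustain.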
DMP in the RTB ecosystem
[Diagram: the DMP collects cookie syncs, access logs, partners' data, 3rd-party data and click streams and serves the SSP and DSP between publishers and advertisers]
Wishes…
Choice criteria
• Linear scalability;
• Sharding “out of the box”;
• Distributed by design;
• Data redundancy & replication;
• Low latency.
Which NoSQL … again
[Comparison table: Aerospike, Cassandra, Mongo and Redis compared on scalability, sharding, redundancy, latency, SPoF (single point of failure), maintainability, monitoring tools and features]
Public results
Tools & Materials
• Web server and modules: nginx 1.2.x, ngx-http-redis, lua-nginx-module, lua-resty-mongol, ngx-aerospike;
• Databases: aerospike 2.x, redis 2.6.x, mongodb 2.4.x;
• Benchmarking and monitoring: wrk, iperf, nmon;
• Intel® Core™ i7-920 Quad-Core;
• 48 GB RAM;
• 1 Gbit/s NICs (*);
• 20 × 10⁶ messages;
• 512-byte messages.
* 200 Mbit/s guaranteed
Measuring Nginx
[Diagram: wrk → nginx]
• No logging;
• No plugins;
• CPU affinity.
Nginx results. Baseline
Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s
1.35 | 1.46 | 1.64 | 1.90 | 68851
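Each results table reports latency at the 50th, 75th, 90th and 99th percentiles next to throughput. The slides do not say how those percentiles were computed; the sketch below uses the simple nearest-rank definition only to show how such a row is read from raw per-request timings. The function names and sample timings are hypothetical.

```python
from __future__ import annotations

import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Smallest sample such that at least p percent of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def latency_row(samples_ms: list[float]) -> dict[int, float]:
    """The four percentiles reported in each results table, in milliseconds."""
    return {p: percentile_nearest_rank(samples_ms, p) for p in (50, 75, 90, 99)}


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    timings = [1.2, 1.3, 1.3, 1.4, 1.4, 1.5, 1.6, 1.7, 1.9, 2.4]
    print(latency_row(timings))
```

The 99th percentile matters most here: a DMP with a good median but a long tail still misses the 20 ms budget for a noticeable share of auctions.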
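Measuring Redis
[Diagram: wrk → nginx → three nodes hosting six Redis shards, each shard's slave on another node]
• Node 1: Shard 1, Shard 2, Slave 3, Slave 4
• Node 2: Shard 3, Shard 4, Slave 5, Slave 6
• Node 3: Shard 5, Shard 6, Slave 2, Slave 1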
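The Redis layout on the Measuring Redis slide follows a simple rule: each node hosts two primary shards, and each shard's slave sits on the previous node (Node 1's shards are replicated on Node 3), so losing one node never loses both copies of a shard. A minimal sketch reproducing that placement; the "previous node" rule is inferred from the diagram, and `place_shards` is hypothetical, not a Redis or Aerospike API. With one shard per node it also reproduces the simplified Aerospike diagram (Node 1: Chunk 1, Replica 2, and so on).

```python
from __future__ import annotations


def place_shards(nodes: int, shards_per_node: int) -> dict[int, dict[str, list[int]]]:
    """Assign primaries to nodes in consecutive blocks and put each shard's copy on the
    previous node (wrapping around), so with two or more nodes no node holds both copies."""
    layout = {n: {"primary": [], "replica": []} for n in range(1, nodes + 1)}
    for shard in range(1, nodes * shards_per_node + 1):
        primary_node = (shard - 1) // shards_per_node + 1
        replica_node = (primary_node - 2) % nodes + 1  # node 1 -> node N, node k -> node k-1
        layout[primary_node]["primary"].append(shard)
        layout[replica_node]["replica"].append(shard)
    return layout


if __name__ == "__main__":
    for node, roles in place_shards(nodes=3, shards_per_node=2).items():
        print(f"Node {node}: primaries {roles['primary']}, replicas {roles['replica']}")
```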
Redis results
Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s
2.68 | 3.10 | 3.25 | 3.90 | 34769
Redis results
Measuring Mongo
[Diagram: wrk → nginx :: mongos → three nodes]
• Node 1, Node 2, Node 3: each runs one config server (mongod :: cfg) and three replica-set members (mongod :: repl)
• Shard 1: Replica Set 1; Shard 2: Replica Set 2; Shard 3: Replica Set 3
Mongo results
Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s
6.70 | 8.22 | 10.22 | 15.46 | 14220
Mongo results
Measuring Aerospike
[Diagram: wrk → nginx → three-node Aerospike cluster]
• Node 1: Chunk 1, Replica 2
• Node 2: Chunk 2, Replica 3
• Node 3: Chunk 3, Replica 1
Aerospike results
Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s
8.93 | 14.99 | 26.83 | 106.48 | 3402
Aerospike results
… more Aerospike results
Workers | Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s | Nginx CPU, % | Nginx, packets/s | Aerospike CPU, % | Aerospike, packets/s
4 | 8.22 | 10.22 | 15.46 | 97.3 | 3402 | 7 | 8K | 1 | 1K
8 | 7.89 | 15.16 | 92.4 | 89.8 | 6028 | 10 | 13K | 1 | 2K
12 | 4.76 | 8.74 | 103.1 | 121.6 | 10233 | 15 | 20K | 2 | 3K
16 | 3.91 | 6.21 | 99.6 | 111.3 | 13178 | 22 | 26K | 3 | 6K
24 | 2.13 | 2.87 | 4.68 | 76.81 | 25744 | 21 | 60K | 5 | 10K
32 | 2.01 | 2.60 | 4.42 | 81.29 | 28925 | 25 | 70K | 6 | 11K
64 | 2.54 | 3.66 | 112.4 | 118.1 | 26468 | 27 | 70K | 6 | 11K
How to shoot yourself in the foot…
nginx + blocking I/O
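nginx serves many connections from a small number of event-loop worker processes, so a module that waits synchronously on the database stalls every connection handled by that worker. That is consistent with the gap between the blocking (3402 req/s) and non-blocking (35746 req/s) Aerospike results. A minimal Python asyncio analogy, not nginx or Aerospike client code: `time.sleep` stands in for a blocking call, `asyncio.sleep` for a non-blocking one, and the 10 ms round trip is an assumed figure.

```python
from __future__ import annotations

import asyncio
import time

DB_LATENCY_S = 0.01  # assumed 10 ms database round trip


async def handler_blocking() -> None:
    # time.sleep() models a synchronous client call: it freezes the whole event loop.
    time.sleep(DB_LATENCY_S)


async def handler_non_blocking() -> None:
    # asyncio.sleep() models an asynchronous call: the loop serves other requests meanwhile.
    await asyncio.sleep(DB_LATENCY_S)


async def serve(handler, concurrent_requests: int) -> float:
    """Run N requests concurrently on one event loop; return wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(concurrent_requests)))
    return time.perf_counter() - start


async def main(concurrent_requests: int = 20) -> tuple[float, float]:
    blocking = await serve(handler_blocking, concurrent_requests)
    non_blocking = await serve(handler_non_blocking, concurrent_requests)
    return blocking, non_blocking


if __name__ == "__main__":
    b, nb = asyncio.run(main())
    print(f"blocking: {b * 1000:.0f} ms, non-blocking: {nb * 1000:.0f} ms")
```

With 20 requests the blocking variant takes roughly 20 × 10 ms because the requests run one after another, while the non-blocking variant finishes in about a single round trip.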
Libevent to the rescue (2nd attempt)
[Diagram: wrk → libevent :: libevhtp → three-node Aerospike cluster]
• Node 1: Chunk 1, Replica 2
• Node 2: Chunk 2, Replica 3
• Node 3: Chunk 3, Replica 1
Finally…
Latency 50%, ms | Latency 75%, ms | Latency 90%, ms | Latency 99%, ms | Throughput, req/s
2.64 | 3.09 | 3.27 | 3.95 | 35746
Finally…
Unofficial results
Metric | Nginx | Redis | Mongo | Aerospike, blocking | Aerospike, non-blocking
Latency 50%, ms | 1.35 | 2.68 | 6.70 | 8.93 | 2.64
Latency 75%, ms | 1.46 | 3.10 | 8.22 | 14.99 | 3.09
Latency 90%, ms | 1.64 | 3.25 | 10.22 | 26.83 | 3.27
Latency 99%, ms | 1.90 | 3.90 | 15.46 | 106.48 | 3.95
Throughput, msg/s | 68851 | 34769 | 14220 | 3402 | 35746
CPU, HTTP, % | 29 | 20 | 70 | 7 | 33
Network, HTTP, packets/s, K | 71 | 71 | 37 | 8 | 71
CPU, DB, % | - | 6 | 25 | 1 | 6
Network, DB, packets/s, K | - | 12 | 21 | 1 | 12
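The comparison is easier to read as ratios. A small sketch with the throughput figures copied from the table above, where "blocking" and "non-blocking" are the two Aerospike columns (b and nb: the nginx module versus the libevent front end); the dictionary and the `relative` helper are illustrative only.

```python
# Throughput figures copied from the "Unofficial results" table.
THROUGHPUT = {
    "Nginx": 68851,
    "Redis": 34769,
    "Mongo": 14220,
    "Aerospike (blocking)": 3402,
    "Aerospike (non-blocking)": 35746,
}


def relative(a: str, b: str) -> float:
    """Throughput of column a divided by throughput of column b."""
    return THROUGHPUT[a] / THROUGHPUT[b]


if __name__ == "__main__":
    for name in THROUGHPUT:
        print(f"{name:>24}: {relative(name, 'Nginx'):.0%} of the bare nginx baseline")
    speedup = relative("Aerospike (non-blocking)", "Aerospike (blocking)")
    print(f"non-blocking vs blocking Aerospike: {speedup:.1f}x")
```

Run as is, it reports roughly 10.5x for non-blocking over blocking Aerospike, and places non-blocking Aerospike (about 52%) and Redis (about 50%) at around half of the bare nginx baseline.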
Aerospike
Let’s start building…
Do’s & Don’ts
• Do use non-blocking I/O;
• Do computations locally;
• Be lazy. Don’t do more than needed;
• Don’t be afraid to use others’ good ideas;
• Don’t trust anyone.
Thanks a lot for your questions!
info@cleverleaf.co.uk :: info@cleverdata.ru
cleverleaf.co.uk :: cleverdata.ru
1dmp.io/en :: crawler.1dmp.io/en
facebook.com/CleverData :: +7 (495) 967-66-50
To get started now with Aerospike, visit:
aerospike.com/get-started
Tweet us at @Aerospikedb
Aerospike: Comparing NoSQL Databases for Real-Time Bidding

Speaker notes

  1. Just a few words about what CleverDATA is. It is a fairly young company, a joint venture of CleverLeaf, headquartered in London, and Lanit, one of the biggest IT solutions companies in Russia. CleverLeaf brought Big Data, data analysis, data mining and data processing development to Lanit. We are three years old, counting from the founding of CleverLeaf, and anyone who prefers to deal with a European legal entity rather than a Russian one is welcome to contact CleverLeaf. We are also Aerospike partners in Russia and the EMEA region, and we provide professional services for setting up, tuning and developing solutions based on the Aerospike database as well as other technologies such as the Hadoop stack, Spark and Elasticsearch. Our customers include some of the biggest telecom, media and retail banking companies, which are happy with the data management platforms we built for them, with Aerospike at the core of the real-time data access layer. Recently we launched the 1DMP cloud platform, which gives online ad market players, and anyone else, the ability to exchange and process data, so that companies that own, for example, web logs, user profiles, tracking pixel logs or any other data can transform and monetize that data through our platform. The service is open and free to try right now via the links at the end of the presentation.
  2. The agenda for today is the following. First, a brief introduction to what RTB is. Then we’ll talk about the challenges RTB comes with and the Service Level Agreements every RTB participant has to meet. Then I will try to describe the difficulties of choosing the software that becomes part of the final solution. I’ll show some of our test results, and we’ll discuss the importance of testing all the components in the target environment, on the target hardware and software. Finally, there will be some do’s and don’ts we learned while developing what we called the “User Profile Enrichment Service”, which uses 3rd-party data to build a complete profile of an online user visiting web pages so that advertisers can buy impressions efficiently.
  3. Anyone who starts dealing with RTB will definitely be discouraged by the number of three-letter abbreviations it introduces. There are so many of them that the hardest part is remembering what they mean. The most important ones are RTB, DSP, SSP and DMP, so let’s try to decipher these acronyms…
  4. RTB is real-time bidding. Online advertising involves a number of roles, or parties. Some of them create content and try to make money on it, for instance by placing advertisements within that content; such a party is called a publisher. There are also those who want to advertise goods or services; such participants are called advertisers. Finally, there are systems where publishers and advertisers meet, such as ad networks and ad exchanges; ad exchanges are often responsible for running real-time auctions that sell the current impression of a given visitor.
  5. Real-time bidding works as follows: when any of us visits a web page, special JavaScript code invoked by that page starts an auction. Advertisers bid on the current visitor. The highest bid wins, the corresponding advertiser is notified that it has won, and its price is lowered to the second price, just enough to still remain the winner. Notice that the interests of publishers and advertisers are opposed. Publishers want to make more money while selling less inventory (inventory here is the number of advertisements, or amount of ad space, a publisher has available to sell). Advertisers want to buy more inventory (more ad placements, actions, impressions) while spending less money. So let’s see how this conflict of interests is resolved…
  6. There are special software systems that work on the advertisers’ side. They are called Demand Side Platforms and are responsible for spending advertisers’ marketing budgets as efficiently as possible.
  7. On the opposite side there are systems called “Supply Side Platforms” (SSPs). These work on the side of publishers and are responsible for selling inventory at the highest possible prices.
  8. So how should an advertiser know which impressions to buy and which not to? This is where “Data Management Platforms” (DMPs) come into play. These platforms are responsible for gathering information about our interests by different means and techniques, using data sources such as 3rd party data, partners' data, web logs, click streams and so on.
  9. You may notice that nowadays web pages render pretty fast, and displaying ads should harm neither the rendering speed of the page nor the user's experience with it. To illustrate the challenges developers of RTB systems face, this slide shows the performance requirements for these systems. There are only 100 milliseconds to complete the ad auction for our visit. Supply Side Platforms have roughly 30-50 milliseconds to ask all the Demand Side Platforms how much they bid for the current visitor. A Demand Side Platform has to decide whether it is interested in the given user or not, so there must be some criteria, based on the visitor's interests, demands or wishes, for deciding whether to bid. As you already know, such information is located in the Data Management Platform, which has only 10-20 milliseconds to find the profile of the current visitor. The load RTB systems have to deal with is tens or even hundreds of thousands of requests per second.
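To get a feel for what these numbers imply, Little's law (average number of requests in flight = throughput × time each request spends in the system) can be applied to them. The sketch below only plugs in the rates and latencies quoted in the talk; the results are back-of-the-envelope estimates, not measurements.

```python
def in_flight_requests(rps: int, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return rps * latency_ms / 1000


for rps in (10_000, 100_000):
    for latency_ms in (10, 20):
        print(f"{rps:>7,} rps at {latency_ms} ms -> "
              f"{in_flight_requests(rps, latency_ms):,.0f} requests in flight")
```

Even at the lower end, a DMP has to juggle hundreds of concurrent lookups, which is hard to do with one blocked thread per request.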
  10. So what is necessary to make a DMP fast enough to obtain as complete a user profile as possible while the web page is still rendering? First of all, the DMP should respond with the user profile at a pretty low latency. Then, the DMP must respond within 10-20 milliseconds even if it is slow and not capable of serving all the incoming requests. And finally, we have to obtain user profiles from everybody and every system that knows anything about the current user (for example, interests, geolocation, gender, age and so on). These may be partner systems, 3rd party data, tracking data or any other data.
  11. The desired behavior is roughly what is displayed on the slide. If any request to an external DMP is not served within the required 20 milliseconds, we simply abort it and stop processing it. How can such behavior be achieved? First, we probably have to set up something fast, probably in-memory, next to the DMP so that it can respond with user profiles within the required time. I suppose it may be one of the NoSQL databases available on the market. Next, we have to implement an application that queries this database for user profiles. And finally, this application has to get user profiles from the external systems as fast as possible.
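The abort-after-20-ms behavior can be sketched with Python's asyncio, assuming each external DMP lookup is exposed as an async function (the `Fetcher` signature and the source names are hypothetical). All lookups run in parallel, whatever has not answered when the budget expires is cancelled, and failed lookups are simply skipped. This only illustrates the pattern; the service described in this talk is written in C on top of libevent.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical signature: an async lookup of one user in one external DMP.
Fetcher = Callable[[str], Awaitable[dict]]


async def enrich_profile(user_id: str,
                         sources: Dict[str, Fetcher],
                         budget_s: float = 0.020) -> dict:
    """Query every source in parallel and merge the replies that arrive in time."""
    if not sources:
        return {}
    tasks = {name: asyncio.create_task(fetch(user_id))
             for name, fetch in sources.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    # Abort every lookup that missed the deadline and wait for the cancellation.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    profile: dict = {}
    for task in tasks.values():
        # Skip late lookups and lookups that failed with an error.
        if task in done and task.exception() is None:
            profile.update(task.result())
    return profile
```

A source that is down or slow therefore costs at most the budget, never more; the caller always gets whatever partial profile was assembled in time.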
  12. The first issue we faced was choosing a NoSQL database. The selection criteria we decided to follow for a scalable, highly available system are: linear scalability, because the amount of data tends to grow constantly, and user tracking is usually done with cookies, which may live for quite a short time, so new user identifiers keep appearing; out-of-the-box sharding, which is required to achieve that linear scalability, since the storage must be distributed by design and we don't want to scale it manually; data redundancy and replication, which are required to provide high availability; and, finally, the low latency we discussed earlier, which is necessary to respond within the 10-20 milliseconds in which we have to decide whether we are interested in the current user or not.
  13. When we started choosing a NoSQL database, we were a little puzzled and looked like the beast on the slide. There are a lot of solutions: nosql-databases.org alone lists about 150 of them. Which one should we choose to be able to meet the SLAs? It is hardly possible to test all of them in a near-production environment. So we first decided to pay attention to the functional features of the databases, as well as to their users, community and customers, and only then to give these databases a try to understand their performance.
  14. So let's write out the functional requirements the NoSQL database should meet and look at what fits our needs and what does not. All of the solutions are able to scale one way or another, except for Redis, which introduced Redis Cluster not so long ago; unfortunately, at the time of our testing Redis Cluster had not been released yet. All of the solutions support sharding, except for Redis: we had to shard it manually or find a 3rd party proxy application to do the sharding for us. However, we decided to try Redis because, from our experience, it promised to be very fast.
Regarding replication and data redundancy, all of the solutions have it; Redis has master-slave replication. Regarding a single point of failure, Redis got a “plus-minus”, because to make master-slave replication work as desired it is necessary to install 3rd party software, such as Sentinel, which had quite a lot of bugs during our testing and is now deprecated, or to install a load balancer. In any case, the client application will notice that the master node has disappeared, but we wanted this process to be transparent.
Maintainability is all about installing, configuring and maintaining the software during its life cycle, as well as the amount of effort required to support it. Cassandra got a “plus-minus” because it is a little harder to install and maintain than the others. MongoDB also got a “plus-minus” because it is difficult to install correctly, or at least not as easy as some of the others. It also requires more hardware, because it runs many processes with different roles: config servers, replica nodes, routing services and so on. Redis got a “minus” because at that moment it had to be scaled manually.
And regarding monitoring tools, pluses went to the solutions that have graphical user interfaces. Honestly speaking, that is a dubious advantage, although it is nice to have when you start using a database for the first time.
  15. Before doing any tests, we decided to find out whether there were any official, publicly available performance test results for the chosen NoSQL databases. We found the paper with the graphs shown on the slide. Let's look at what we have. These graphs show the dependency between latency and throughput. The only thing we can conclude from them is that Cassandra is mostly geared towards write-heavy workloads and MongoDB towards read-heavy ones. Looking at the right part of the graphs, we may notice that for Cassandra increased read throughput means increased latency, while for MongoDB write latency degrades as throughput increases. It must be said that we did not aim to measure the databases themselves; we wanted to understand how the whole solution would work. That is why we did not test the databases separately. Anyone who wants to do that can use the Yahoo! Cloud Serving Benchmark (YCSB), which already has drivers for the most popular NoSQL databases.
  16. Let's decide what we are going to measure and how. Each test was run as follows: we did 5 trials of 3 minutes each and then took the trial with the best results. Every message was 512 bytes, the average size of a typical user profile in our case. We generated 20 million records and loaded them into the databases. After the records had been loaded, we waited for all the databases' data replication processes to complete, and only then started testing. I forgot to mention why nginx is here: the Enrichment Service we developed uses the OpenRTB protocol to communicate with the outside world. We measured nginx, Aerospike, Redis and MongoDB. The load was generated with the wrk utility. To monitor memory, CPU and network utilization we used the nmon and nmon-analyzer utilities. The hardware setup was as follows: quad-core Intel Core i7, 48 GB of RAM, 1 Gbit NICs. We should also mention that it is always necessary to read the small print: in our case the hosting provider guaranteed a network speed of only 200 megabits per second, although all the boxes had 1 Gbit NICs.
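The test data and the trial selection can be sketched as follows, assuming a hypothetical `user:<n>` key format and random bytes standing in for real profiles. The talk does not say which metric made a trial "best", so ranking by throughput and then by 99th-percentile latency is an assumption, and the sample trial numbers are made up for illustration.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple

RECORD_SIZE = 512  # average user-profile size used in the tests


def make_record(i: int) -> Tuple[str, bytes]:
    """Build one synthetic record with a fixed-size 512-byte value.

    The key format is hypothetical; random bytes stand in for profile data.
    """
    return f"user:{i:08d}", os.urandom(RECORD_SIZE)


@dataclass
class Trial:
    rps: float     # throughput reported by the load generator
    p99_ms: float  # 99th-percentile latency in milliseconds


def best_trial(trials: List[Trial]) -> Trial:
    """Pick the best run: highest throughput first, then lowest p99 latency.

    The talk does not say which metric defined "best"; this ordering is an
    assumption.
    """
    return max(trials, key=lambda t: (t.rps, -t.p99_ms))


# Loading would iterate make_record(i) for i in range(20_000_000).
# Made-up numbers for illustration only:
runs = [Trial(61_000, 2.3), Trial(64_000, 2.1), Trial(64_000, 1.9)]
print(best_trial(runs))
```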
  17. First of all, let's measure nginx to see what it is able to show in our environment. Here is how the components were deployed: the first box had wrk installed, and another box had nginx without any plugins and with logging disabled. wrk sends requests to nginx as fast as possible. The results of this test are our baseline to compare the other results with.
  18. Here are the numbers shown by nginx. The latency percentiles were: 50th percentile, 1.35 ms; 75th percentile, 1.46 ms; 99th percentile, 1.90 ms. The maximum throughput achieved was about 69,000 requests per second. CPU utilization of the nginx host was near 30%, and CPU utilization of the wrk hosts during the tests was no more than 35%.
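For reference, a latency percentile can be computed from raw samples with the nearest-rank method shown below. This is the textbook definition, not how wrk computes its own latency distribution, and the sample values are made up.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies = [1.2, 1.4, 1.3, 1.9, 1.35, 1.5, 1.45, 1.3]  # made-up sample
for p in (50, 75, 99):
    print(f"p{p}: {percentile(latencies, p)} ms")
```

Note how a single slow sample dominates the 99th percentile of a small sample set, which is why tail percentiles, not averages, are what an SLA of 10-20 ms has to be checked against.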
  19. Next we tested Redis. It was deployed in the configuration displayed on the slide. Every server ran 2 master and 2 slave processes, each bound to its own core of the quad-core CPU. The slave of each master lived on a different box. The records were sharded by key, and nginx used 6 upstream configurations to route each request to the appropriate Redis instance.
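A minimal sketch of the layout just described, under a few assumptions: three boxes (inferred from 2 masters per box and 6 upstreams), cores 0-1 running masters and cores 2-3 running slaves, each master's slave placed on the next box, and keys routed by CRC32 modulo the number of shards. The host names and the hash function are hypothetical; the talk does not say which hash nginx used to pick an upstream.

```python
import zlib
from typing import Dict, List, Tuple

BOXES = ["box-1", "box-2", "box-3"]  # hypothetical host names
MASTERS_PER_BOX = 2                  # cores 0-1: masters, cores 2-3: slaves

Placement = Tuple[str, int]  # (box, CPU core)


def build_layout(boxes: List[str]) -> List[Dict[str, Placement]]:
    """Place two masters per box; each master's slave goes to the next box."""
    shards = []
    for b, box in enumerate(boxes):
        slave_box = boxes[(b + 1) % len(boxes)]
        for m in range(MASTERS_PER_BOX):
            shards.append({"master": (box, m),
                           "slave": (slave_box, MASTERS_PER_BOX + m)})
    return shards


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard: CRC32 of the key modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


layout = build_layout(BOXES)
for i, shard in enumerate(layout):
    print(i, shard["master"], "->", shard["slave"])
print("user:42 ->", shard_for("user:42", len(layout)))
```

One caveat of plain modulo routing: changing the number of shards remaps most keys, a reminder of why out-of-the-box sharding matters when data keeps growing.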
  20. Here are the Redis results. The number of handled requests decreased because an additional network hop appeared between nginx and Redis.
  21. The network interface on the box hosting nginx seems to be the bottleneck: it was able to handle only about 70-75 thousand network packets per second.
  22. MongoDB was deployed as follows: each of our servers hosted one config server and three mongod processes acting as replica set members of the corresponding shards. Generally speaking, a correct installation of a MongoDB cluster is pretty unfriendly: there are a lot of mongod processes that have to be maintained, monitored and so on. The mongos process was installed on the node hosting nginx, and nginx communicated with mongos over the local network interface. We chose the local interface because we had noticed that the main interface was limited in the number of network packets it could handle. We used the lua-resty-mongol nginx plugin to connect nginx to MongoDB.
  23. Here are the results. They are not as impressive as we expected. We suppose this is because we were using Lua, and also because all communication with MongoDB went through the master node of the replica set. We also tried the slaveOk option, but it did not give us any performance improvement. It should be noted that the CPU was mostly used by the mongos daemon rather than by nginx: mongos utilized about 45% of the CPU, and nginx about 25%.
  24. We show the local network interface here because all the traffic between nginx and mongos went through it.
  25. Let's start testing Aerospike. We developed a custom nginx plugin to connect nginx to Aerospike. Aerospike was deployed on three nodes with a replication factor of two.
  26. When we started testing, we were very disappointed by the performance. Surprised by the results, we decided to track down the source of the issue.
  27. There was nothing special about the network, which is unsurprising at such a low throughput.
  28. We ran more tests, increasing the number of nginx worker processes with each one. Although throughput increased, there was no effect on the 99th percentile of latency. In general, the latency results were not stable as the number of nginx workers grew.
  29. So we realized that we were trying to make nginx, which uses non-blocking I/O, work with a blocking Aerospike client library. In other words, our plugin code, built on the blocking Aerospike C client, completely killed nginx performance.
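The effect is easy to reproduce in miniature. In the sketch below, Python's asyncio event loop stands in for nginx's event loop and `time.sleep` stands in for a call into a blocking client library; the 20 ms delay and the request count are arbitrary illustration values. With the blocking handler the requests are served one after another; with the non-blocking one they overlap.

```python
import asyncio
import time

REQUESTS = 10
BACKEND_DELAY_S = 0.02  # pretend each database lookup takes 20 ms


async def handle_blocking() -> None:
    # A blocking call (like a synchronous DB client) freezes the event loop,
    # so no other request can make progress in the meantime.
    time.sleep(BACKEND_DELAY_S)


async def handle_non_blocking() -> None:
    # An awaitable call yields to the loop, so the lookups overlap.
    await asyncio.sleep(BACKEND_DELAY_S)


async def serve(handler) -> float:
    """Serve REQUESTS concurrent requests and return the wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(REQUESTS)))
    return time.perf_counter() - start


blocking_s = asyncio.run(serve(handle_blocking))
non_blocking_s = asyncio.run(serve(handle_non_blocking))
print(f"blocking:     {blocking_s * 1000:.0f} ms for {REQUESTS} requests")
print(f"non-blocking: {non_blocking_s * 1000:.0f} ms for {REQUESTS} requests")
```

Adding event-loop workers only multiplies the number of loops that can be frozen this way, which matches the unstable tail latency seen when nginx workers were added.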
  30. OK, there is some good news. Aerospike has client libraries for many occasions, and among them there is a libevent-based non-blocking client. We implemented a fairly simple HTTP server based on libevent and libevhtp, a library for implementing multithreaded HTTP servers on top of libevent.
  31. The final results were not long in coming, and the response time returned to the expected level.
  32. The network is the bottleneck once again.
  33. And here are the results we got, taking into account all the constraints and limitations we had. The nginx column is the baseline we used for comparison. The performance of the non-blocking Aerospike client is comparable to that of Redis. MongoDB performance is quite good, but we suppose we did not learn how to tune it to achieve results similar to Redis and Aerospike. It is also not as easy to maintain as the other databases. It is worth noting that we could not get more than 70-75K network packets per second from our hosting provider, so we were theoretically limited to 70-75K 512-byte messages per second. In any case, the results shown by the non-blocking Aerospike client and by Redis were pretty good.
  34. What else can we add regarding Aerospike? Probably the main thing is that Aerospike was open-sourced not so long ago. It uses SSDs as raw devices via the Linux direct I/O API to achieve its remarkable performance while keeping the platform's maintenance and ownership costs quite low. Its client libraries are aware of the cluster topology, so only one network hop is needed to obtain the required data. And last but not least, Aerospike supports secondary indexes, which work really fast compared to other databases' secondary indexes that are maintained by slow map-reduce procedures.
  35. So what components is our enrichment service built of? The application is pretty simple. Based on libevent and libevhtp, it consists of core modules that bootstrap, initialize and configure the main internal data structures; service modules that periodically provide the application with external data (such as prices); and handler modules that access externally and internally provided “small” data management platforms, aggregate their replies and send the result back to the client. As you may remember, the best performance and hardware utilization were achieved by choosing an asynchronous design for the application rather than the traditional “thread-per-connection” model. All the external calls ran in parallel, while the application used one thread per CPU core to handle incoming HTTP requests plus one additional thread for the application's internal purposes. The request-handling threads were completely independent of each other, so there was no need for them to communicate, and the application (with high performance in mind) ran not only asynchronously but completely lock-free. To avoid thread synchronization primitives, we made the master and worker threads communicate through socketpairs over Unix domain sockets.
To be even more efficient, we used memory pools provided by the excellent talloc library. It speeds up memory management by allocating memory chunks of an appropriate size once and then reusing them, instead of worrying about individual deallocations and memory leaks, which are quite common issues in C applications. We also used:
• zlog, for logging;
• protobuf, as the wire protocol between internal services and our enrichment service;
• GLib, for data structures and application configuration;
• the json-glib extension, to parse OpenRTB JSON requests;
• Graphite, to monitor and visualize the performance of the application.
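The master/worker messaging over socketpairs can be illustrated with Python threads, as a minimal sketch: each worker owns one end of a `socket.socketpair()`, receives newline-terminated commands on it and acknowledges them, so no shared data structure and no explicit lock appears in the application code. The command names (`reload-prices`, `stop`) are hypothetical, and in a C service built on libevent the worker's end of the pair would typically be registered with that thread's event loop instead of being read in a blocking loop.

```python
import socket
import threading
from typing import List


def worker_loop(conn: socket.socket) -> None:
    """Worker thread: react to newline-terminated commands from the master."""
    reader = conn.makefile("r", encoding="utf-8")
    writer = conn.makefile("w", encoding="utf-8")
    for line in reader:
        command = line.strip()
        if command == "stop":
            break
        # A real worker would e.g. swap in freshly loaded reference data here;
        # it owns that data, so no lock is needed.
        writer.write(f"ack {command}\n")
        writer.flush()
    reader.close()
    writer.close()
    conn.close()


class Master:
    """Owns one end of a socketpair per worker; shares no mutable state."""

    def __init__(self, workers: int) -> None:
        self.links = []  # (socket, reader, writer) per worker
        self.threads: List[threading.Thread] = []
        for _ in range(workers):
            master_end, worker_end = socket.socketpair()
            thread = threading.Thread(target=worker_loop, args=(worker_end,))
            thread.start()
            self.links.append((master_end,
                               master_end.makefile("r", encoding="utf-8"),
                               master_end.makefile("w", encoding="utf-8")))
            self.threads.append(thread)

    def broadcast(self, command: str) -> List[str]:
        """Send a command to every worker and collect the acknowledgements."""
        for _, _, writer in self.links:
            writer.write(command + "\n")
            writer.flush()
        return [reader.readline().strip() for _, reader, _ in self.links]

    def shutdown(self) -> None:
        """Tell every worker to stop, close the master ends and join threads."""
        for sock, reader, writer in self.links:
            writer.write("stop\n")
            writer.flush()
            reader.close()
            writer.close()
            sock.close()
        for thread in self.threads:
            thread.join()


master = Master(workers=4)
print(master.broadcast("reload-prices"))
master.shutdown()
```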
  36. And here is what we learned while developing our service. Do not use blocking I/O if you can avoid it; if you have to, isolate that functionality in an external service and use non-blocking I/O to communicate with it. Everything that can be done locally, without network communication, should be done locally. When we first implemented our Graphite module, we sent every metric we obtained to Graphite. Now imagine that we have 10,000 requests per second, and every request involves gathering 15 to 20 metrics: that makes 150,000 to 200,000 requests to Graphite per second. So we started aggregating metrics locally and flushing them to Graphite periodically. Always be lazy; that is the only way we can progress. Don't do more than needed: almost 40% of source code is never used. Use others' good ideas: while implementing the enrichment service, we used a module and plugin system very similar to nginx's. And never, ever trust anyone: if you have to benchmark something, use your own environment, requirements, software and the technologies you plan to have in the final solution.
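Local aggregation can look roughly like the class below: counters and timings are accumulated in memory and turned into a few lines in Graphite's plaintext format (`<path> <value> <timestamp>`) on each flush, instead of one network send per metric. The metric names and the prefix are hypothetical, and the actual network send is omitted.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional


class MetricAggregator:
    """Accumulate metrics in memory and emit them in periodic batches.

    flush() returns lines in Graphite's plaintext format
    "<path> <value> <timestamp>"; sending them to Carbon (TCP port 2003 by
    default) is left out of this sketch.
    """

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.counters: Dict[str, int] = defaultdict(int)
        self.timings: Dict[str, List[float]] = {}  # name -> [count, total, max]

    def incr(self, name: str, value: int = 1) -> None:
        self.counters[name] += value

    def timing(self, name: str, ms: float) -> None:
        stats = self.timings.setdefault(name, [0, 0.0, float("-inf")])
        stats[0] += 1
        stats[1] += ms
        stats[2] = max(stats[2], ms)

    def flush(self, now: Optional[int] = None) -> List[str]:
        """Turn everything collected since the last flush into Graphite lines."""
        ts = int(time.time()) if now is None else now
        lines = [f"{self.prefix}.{name} {count} {ts}"
                 for name, count in sorted(self.counters.items())]
        for name, (count, total, peak) in sorted(self.timings.items()):
            lines.append(f"{self.prefix}.{name}.avg {total / count:.3f} {ts}")
            lines.append(f"{self.prefix}.{name}.max {peak:.3f} {ts}")
        self.counters.clear()
        self.timings.clear()
        return lines


agg = MetricAggregator("enrichment")
for _ in range(10_000):  # one second's worth of requests at 10,000 rps
    agg.incr("requests")
agg.timing("dmp_latency_ms", 4.0)
agg.timing("dmp_latency_ms", 6.0)
for line in agg.flush(now=1_700_000_000):
    print(line)
```

At 10,000 requests per second with 15-20 metrics each, this turns 150,000-200,000 sends per second into a handful of lines per flush interval.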
  37. So, that is all. I will be glad to answer your questions.