Tweaking performance
on high-load projects
Dmitriy Dumanskiy
Cogniance, mGage project
Java Team Lead
Project evolution
Mgage
Mobclix
XXXX
Mgage delivery load
3 billion req/month.
~8 c3.xLarge Amazon instances.
Average load : 2400 req/sec
Peak : x10
Mobclix delivery load
14 billion req/month.
~16 c3.xLarge Amazon instances.
Average load : 6000 req/sec
Peak : x6
XXXX delivery Load
20 billion req/month.
~14 c3.xLarge Amazon instances.
Average load : 11000 req/sec
Peak : x6
Is it a lot?
Average load : 11000 req/sec
Twitter : new tweets
15 billion a month
Average load : 5700 req/sec
Peak : x30
Delivery load
Project   Requests per month   Max load per instance, req/sec   Requirements               Servers, AWS c3.xLarge
Mgage     3 billion            300                              HTTP, time 95% < 60ms      8
Mobclix   14 billion           400                              HTTP, time 95% < 100ms     16
XXXX      20 billion           800                              HTTPS, time 99% < 100ms    14
Delivery load
c3.XLarge - 4 vCPU, 2.8 GHz Intel Xeon E5-2680
Load average (LA): ~2-3
1-2 cores reserved for sudden peaks
BE tech stacks
Mobclix :
Spring, iBatis, MySql, Solr, Vertica, Cascading, Tomcat
Mgage :
Spring, Hibernate, Postgres, Distributed ehCache, Hadoop, Voldemort, Jboss
XXXX:
Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat
Initial problem
● ~1000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
Real problem
● ~85 million active users, ~115 million registered users
● 11.5 messages per user per day
● ~11000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
● Reliable and scalable for future growth up to 80k
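The ~11000 req/sec figure is consistent with the user numbers above if each message is assumed to cost one request and traffic is spread evenly over the day. A minimal sketch of that arithmetic (the class and method names are illustrative, not from the project):

```java
// Back-of-the-envelope load estimate from the numbers on this slide.
// Assumption: one message == one request, spread evenly over 24 hours.
class LoadEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average requests per second for a given audience and activity level.
    static double averageReqPerSec(long activeUsers, double messagesPerUserPerDay) {
        return activeUsers * messagesPerUserPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double avg = averageReqPerSec(85_000_000L, 11.5);
        double peak = avg * 6; // "Peaks 6x" from the slide
        System.out.printf("average: %.0f req/sec, peak: %.0f req/sec%n", avg, peak);
    }
}
```

With 85 million users and 11.5 messages each, the average comes out at roughly 11,300 req/sec, and a 6x peak lands a little under 70k req/sec.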
Architecture
AdServer Console (UI)
Reporting
Architecture
Console (UI)
MySql
SOLR Master
SOLR Slave   SOLR Slave   SOLR Slave
SOLR? Why?
● Pros:
○ Quick search on complex queries
○ Has a lot of built-in features (master-slave replication, RDBMS integration)
● Cons:
○ Only HTTP, embedded performs worse
○ Not easy for beginners
○ Max load is ~100 req/sec per instance
“Simple” query
"-(-connectionTypes:" + "\"" + getConnectionType() + "\"" + " AND connectionTypes:[* TO *]) AND "
+ "-connectionTypeExcludes:" + "\"" + getConnectionType() + "\"" + " AND "
+ "-(-OSes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")" + " AND OSes:[* TO *]) AND "
+ "-osExcludes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")"
+ " AND (runOfNetwork:T OR appIncludes:" + getAppId() + " OR pubIncludes:" + getPubId() + " OR categories:(" + categoryList + "))"
+ " AND -appExcludes:" + getAppId() + " AND -pubExcludes:" + getPubId() + " AND -categoryExcludes:(" + categoryList + ") AND "
+ keywordQuery + " AND "
+ "-(-devices:" + "\"" + getHandsetNormalized() + "\"" + " AND devices:[* TO *]) AND "
+ "-deviceExcludes:" + "\"" + getHandsetNormalized() + "\"" + " AND "
+ "-(-carriers:" + "\"" + getCarrier() + "\"" + " AND carriers:[* TO *]) AND "
+ "-carrierExcludes:" + "\"" + getCarrier() + "\"" + " AND "
+ "-(-locales:" + "(\"" + locale + "\" OR \"" + langOnly + "\")" + " AND locales:[* TO *]) AND "
+ "-localeExcludes:" + "(\"" + locale + "\" OR \"" + langOnly + "\") AND "
+ "-(-segments:(" + segmentQuery + ") AND segments:[* TO *]) AND "
+ "-segmentExcludes:(" + segmentQuery + ")"
+ " AND -(-geos:" + geoQuery + " AND geos:[* TO *]) AND "
+ "-geosExcludes:" + geoQuery
Architecture
MySql
Solr Master
SOLR Slave
AdServer
SOLR Slave
AdServer
SOLR Slave
AdServer
No-SQL
AdServer - Solr Slave
Delivery:
volatile DeliveryData cache;
Cron Job:
DeliveryData tempCache = loadData();
cache = tempCache;
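The pattern on this slide (build the new data aside, then publish it with a single write to a volatile field) can be sketched as follows. This is an illustration under assumptions, not the project's code: DeliveryData is reduced to an immutable map, loadData() is a hypothetical loader standing in for whatever query fills the cache, and a ScheduledExecutorService plays the role of the cron job.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DeliveryCache {
    // Hypothetical stand-in: an immutable snapshot of campaign data.
    static final class DeliveryData {
        final Map<String, String> campaigns;

        DeliveryData(Map<String, String> campaigns) {
            this.campaigns = Collections.unmodifiableMap(new HashMap<String, String>(campaigns));
        }
    }

    // volatile: request threads always see the most recently published snapshot.
    private volatile DeliveryData cache = new DeliveryData(new HashMap<String, String>());

    // Hypothetical loader; the real one would query the data source.
    DeliveryData loadData() {
        Map<String, String> fresh = new HashMap<String, String>();
        fresh.put("campaign-1", "loaded at " + System.nanoTime());
        return new DeliveryData(fresh);
    }

    // Periodic job: build the new snapshot completely, then publish it in one write.
    void refresh() {
        DeliveryData tempCache = loadData();
        cache = tempCache;
    }

    // Delivery path: a plain read of the current snapshot, no locks.
    String lookup(String campaignId) {
        return cache.campaigns.get(campaignId);
    }

    public static void main(String[] args) throws InterruptedException {
        DeliveryCache dc = new DeliveryCache();
        ScheduledExecutorService cron = Executors.newSingleThreadScheduledExecutor();
        cron.scheduleAtFixedRate(dc::refresh, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(250);
        System.out.println(dc.lookup("campaign-1"));
        cron.shutdownNow();
    }
}
```

Because readers never see a half-built snapshot, the delivery path needs no synchronization; the cost is holding two copies of the data in memory while a refresh is in progress.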
Why no-sql?
● Realtime data
● Quick response time
● Simple queries by key
● 1-2 queries to no-sql on every request. Average load
10-20k req/sec and >120k req/sec in peaks.
● Cheap solution
Why Redis? Pros
● Easy and light-weight
● Low latency and response time.
99% is < 1ms. Average latency is ~0.2ms
● Up to 100k 'get' commands per second on
c1.X-Large
● Cool features (atomic increments, sets,
hashes)
● Ready AWS service — ElastiCache
Why Redis? Cons
● Single-threaded out of the box
● Using all cores requires sharding/clustering
● Scaling/failover is not easy
● Limited by max instance memory (240 GB on the largest AWS instance)
● Persistence/swapping may delay responses
● Cluster solution is not production-ready
DynamoDB vs Redis
              Price per month      Put, 95%   Get, 95%   Rec/sec
DynamoDB      $58                  300ms      150ms      50
DynamoDB      $580                 60ms       8ms        780
DynamoDB      $5800                16ms       8ms        1250
Redis         $200 (c1.medium)     3ms        <1ms       4000
ElastiCache   $600 (c1.xlarge)     <1ms       <1ms       10000
What about others?
● Cassandra
● Voldemort
● Memcached
Redis RAM problem
● 1 user entry: from ~80 bytes to ~3 KB
● ~85 million users
● Required RAM ~ from 1 GB to 300 GB
Data compression speed
Data compression size
Data compression
JSON → Kryo binary (4x less data) → gzip (2x less data) == 8x less data overall
Now we need < 40 GB
+ Less load on network stack
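The gzip stage of that pipeline can be sketched with the JDK alone. The Kryo step is left out here because it needs a third-party library; the payload below is made up for illustration, and the ratio achieved depends entirely on the data, so the 2x on the slide should not be expected from this toy input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compress a value before it is written to the store.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress a value after it is read back.
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up, repetitive user entry; real entries would be Kryo-encoded first.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("{\"campaign\":").append(i).append(",\"shown\":3,\"clicked\":0}");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes gzipped");
    }
}
```

Compression trades CPU time on every read and write for less RAM in Redis and fewer bytes on the network, which is the trade the slide argues for.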
AdServer BE
Average response time — ~1.2 ms
Load — 800 req/sec with LA ~4
c3.XLarge == 4 vCPU
AdServer BE
● Logging — 12% of time (5% on SSD);
● Response generation — 15% of time;
● Redis request — 50% of time;
● All business logic — 23% of time;
Reporting
AdServer → S3 (delivery logs) → Hadoop ETL → S3 (aggregated logs) → MySQL → Console
Log structure
{ "uid":"test",
"platform":"android",
"app":"xxx",
"ts":1375952275223,
"pid":1,
"education":"Some-Highschool-or-less",
"type":"new",
"sh":1280,
"appver":"6.4.34",
"country":"AU",
"time":"Sat, 03 August 2013 10:30:39 +0200",
"deviceGroup":7,
"rid":"fc389d966438478e9554ed15d27713f51",
"responseCode":200,
"event":"ad",
"device":"N95",
"sw":768,
"ageGroup":"18-24",
"preferences":["beer","girls"] }
Log structure
● 1 million records == 0.6 GB.
● ~900 million records a day == ~0.55 TB.
● 1 month: up to 20 TB of data.
● Zipped data is 10 times smaller.
Reporting
Customer : “And we need fancy reporting”
But 20 TB of data per month is huge. So what can we do?
Reporting
Dimensions:
device, os, osVer, screenWidth, screenHeight,
country, region, city, carrier, advertisingId,
preferences, gender, age, income, sector,
company, language, etc...
Use case:
I want to know how many users saw my ad in San Francisco.
Reporting
Geo table:
Country, City, Region, CampaignId, Date, counters;
Device table:
Device, Carrier, Platform, CampaignId, Date, counters;
Uniques table:
CampaignId, UID
Predefined report types → aggregation by
predefined dimensions → 500-1000 times less
data
20 TB per month → 40 GB per month
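Aggregation by a predefined set of dimensions amounts to collapsing every raw log line into a counter keyed by those dimensions. A minimal in-memory sketch for the Geo table, assuming a trimmed-down LogRecord type and a string key (both hypothetical; the real job ran on Hadoop):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GeoAggregation {
    // Hypothetical, trimmed-down delivery log record; fields follow the Geo table columns.
    static final class LogRecord {
        final String country;
        final String city;
        final String region;
        final int campaignId;
        final String date;

        LogRecord(String country, String city, String region, int campaignId, String date) {
            this.country = country;
            this.city = city;
            this.region = region;
            this.campaignId = campaignId;
            this.date = date;
        }

        // One key per distinct combination of the predefined dimensions.
        String geoKey() {
            return country + "|" + city + "|" + region + "|" + campaignId + "|" + date;
        }
    }

    // Collapse raw records into one counter per dimension combination.
    static Map<String, Long> aggregate(List<LogRecord> records) {
        Map<String, Long> counters = new TreeMap<String, Long>();
        for (LogRecord r : records) {
            counters.merge(r.geoKey(), 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<LogRecord> logs = Arrays.asList(
            new LogRecord("US", "San Francisco", "CA", 42, "2013-08-03"),
            new LogRecord("US", "San Francisco", "CA", 42, "2013-08-03"),
            new LogRecord("AU", "Sydney", "NSW", 42, "2013-08-03"));
        for (Map.Entry<String, Long> e : aggregate(logs).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The size of the output depends on the number of distinct dimension combinations rather than on the number of requests, which is where the large reduction comes from. Counting unique users needs the separate Uniques table, since simple counters cannot be deduplicated after the fact.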
Of course - hadoop
● Pros:
○ Unlimited (depends) horizontal scaling
● Cons:
○ Not real-time
○ Processing time depends directly on code quality and on infrastructure cost.
○ Not all input can be scaled
○ Cluster startup is so... long
Alternatives?
● Storm
● Redshift
● Vertica
● Math models?
Elastic MapReduce
● Easy setup
● Easy to extend
● Easy to monitor
Timing
● Hadoop (cascading) :
○ 25 GB in peak hour takes ~40min (-10 min). CSV
output 300MB. With cluster of 4 c3.xLarge.
● MySQL:
○ Put 300MB in DB with insert statements ~40 min.
● MySQL:
○ Put 300MB in DB with optimizations ~5 min.
Optimizations
● No “insert into”, only “load data”: ~10 times faster
● “ENGINE=MyISAM” instead of “INNODB” when possible: ~5 times faster
● For “upsert”: temp table with “ENGINE=MEMORY” to save IO
Cascading
Hadoop:
void map(K key, V val,
OutputCollector collector) {
...
}
void reduce(K key, Iterator<V> vals,
OutputCollector collector) {
...
}
Cascading:
Scheme sinkScheme = new TextLine(new Fields("word", "count"));
Pipe assembly = new Pipe("wordcount");
assembly = new Each(assembly, new Fields("line"), new RegexGenerator(new Fields("word"), ","));
assembly = new GroupBy(assembly, new Fields("word"));
Aggregator count = new Count(new Fields("count"));
assembly = new Every(assembly, count);
Why cascading?
Hadoop Job 1
Hadoop Job 2
Hadoop Job 3
Result of one job should be processed by another job
Lessons Learned
Redis sharding
Redis shard 0 Redis shard 1 Redis shard 2
AdServer
shardNumber = abs(UID.hashCode() % 3)
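A minimal sketch of hash-based shard selection, assuming the shard index is the UID hash taken modulo the shard count and that negative hash codes have to be handled (String.hashCode() can return negative values). The class and method names are illustrative.

```java
class ShardSelector {
    // Map a user id to a shard index in [0, shardCount).
    static int shardFor(String uid, int shardCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(uid.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        String[] uids = {"test", "fc389d966438478e9554ed15d27713f51", "user-42"};
        for (String uid : uids) {
            System.out.println(uid + " -> shard " + shardFor(uid, 3));
        }
    }
}
```

The same UID always maps to the same shard, but only while the shard count stays fixed: changing the count moves most keys to a different shard.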
Resharding problem
All data already in shards, how to add new shards?
Resharding problem. Solution
AdServer, Old Shard, New Shard:
1. Get UID from the new shard. If not present, go to a).
a) Get UID from the old shard.
2. Save UID to the new shard.
Removal script
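The read path of this lazy migration can be sketched as below. In-memory maps stand in for the old and new Redis shards, and all names are hypothetical; the removal script that later cleans up the old shard is not shown.

```java
import java.util.HashMap;
import java.util.Map;

class LazyResharding {
    // Stand-ins for the Redis shards (assumption for this sketch).
    private final Map<String, String> oldShard;
    private final Map<String, String> newShard;

    LazyResharding(Map<String, String> oldShard, Map<String, String> newShard) {
        this.oldShard = oldShard;
        this.newShard = newShard;
    }

    String get(String uid) {
        String value = newShard.get(uid);      // 1. try the new shard first
        if (value == null) {
            value = oldShard.get(uid);         // a) fall back to the old shard
            if (value != null) {
                newShard.put(uid, value);      // 2. copy the entry to the new shard
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> oldShard = new HashMap<String, String>();
        Map<String, String> newShard = new HashMap<String, String>();
        oldShard.put("user-1", "profile-1");

        LazyResharding store = new LazyResharding(oldShard, newShard);
        System.out.println(store.get("user-1"));             // served from the old shard, then copied
        System.out.println(newShard.containsKey("user-1"));  // now present in the new shard
    }
}
```

Active users migrate themselves through normal traffic, so no bulk copy is needed; the price is a second lookup for every UID that has not been moved yet.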
Postgres partitioning
● Queries on small partitions
● Distributed index
● Less index size
● Small partitions may fit in RAM
● Easy to remove/move
Cost of IO
L1 cache 3 cycles
L2 cache 14 cycles
RAM 250 cycles
Disk 41 000 000 cycles
Network 240 000 000 cycles
Cost of IO
@Cacheable is everywhere
Hadoop
Map input : 300 MB
Map output : 80 GB
Hadoop
● mapreduce.map.output.compress = true
● codecs: GZip, BZ2 - CPU intensive
● codecs: LZO, Snappy
● codecs: JNI
~x10
Hadoop
Consider Combiner
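What a combiner buys can be shown with a plain-Java simulation of a word-count mapper. This is not the Hadoop API, only an illustration: mapOutput() mimics the pairs a mapper emits, and combine() mimics the map-side pre-aggregation a combiner performs before data is shuffled to reducers.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    // Without a combiner: one (word, 1) pair per occurrence leaves the mapper.
    static List<Map.Entry<String, Integer>> mapOutput(String[] words) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (String w : words) {
            out.add(new AbstractMap.SimpleEntry<String, Integer>(w, 1));
        }
        return out;
    }

    // With a combiner: values are summed per key on the map side,
    // so only one pair per distinct word is sent over the network.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> p : pairs) {
            combined.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] words = {"ad", "click", "ad", "ad", "click"};
        List<Map.Entry<String, Integer>> raw = mapOutput(words);
        Map<String, Integer> combined = combine(raw);
        System.out.println(raw.size() + " pairs without combiner, "
            + combined.size() + " pairs with combiner: " + combined);
    }
}
```

This only works when the reduce operation is associative and commutative, such as a sum or a count, because the framework may apply the combiner zero, one, or several times.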
Hadoop
Text, IntWritable, BytesWritable, NullWritable,
etc
Simpler - better
Hadoop
Missing data:
map(T value, ...) {
Log log = parse(value);
Data data = dbWrapper.getSomeMissingData(log.getCampId());
}
Wrong
Hadoop
Unnecessary data:
map(T value, ...) {
Log log = parse(value);
Key resultKey = makeKey(log.getCampName(), ...);
output.collect(resultKey, resultValue);
}
Wrong
Hadoop
Unnecessary data:
RecordWriter.write(K key, V value) {
Entity entity = makeEntity(key, value);
dbWrapper.save(entity);
}
Wrong
Hadoop
public boolean equals(Object obj) {
EqualsBuilder equalsBuilder = new EqualsBuilder();
equalsBuilder.append(id, otherKey.getId());
...
}
public int hashCode() {
HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
hashCodeBuilder.append(id);
...
}
Wrong
Hadoop
public void map(...) {
  …
  for (String word : words) {
    output.collect(new Text(word), new IntWritable(1));
  }
}
Wrong
Hadoop
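class MyMapper extends Mapper {
  // Allocated once and reused for every record instead of two new objects per word
  private final Text text = new Text();
  private final IntWritable one = new IntWritable(1);
  public void map(...) {
    for (String word : words) {
      text.set(word);
      output.collect(text, one);
    }
  }
}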
Network
Per 1 AdServer instance :
Incoming traffic : ~100Mb/sec
Outgoing traffic : ~50Mb/sec
LB all traffic :
Almost 10 Gb/sec
Amazon
AWS ElastiCache
SLOWLOG GET
1) 1) (integer) 35
2) (integer) 1391709950
3) (integer) 34155
4) 1) "GET"
2) "2ads10percent_rmywqesssitmfksetzvj"
2) 1) (integer) 34
2) (integer) 1391709830
3) (integer) 34863
4) 1) "GET"
2) "2ads10percent_tteeoomiimcgdzcocuqs"
AWS ElastiCache
35 ms for a GET? WTF?
Even Java is faster
AWS ElastiCache
● Strange timeouts (with SO_TIMEOUT 50ms)
● No replication for another cluster
● «Cluster» is not a cluster
● Cluster runs on regular instances, so you pay for 4 cores while using 1
AWS Limits. You never know where
● Network limit
● PPS rate limit
● LB limit
● Cluster start time up to 20 mins
● Scalability limits
● S3 is slow for many files
Facts
● HTTP is 2x faster than HTTPS
● HTTPS keep-alive: +80% performance
● Java 7 is 40% faster than Java 6 (in our case)
● All IO operations minimized

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Último (20)

一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 

Tweaking performance on high-load projects (Dmitriy Dumanskiy)

  • 2. Dmitriy Dumanskiy Cogniance, mGage project Java Team Lead
  • 4. Mgage delivery load: 3 billion req/month. ~8 c3.xLarge Amazon instances. Average load: 2400 req/sec. Peak: x10
  • 5. Mobclix delivery load: 14 billion req/month. ~16 c3.xLarge Amazon instances. Average load: 6000 req/sec. Peak: x6
  • 6. XXXX delivery load: 20 billion req/month. ~14 c3.xLarge Amazon instances. Average load: 11000 req/sec. Peak: x6
  • 7. Is it a lot? Average load : 11000 req/sec
  • 8. Twitter: new tweets, 15 billion a month. Average load: 5700 req/sec. Peak: x30
  • 9. Delivery load
      Project | Requests per month | Max load per instance, req/sec | Requirements | Servers, AWS c3.xLarge
      Mgage | 3 billion | 300 | HTTP, time 95% < 60ms | 8
      Mobclix | 14 billion | 400 | HTTP, time 95% < 100ms | 16
      XXXX | 20 billion | 800 | HTTPS, time 99% < 100ms | 14
  • 10. Delivery load: c3.XLarge has 4 vCPU (2.8 GHz Intel Xeon E5-2680). Load average (LA) is ~2-3; 1-2 cores are reserved for sudden peaks
  • 11. BE tech stacks Mobclix : Spring, iBatis, MySql, Solr, Vertica, Cascading, Tomcat Mgage : Spring, Hibernate, Postgres, Distributed ehCache, Hadoop, Voldemort, Jboss XXXX: Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat
  • 12. Initial problem ● ~1000 req/sec ● Peaks 6x ● 99% HTTPS with response time < 100ms
  • 13. Real problem ● ~85 mln active users, ~115 mln registered users ● 11.5 messages per user per day ● ~11000 req/sec ● Peaks 6x ● 99% HTTPS with response time < 100ms ● Reliable and scalable for future growth up to 80k
  • 16. SOLR? Why? ● Pros: ○ Quick search on complex queries ○ Has a lot of built-in features (master-slave replication, RDBMS integration) ● Cons: ○ Only HTTP, embedded performs worse ○ Not easy for beginners ○ Max load is ~100 req/sec per instance
  • 17. “Simple” query
      "-(-connectionTypes:" + "\"" + getConnectionType() + "\"" + " AND connectionTypes:[* TO *]) AND "
      + "-connectionTypeExcludes:" + "\"" + getConnectionType() + "\"" + " AND "
      + "-(-OSes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")" + " AND OSes:[* TO *]) AND "
      + "-osExcludes:" + "(\"" + osQuery + "\" OR \"" + getOS() + "\")"
      + " AND (runOfNetwork:T OR appIncludes:" + getAppId() + " OR pubIncludes:" + getPubId() + " OR categories: (" + categoryList + "))"
      + " AND -appExcludes:" + getAppId() + " AND -pubExcludes:" + getPubId() + " AND -categoryExcludes:(" + categoryList + ") AND "
      + keywordQuery + " AND "
      + "-(-devices:" + "\"" + getHandsetNormalized() + "\"" + " AND devices:[* TO *]) AND "
      + "-deviceExcludes:" + "\"" + getHandsetNormalized() + "\"" + " AND "
      + "-(-carriers:" + "\"" + getCarrier() + "\"" + " AND carriers:[* TO *]) AND "
      + "-carrierExcludes:" + "\"" + getCarrier() + "\"" + " AND "
      + "-(-locales:" + "(\"" + locale + "\" OR \"" + langOnly + "\")" + " AND locales:[* TO *]) AND "
      + "-localeExcludes:" + "(\"" + locale + "\" OR \"" + langOnly + "\") AND "
      + "-(-segments:(" + segmentQuery + ") AND segments:[* TO *]) AND "
      + "-segmentExcludes:(" + segmentQuery + ")"
      + " AND -(-geos:" + geoQuery + " AND geos:[* TO *]) AND "
      + "-geosExcludes:" + geoQuery
  • 18. Architecture (diagram): MySql, Solr Master, three SOLR Slave + AdServer pairs, No-SQL
  • 19. AdServer - Solr Slave Delivery: volatile DeliveryData cache; Cron Job: DeliveryData tempCache = loadData(); cache = tempCache;
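The cache swap on this slide can be sketched in plain Java as below. This is a minimal illustration, not the project's actual code: the `DeliveryData` fields, the `Supplier`-based loader, and the scheduler standing in for the cron job are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Immutable snapshot of delivery data; the single map field is hypothetical. */
final class DeliveryData {
    private final Map<String, String> campaignsById;

    DeliveryData(Map<String, String> campaignsById) {
        this.campaignsById = Map.copyOf(campaignsById); // immutable copy
    }

    String campaign(String id) {
        return campaignsById.get(id);
    }
}

final class DeliveryCache {
    // volatile makes the reference swap visible to all request threads.
    private volatile DeliveryData cache;
    private final Supplier<DeliveryData> loader;

    DeliveryCache(Supplier<DeliveryData> loader) {
        this.loader = loader;
        this.cache = loader.get(); // initial load before serving traffic
    }

    /** Periodic job: build the new snapshot completely, then publish it. */
    void reload() {
        DeliveryData tempCache = loader.get(); // slow part, off the request path
        cache = tempCache;                     // single reference write
    }

    /** Request path: no locks, and never a half-built snapshot. */
    DeliveryData current() {
        return cache;
    }

    /** Stand-in for the cron job; the period is an assumption. */
    ScheduledExecutorService scheduleReload(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::reload, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the snapshot is immutable and only the reference changes, request threads read `current()` without synchronization while a reload is in progress.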
  • 20. Why no-sql? ● Realtime data ● Quick response time ● Simple queries by key ● 1-2 queries to no-sql on every request. Average load 10-20k req/sec and >120k req/sec in peaks. ● Cheap solution
  • 21. Why Redis? Pros ● Easy and light-weight ● Low latency and response time. 99% is < 1ms. Average latency is ~0.2ms ● Up to 100k 'get' commands per second on c1.X-Large ● Cool features (atomic increments, sets, hashes) ● Ready AWS service — ElastiCache
  • 22. Why Redis? Cons ● Single-threaded out of the box ● Using all cores requires sharding/clustering ● Scaling/failover is not easy ● Limited by max instance memory (240GB on the largest AWS instance) ● Persistence/swapping may delay responses ● Cluster solution is not production ready
  • 23. DynamoDB vs Redis
      Store | Price per month | Put, 95% | Get, 95% | Rec/sec
      DynamoDB | $58 | 300ms | 150ms | 50
      DynamoDB | $580 | 60ms | 8ms | 780
      DynamoDB | $5800 | 16ms | 8ms | 1250
      Redis | $200 (c1.medium) | 3ms | <1ms | 4000
      ElastiCache | $600 (c1.xlarge) | <1ms | <1ms | 10000
  • 24. What about others? ● Cassandra ● Voldemort ● Memcached
  • 25. Redis RAM problem ● 1 user entry ~ from 80 bytes to 3kb ● ~85 mln users ● Required RAM ~ from 1 GB to 300 GB
  • 28. Data compression Json → Kryo binary → 4x times less data → Gzipping → 2x times less data == 8x less data Now we need < 40 GB + Less load on network stack
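The gzip half of this pipeline can be sketched with the JDK alone. The Kryo serialization step is left out here because it needs a third-party dependency; the class and method names are hypothetical, and the 2x ratio quoted on the slide depends on the data and is not guaranteed by this code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipCodec {
    /** Compresses an already-serialized user entry before it is stored. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the gzip trailer is in the buffer.
        return buffer.toByteArray();
    }

    /** Restores the serialized bytes after reading them back. */
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Very small entries can grow under gzip because of its fixed header and trailer, so it is worth measuring on real entries before compressing everything.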
  • 29. AdServer BE Average response time — ~1.2 ms Load — 800 req/sec with LA ~4 c3.XLarge == 4 vCPU
  • 30. AdServer BE ● Logging — 12% of time (5% on SSD); ● Response generation — 15% of time; ● Redis request — 50% of time; ● All business logic — 23% of time;
  • 31. Reporting (diagram): AdServer, S3 (delivery logs), Hadoop ETL, S3 (aggregated logs), MySQL, Console
  • 32. Log structure { "uid":"test", "platform":"android", "app":"xxx", "ts":1375952275223, "pid":1, "education":"Some-Highschool-or-less", "type":"new", "sh":1280, "appver":"6.4.34", "country":"AU", "time":"Sat, 03 August 2013 10:30:39 +0200", "deviceGroup":7, "rid":"fc389d966438478e9554ed15d27713f51", "responseCode":200, "event":"ad", "device":"N95", "sw":768, "ageGroup":"18-24", "preferences":["beer","girls"] }
  • 33. Log structure ● 1 mln. records == 0.6 GB. ● ~900 mln records a day == ~0.55 TB. ● 1 month up to 20 TB of data. ● Zipped data is 10 times less.
  • 34. Reporting Customer: “And we need fancy reporting” But 20 TB of data per month is huge. So what can we do?
  • 35. Reporting Dimensions: device, os, osVer, screenWidth, screenHeight, country, region, city, carrier, advertisingId, preferences, gender, age, income, sector, company, language, etc... Use case: I want to know how many users saw my ad in San Francisco.
  • 36. Reporting Geo table: Country, City, Region, CampaignId, Date, counters; Device table: Device, Carrier, Platform, CampaignId, Date, counters; Uniques table: CampaignId, UID
  • 37. Predefined report types → aggregation by predefined dimensions → 500-1000 times less data 20 TB per month → 40 GB per month
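Aggregation by predefined dimensions can be illustrated with a small in-memory sketch. It assumes a hypothetical `GeoLogRecord` shape that mirrors the Geo table columns and counts events per dimension tuple; the real pipeline ran on Hadoop, and counting unique users (the Uniques table) is not covered here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A delivery log line reduced to the geo dimensions (hypothetical shape). */
final class GeoLogRecord {
    final String country;
    final String region;
    final String city;
    final int campaignId;
    final String date; // day bucket, e.g. "2013-08-03"

    GeoLogRecord(String country, String region, String city, int campaignId, String date) {
        this.country = country;
        this.region = region;
        this.city = city;
        this.campaignId = campaignId;
        this.date = date;
    }
}

final class GeoAggregator {
    /** Collapses raw records into one counter per (country, region, city, campaignId, date). */
    static Map<String, Long> aggregate(List<GeoLogRecord> records) {
        Map<String, Long> counters = new TreeMap<>();
        for (GeoLogRecord r : records) {
            String key = String.join("|", r.country, r.region, r.city,
                    Integer.toString(r.campaignId), r.date);
            counters.merge(key, 1L, Long::sum); // one row per distinct key
        }
        return counters;
    }
}
```

The output has one row per distinct dimension tuple regardless of how many raw records fed it, which is where the size reduction comes from.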
  • 38. Of course - hadoop ● Pros: ○ Unlimited (depends) horizontal scaling ● Cons: ○ Not real-time ○ Processing time directly depends on code quality and on infrastructure cost. ○ Not all input can be scaled ○ Cluster startup is so... long
  • 39. Alternatives? ● Storm ● Redshift ● Vertica ● Math models?
  • 40. Elastic MapReduce ● Easy to set up ● Easy to extend ● Easy to monitor
  • 42. Timing ● Hadoop (cascading) : ○ 25 GB in peak hour takes ~40min (-10 min). CSV output 300MB. With cluster of 4 c3.xLarge. ● MySQL: ○ Put 300MB in DB with insert statements ~40 min. ● MySQL: ○ Put 300MB in DB with optimizations ~5 min.
  • 43. Optimizations ● No “insert into”, only “load data”: ~10 times faster ● “ENGINE=MyISAM” instead of “INNODB” when possible: ~5 times faster ● For “upsert”: temp table with “ENGINE=MEMORY” to save IO
  • 44. Cascading Hadoop: void map(K key, V val, OutputCollector collector) { ... } void reduce(K key, Iterator<V> vals, OutputCollector collector) { ... } Cascading: Scheme sinkScheme = new TextLine(new Fields( "word", "count")); Pipe assembly = new Pipe("wordcount"); assembly = new Each(assembly, new Fields( "line" ), new RegexGenerator(new Fields("word"), ",") ); assembly = new GroupBy(assembly, new Fields( "word")); Aggregator count = new Count(new Fields( "count")); assembly = new Every(assembly, count);
  • 45. Why cascading? Hadoop Job 1 Hadoop Job 2 Hadoop Job 3 Result of one job should be processed by another job
  • 47. Redis sharding: Redis shard 0, Redis shard 1, Redis shard 2. AdServer: shardNumber = UID.hashCode() % 3
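A shard number in the range 0..2 comes from the remainder of the hash, not from a division. The sketch below is one way to write that in Java; `ShardSelector` is a hypothetical name, and `Math.floorMod` is used because `String.hashCode()` can be negative, which would make a plain `%` return a negative shard number.

```java
final class ShardSelector {
    private final int shardCount;

    ShardSelector(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Maps a user ID to a shard index in [0, shardCount). */
    int shardFor(String uid) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(uid.hashCode(), shardCount);
    }
}
```

With a fixed `shardCount` the same UID always lands on the same shard; changing the count remaps most keys, which is the resharding problem raised on the following slides.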
  • 48. Resharding problem All data already in shards, how to add new shards?
  • 49. Resharding problem. Solution (diagram: AdServer, Old Shard, New Shard, removal script): 1. AdServer gets the UID from the new shard; a) if not present, it gets the UID from the old shard. 2. AdServer saves the UID to the new shard.
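The read path of this migration can be sketched as follows. The two `Map` instances are stand-ins for Redis clients, and the class name is hypothetical; cleaning up the old shard is left to the separate removal script and is not modelled here.

```java
import java.util.Map;

final class MigratingStore {
    private final Map<String, String> oldShard; // stand-in for the old Redis shard
    private final Map<String, String> newShard; // stand-in for the new Redis shard

    MigratingStore(Map<String, String> oldShard, Map<String, String> newShard) {
        this.oldShard = oldShard;
        this.newShard = newShard;
    }

    /** Reads a user entry, migrating it to the new shard on first access. */
    String get(String uid) {
        String value = newShard.get(uid);   // 1. try the new shard first
        if (value != null) {
            return value;
        }
        value = oldShard.get(uid);          // a) fall back to the old shard
        if (value != null) {
            newShard.put(uid, value);       // 2. save to the new shard
        }
        return value;                       // null when the user is unknown
    }
}
```

Active users migrate themselves as they generate traffic, so no bulk copy is needed while the service is under load.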
  • 50. Postgres partitioning ● Queries on small partitions ● Distributed index ● Less index size ● Small partitions may fit in RAM ● Easy to remove/move
  • 51. Cost of IO: L1 cache: 3 cycles; L2 cache: 14 cycles; RAM: 250 cycles; Disk: 41 000 000 cycles; Network: 240 000 000 cycles
  • 52. Cost of IO @Cacheable is everywhere
  • 53. Hadoop Map input : 300 MB Map output : 80 GB
  • 54. Hadoop ● mapreduce.map.output.compress = true ● codecs: GZip, BZ2 - CPU intensive ● codecs: LZO, Snappy ● codecs: JNI ~x10
  • 56. Hadoop Text, IntWritable, BytesWritable, NullWritable, etc Simpler - better
  • 58. Hadoop Missing data: map(T value, ...) { Log log = parse(value); Data data = dbWrapper.getSomeMissingData(log.getCampId()); } Wrong: map() calls the database for every input record
  • 60. Hadoop Unnecessary data: map(T value, ...) { Log log = parse(value); Key resultKey = makeKey(log.getCampName(), ...); output.collect(resultKey, resultValue); } Wrong
  • 62. Hadoop Unnecessary data: RecordWriter.write(K key, V value) { Entity entity = makeEntity(key, value); dbWrapper.save(entity); } Wrong
  • 64. Hadoop public boolean equals(Object obj) { EqualsBuilder equalsBuilder = new EqualsBuilder(); equalsBuilder.append(id, otherKey.getId()); ... } public int hashCode() { HashCodeBuilder hashCodeBuilder = new HashCodeBuilder(); hashCodeBuilder.append(id); ... } Wrong
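One reading of the "Wrong" verdict is that the builder-based methods allocate a helper object on every call, and key classes have `equals` and `hashCode` called very often. A hand-written alternative could look like the sketch below; `ReportKey` and its two fields are hypothetical, since the slide elides the real key's fields.

```java
import java.util.Objects;

final class ReportKey {
    private final long id;
    private final String country;

    ReportKey(long id, String country) {
        this.id = id;
        this.country = Objects.requireNonNull(country, "country");
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ReportKey)) {
            return false;
        }
        ReportKey other = (ReportKey) obj;
        // Cheap primitive comparison first, then the string.
        return id == other.id && country.equals(other.country);
    }

    @Override
    public int hashCode() {
        // Plain arithmetic: no builder and no varargs array is allocated.
        int result = Long.hashCode(id);
        result = 31 * result + country.hashCode();
        return result;
    }
}
```

Whether this matters in a given job is worth confirming with a profiler; the gain comes only from avoiding per-call allocations.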
  • 66. Hadoop public void map(...) { … for (String word : words) { output.collect(new Text(word), new IntVal(1)); } } Wrong: map() allocates a new Text and a new IntVal for every word
  • 67. Hadoop class MyMapper extends Mapper { Text word = new Text(); IntVal one = new IntVal(1); public void map(...) { for (String w : words) { word.set(w); output.collect(word, one); } } }
  • 68. Network Per 1 AdServer instance: Incoming traffic: ~100Mb/sec. Outgoing traffic: ~50Mb/sec. LB all traffic: Almost 10 Gb/sec
  • 70. AWS ElastiCache SLOWLOG GET 1) 1) (integer) 35 2) (integer) 1391709950 3) (integer) 34155 4) 1) "GET" 2) "2ads10percent_rmywqesssitmfksetzvj" 2) 1) (integer) 34 2) (integer) 1391709830 3) (integer) 34863 4) 1) "GET" 2) "2ads10percent_tteeoomiimcgdzcocuqs"
  • 71. AWS ElastiCache 35ms for GET? WTF? Even Java is faster
  • 72. AWS ElastiCache ● Strange timeouts (with SO_TIMEOUT 50ms) ● No replication for another cluster ● «Cluster» is not a cluster ● Cluster uses usual instances, so pay for 4 cores while using 1
  • 73. AWS Limits. You never know where ● Network limit ● PPS rate limit ● LB limit ● Cluster start time up to 20 mins ● Scalability limits ● S3 is slow for many files
  • 74. Facts ● HTTP is 2x faster than HTTPS ● HTTPS keep-alive gives +80% performance ● Java 7 is 40% faster than Java 6 (in our case) ● All IO operations minimized