4. A High-Level Look at RTB (Real-Time Bidding)
1. Browsers visit Publishers and create impressions.
2. Publishers sell impressions via Exchanges.
3. Exchanges serve as auction houses for the impressions.
4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.
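The auction step can be pictured as a minimal sketch, assuming the highest bid wins. The `Bid` and `run_auction` names are hypothetical, and how the winner is actually charged (first- or second-price) depends on the exchange and is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    bidder: str
    price_cpm: float  # hypothetical: bid price per thousand impressions


def run_auction(bids: list[Bid]) -> Optional[Bid]:
    """Return the highest bid, or None if nobody bid.

    Assumption: highest bid wins; the clearing price rule is
    exchange-specific and is not modeled here.
    """
    if not bids:
        return None
    return max(bids, key=lambda b: b.price_cpm)


bids = [Bid("m6d", 2.50), Bid("other_bidder", 1.75)]
winner = run_auction(bids)
if winner is not None and winner.bidder == "m6d":
    print("m6d won: display our ad to the browser")
```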
5. Performance and Data
• Billions and billions of bid requests a day
• A single request can result in multiple Cassandra operations!
• One cluster is just under 10 TB and growing
• Low-latency requirement: typically below 120 ms
• Limited data available to m6d via the exchange
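To put "billions of requests a day" in per-second terms, here is a quick worked conversion. The 2 billion requests/day and 3 Cassandra operations per request are hypothetical figures chosen for illustration, not numbers from the slide, and the result is a daily average (peak traffic will be higher).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(requests_per_day: float, ops_per_request: float = 1.0) -> float:
    """Average rate implied by a daily volume, optionally multiplied by ops per request."""
    return requests_per_day * ops_per_request / SECONDS_PER_DAY


# Hypothetical figures: 2 billion bid requests/day, 3 Cassandra operations each.
print(round(per_second(2e9)))     # 23148 requests/s on average
print(round(per_second(2e9, 3)))  # 69444 Cassandra operations/s on average
```

Each of those operations also has to fit inside the sub-120 ms response budget, which is why per-operation latency matters as much as raw throughput.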
6. Segment Data
Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand are placed in the same segment.
Segment data is just one component of our overarching data model.
Segments help reduce the number of calculations we do in real time.
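One way to see why segments cut real-time work: membership is computed ahead of time, so at bid time the question "is this user relevant to this brand?" becomes a set lookup and intersection. This is a minimal sketch; the in-memory `user_segments` map and the segment names are hypothetical stand-ins for wherever segment data is actually stored.

```python
# Hypothetical precomputed segment membership: user id -> segments.
# Building this ahead of time is what removes work from the bid path.
user_segments: dict[str, set[str]] = {
    "user-123": {"running-shoes", "outdoor"},
    "user-456": {"luxury-cars"},
}


def matching_segments(user_id: str, campaign_segments: set[str]) -> set[str]:
    """Segments the user shares with a campaign's targets (empty for unknown users)."""
    return user_segments.get(user_id, set()) & campaign_segments


print(matching_segments("user-123", {"outdoor", "camping"}))  # {'outdoor'}
```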
7. Old Approach for Segment Data
Architecture (diagram): event logs → aggregation in Hadoop → MySQL data push → Application Nodes (Tomcat + MySQL)
Limitations
• Periodically updated
• Only a subset of the data
• Cluster performance is affected during a data push
8. Cassandra Approach for Segment Data
Architecture (diagram): Application Nodes (Tomcat + less MySQL usage) backed by Cassandra
Better!
• Updating in real time is now possible
• Distributed, not duplicated
• Less complexity to manage
• Storing more information
• We can now bid on users sooner!
9. One Ring to rule them all
http://askyyy.blog.163.com/blog/static/1234575992010428819399/
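The "ring" is Cassandra's token ring. A minimal sketch of the idea, assuming one token per node, an MD5-based token (similar in spirit to RandomPartitioner, not byte-for-byte identical) and SimpleStrategy-like placement where the remaining replicas are simply the next nodes clockwise. The node names and token assignment are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """MD5-based token, similar in spirit to RandomPartitioner (not identical)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens: dict[str, int]):
        # Hypothetical token assignment: node name -> token that node owns.
        self._entries = sorted((t, name) for name, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, key: str, rf: int) -> list[str]:
        # Primary replica: first node whose token >= the key's token, wrapping around.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._entries)
        # Remaining replicas: the next distinct nodes clockwise around the ring.
        count = min(rf, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(count)]


SPACE = 2**128  # MD5 yields 128-bit values
ring = Ring({"A": 0, "B": SPACE // 4, "C": SPACE // 2, "D": 3 * SPACE // 4})
print(ring.replicas("user-123", 3))
```

Because every node can compute the same placement from the key alone, any node can coordinate a request without a central directory.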
10. Peer to Peer
Per-operation replication
Fail fast, self-healing
Each write goes to all natural endpoints
Hinted handoff if a destination is down
Read repair
No more:
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
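The write path described above (send to every natural endpoint, keep a hint for a down replica, fix stale copies on read) can be modeled as a toy coordinator. This is a simplified sketch under stated assumptions: `Replica` and `Coordinator` are hypothetical classes, conflicts are resolved by timestamp (last write wins), and consistency levels, hint expiry and read-repair probability are all left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    name: str
    up: bool = True
    # key -> (timestamp, value); the newest timestamp wins (last-write-wins)
    data: dict = field(default_factory=dict)

    def apply(self, key: str, ts: int, value: str) -> None:
        current = self.data.get(key)
        if current is None or current[0] < ts:
            self.data[key] = (ts, value)


class Coordinator:
    def __init__(self, replicas: list[Replica]):
        self.replicas = {r.name: r for r in replicas}
        self.hints: list[tuple[str, str, int, str]] = []  # (target, key, ts, value)

    def write(self, key: str, ts: int, value: str) -> None:
        # Send the write to every natural endpoint; keep a hint for any that are down.
        for r in self.replicas.values():
            if r.up:
                r.apply(key, ts, value)
            else:
                self.hints.append((r.name, key, ts, value))

    def replay_hints(self) -> None:
        # Hinted handoff: deliver stored writes to replicas that have come back.
        pending = []
        for target, key, ts, value in self.hints:
            r = self.replicas[target]
            if r.up:
                r.apply(key, ts, value)
            else:
                pending.append((target, key, ts, value))
        self.hints = pending

    def read(self, key: str) -> Optional[str]:
        live = [r for r in self.replicas.values() if r.up]
        versions = [r.data[key] for r in live if key in r.data]
        if not versions:
            return None
        ts, value = max(versions, key=lambda v: v[0])
        # Read repair: push the newest version to any live replica that is stale.
        for r in live:
            r.apply(key, ts, value)
        return value
```

No operator has to skip a broken statement by hand: a replica that missed writes catches up through hints or read repair.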
11. Multi Data Center
No need to design and manage complex replication topologies
create keyspace world
with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options={1:3, 2:3, 3:3};
The same process as for a single data center
No log shipping or separate processes to run
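With `strategy_options={1:3, 2:3, 3:3}`, every row has 3 replicas in each of the 3 data centers, 9 in total. A quick worked example of the quorum arithmetic (quorum = floor(RF / 2) + 1) shows why a per-data-center quorum such as LOCAL_QUORUM is attractive: it needs only 2 acknowledgements inside the local data center, versus 5 for a cluster-wide QUORUM, which would have to include remote replicas.

```python
# Replication factor per data center, mirroring strategy_options={1:3, 2:3, 3:3}.
strategy_options = {1: 3, 2: 3, 3: 3}


def quorum(replicas: int) -> int:
    """Majority of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


total_rf = sum(strategy_options.values())
print(total_rf, quorum(total_rf))  # 9 5  (QUORUM across all data centers)
print({dc: quorum(rf) for dc, rf in strategy_options.items()})  # {1: 2, 2: 2, 3: 2}
```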
12. Monitoring & Management
Many, many things to monitor via JMX
Good command-line tools (e.g. nodetool)
Most values can be tweaked at runtime
13. Capacity Planning
How many rows?
How many columns?
Average column size
Latency requirements
Read and write throughput (operations per second)
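The questions above feed a back-of-the-envelope size estimate: rows × columns per row × (average column size + per-column overhead) × replication factor. This is a rough sketch, not a sizing tool. The per-column storage overhead depends on the Cassandra version and storage format, so it is a parameter you must supply (it defaults to 0 here), and compression and compaction headroom are ignored. All input figures below are hypothetical.

```python
def estimate_bytes(rows: int, cols_per_row: float, avg_col_bytes: float,
                   replication_factor: int, per_col_overhead: float = 0.0) -> float:
    """Rough cluster-wide data size; ignores compression and compaction headroom."""
    return rows * cols_per_row * (avg_col_bytes + per_col_overhead) * replication_factor


# Hypothetical inputs: 100 million rows, 50 columns each, 40-byte columns, RF = 3.
total = estimate_bytes(100_000_000, 50, 40, 3)
print(total / 1e9, "GB")  # 600.0 GB before overhead
```

Latency and throughput targets then decide how many nodes that data should be spread across, not just how much disk it needs.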
15. Max 2 billion columns per row
Awesome
…unless you accidentally write 2 billion columns to a row key named “null”
Check maxRowSize via JMX
Watch the logs for messages about compacting large rows
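A client-side guard can stop the "null" row key problem before it reaches the cluster. This is a minimal sketch: `safe_row_key`, `record_write`, `RowKeyError` and the 0.1% warning threshold are all hypothetical. The counter only sees writes made by one process, so the server-side checks (maxRowSize via JMX, large-row compaction messages in the logs) are still needed.

```python
from collections import Counter

MAX_COLUMNS_PER_ROW = 2_000_000_000  # the limit cited on the slide
WARN_FRACTION = 0.001                # hypothetical alert threshold (0.1% of the limit)


class RowKeyError(ValueError):
    pass


def safe_row_key(key) -> str:
    """Reject keys that would silently collapse many writes onto one giant row."""
    if key is None:
        raise RowKeyError("row key is None")
    text = str(key).strip()
    if text == "" or text.lower() in {"null", "none"}:
        raise RowKeyError(f"suspicious row key: {key!r}")
    return text


columns_written: Counter = Counter()


def record_write(key, n_columns: int = 1) -> bool:
    """Count columns per row in this process; return True once a row looks dangerously wide."""
    row = safe_row_key(key)
    columns_written[row] += n_columns
    return columns_written[row] >= MAX_COLUMNS_PER_ROW * WARN_FRACTION
```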