The document describes the evolution of Incapsula's big data system over four generations from 2010 to 2015. Each generation improved on processing throughput, read performance, and scalability by simplifying the data model and moving to distributed processing across multiple points of presence. Key changes included moving from a centralized SQL database to NoSQL storage, implementing multi-threaded processing, and distributing workloads across data centers.
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surge2014]
1. From 1000/day to 1000/sec
The evolution of our big data system
Yoav Cohen
VP Engineering
2. This Talk
A walk-through of how we built our big-data system
Incapsula, Inc. / Proprietary and Confidential. 2 All Rights Reserved.
3. About Incapsula
Vendor of a cloud-based Application Delivery Controller
• Web Application Firewall
• Load Balancing
• CDN & Optimizer
• DDoS Protection
4. How does it work?
5. Modeling Web-Traffic
1. First request to a website starts a new session
2. Subsequent requests are part of the same session
3. After being idle for 30 minutes the session ends
Session 1 starts 10:03:01 GET www.incapsula.com/
Session 1 request 1 10:03:10 GET www.incapsula.com/ddos
Session 1 request 2 10:03:12 GET www.incapsula.com/cdn
… …
Session 1 ends
Session 2 starts 10:35:05 GET www.incapsula.com/signup
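The three rules above can be sketched in a few lines of Python. This is only an illustration of the session model, not the production code: the Session class, the sessionize and at helpers, and the in-memory list of requests are all names made up for this example.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT_SECONDS = 30 * 60  # rule 3: a session ends after 30 idle minutes


@dataclass
class Session:
    session_id: int
    requests: list = field(default_factory=list)  # (timestamp, url) pairs

    @property
    def last_seen(self):
        return self.requests[-1][0]


def sessionize(requests):
    """Group one visitor's time-ordered (timestamp, url) pairs into sessions."""
    sessions = []
    for timestamp, url in requests:
        current = sessions[-1] if sessions else None
        # Rule 1 and rule 3: the first request, or a request after a long idle gap,
        # starts a new session. Otherwise (rule 2) it joins the current one.
        if current is None or timestamp - current.last_seen >= IDLE_TIMEOUT_SECONDS:
            current = Session(session_id=len(sessions) + 1)
            sessions.append(current)
        current.requests.append((timestamp, url))
    return sessions


def at(hhmmss):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


visits = [
    (at("10:03:01"), "www.incapsula.com/"),
    (at("10:03:10"), "www.incapsula.com/ddos"),
    (at("10:03:12"), "www.incapsula.com/cdn"),
    (at("10:35:05"), "www.incapsula.com/signup"),
]
sessions = sessionize(visits)
```

With these four requests the gap before the signup request is longer than 30 minutes, so it opens a second session.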
6. The Data
A stream of messages in Google Protobuf format
msgTid: 144021710000000001
type: SESSION_MESSAGE_CREATE
siteID: 7
startTime: 1409578192017
clientIP: ******
countryCode: "US"
entryUrlID: 5544402418256865164
visitorID: "7e59c804-f663-4595-a0df-35d9b02eb747"
userAgent: "Incapsula Site Monitor - OPS"
visitorClAppId: 209
…
requestStartTime: 1410004769258
responseStartTime: 1410004769258
responseEndTime: 1410004769261
sessionID: 151009030147748952
urlID: 5544402418256865164
request_id: 567472919066130553
queryString: ""
postBody: ""
statusCode: 200
serialNumber: 1
content_length: 6350
protocol: HTTP
requestResult: REQ_CACHED_FRESH
...
7. The Problem
Transforming the stream of messages to readable data
• Processing throughput
• Read performance
• Scalability
?
Session 1 starts
Session 1 request 1
Session 1 request 2
…
Session 1 ends
Session 2 starts
…
9. System Evolution
• Gen 1: 2010 – 2011
• Gen 2: 2011 – 2013
• Gen 3: 2013
• Gen 4: 2015
10. Gen 1: Code Name “rtproc”
11. Gen 1: OLAP Cube
• A textbook solution
• Time x IP x Country x … (dimensions) → # requests, # attacks, … (counters)
• Slice and dice to answer any question (how many attacks from Germany in Jan 2010?)
select sum(number_of_attacks) from Attacks where
site_id=140 and country_code='DE' and time > '20100100'
and time < '20100200'
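As a rough illustration of the slice-and-dice idea, the query can be tried against an in-memory SQLite table. The table layout, the 'YYYYMMDD' string format for the time column, and the sample rows are assumptions made for this sketch; the slides do not show the real schema or say which SQL database was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table Attacks (
        site_id integer,
        country_code text,
        time text,              -- assumed 'YYYYMMDD' strings
        number_of_attacks integer
    )
    """
)
conn.executemany(
    "insert into Attacks values (?, ?, ?, ?)",
    [
        (140, "DE", "20100105", 3),
        (140, "DE", "20100120", 4),
        (140, "DE", "20100210", 9),   # February: outside the slice
        (140, "US", "20100110", 5),   # other country
        (141, "DE", "20100110", 7),   # other site
    ],
)

# Fix the site and country dimensions, bound the time dimension, sum the counter.
(total,) = conn.execute(
    """
    select sum(number_of_attacks) from Attacks
    where site_id = ? and country_code = ?
      and time > ? and time < ?
    """,
    (140, "DE", "20100100", "20100200"),
).fetchone()
print(total)
```

Only the two January rows for site 140 and country DE fall inside the slice.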
12. Gen 1: OLAP Cube
• Loading data for individual attacks requires joins:
13. Gen 1: Analysis
• Generic solution
• Very big tables
• Overly complex (lots of moving parts)
Processing
Read
Scalability
15. Gen 2: Code Name “rtprocng”
• Main problems to solve:
> Read Performance
> Simplify
• New approach:
> Count things on the edge instead of centrally
> NoSQL model to improve read performance (no joins)
16. Gen 2: Simpler Design
17. Gen 2: Stats NoSQL Storage
• One document per day, containing all the data to build the charts
• Read performance improved (one lookup for all charts)
• Can even load parts of the data (MongoDB feature)
{"_id" : "7_09-04-2014",
"pageviews" : [
NumberLong(2369),
NumberLong(2380),
NumberLong(2520),
NumberLong(5651),
NumberLong(2912),
NumberLong(3357),
NumberLong(3723),
NumberLong(3301),
NumberLong(3092),
NumberLong(2984),
NumberLong(3791),
NumberLong(3069)
],
"humsess" : [
NumberLong(213),
NumberLong(258),
NumberLong(298),
…
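A minimal sketch of how such a per-day document could be maintained and read, using a plain dict as a stand-in for the MongoDB collection. Several things here are assumptions, not facts from the slides: that the _id is "<siteID>_<MM-DD-YYYY>", that each array holds 12 two-hour buckets, and the helper names doc_id, bucket_of, count and load_charts.

```python
from datetime import datetime, timezone

BUCKETS_PER_DAY = 12                          # assumption: 12 two-hour buckets
BUCKET_SECONDS = 24 * 3600 // BUCKETS_PER_DAY


def doc_id(site_id, ts):
    # Assumption: "<siteID>_<MM-DD-YYYY>"; the slides do not spell out the date order.
    return f"{site_id}_{ts.strftime('%m-%d-%Y')}"


def bucket_of(ts):
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds_into_day // BUCKET_SECONDS


def count(store, site_id, ts, counter, amount=1):
    """Increment one counter bucket in the per-day document, creating it if needed."""
    key = doc_id(site_id, ts)
    doc = store.setdefault(key, {"_id": key})
    buckets = doc.setdefault(counter, [0] * BUCKETS_PER_DAY)
    buckets[bucket_of(ts)] += amount


def load_charts(store, site_id, day, counters=None):
    """One lookup returns every chart; `counters` mimics loading only part of the document."""
    doc = store[doc_id(site_id, day)]
    if counters is None:
        return doc
    return {name: doc[name] for name in counters}


store = {}  # stand-in for the document collection
count(store, 7, datetime(2014, 9, 4, 0, 30, tzinfo=timezone.utc), "pageviews")
count(store, 7, datetime(2014, 9, 4, 3, 15, tzinfo=timezone.utc), "pageviews", 2)
count(store, 7, datetime(2014, 9, 4, 3, 20, tzinfo=timezone.utc), "humsess")
partial = load_charts(store, 7, datetime(2014, 9, 4, tzinfo=timezone.utc), ["pageviews"])
```

The point of the layout is that a dashboard needs a single key lookup per site and day, with no joins.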
18. Gen 2: Events NoSQL Storage
• One document per session, containing all its actions
• Lookups are easy (no joins)
• Searches use MongoDB indexes (OK but not great)
{
"_id": 226000330131098770,
"start": {
"$date": "2014-09-09T10:19:00Z"
},
"cc": ["CA"],
"securityFlags": ["rid4"],
"badbot": true,
"prxy": [226],
"clappt": 1,
"actns": [
{
"reqRes": 10,
"u": "www.incapsula.com/",
"attack": [
{
"loc": 1,
"acode": 0,
"act": 7,
"rid": 4,
"more": 0,
"atype": 314,
"hidden": false,
"match": "",
"pval": ""
}
...
19. Gen 2: Python Processor
• Batch process:
> Process the files in the directory for up to X minutes
> Flush to storage and exit
• How to achieve good processing throughput?
> Cache objects in memory
> When processing messages, update object in memory
> When process finishes, flush all the objects from memory to storage
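The batch loop described above might look roughly like this. It is a sketch under assumptions: run_batch, read_messages and the dict-based storage are hypothetical stand-ins, messages are plain dicts rather than protobuf objects, and loading a session's earlier state from storage on first touch is an added detail the slides do not mention.

```python
import time


def run_batch(files, read_messages, storage, max_seconds):
    """Process files until the time budget runs out, then flush once and return."""
    deadline = time.monotonic() + max_seconds
    cache = {}  # session id -> in-memory session object
    processed = []
    for path in files:
        if time.monotonic() >= deadline:
            break  # leftover files wait for the next run
        for message in read_messages(path):
            sid = message["sessionID"]
            if sid not in cache:
                # Assumption: start from whatever an earlier run already stored.
                earlier = storage.get(sid, {}).get("actions", [])
                cache[sid] = {"actions": list(earlier)}
            cache[sid]["actions"].append(message["url"])  # update in memory only
        processed.append(path)
    storage.update(cache)  # a single flush at the end of the run
    return processed


fake_files = {
    "a.pb": [{"sessionID": 1, "url": "/"}, {"sessionID": 1, "url": "/ddos"}],
    "b.pb": [{"sessionID": 2, "url": "/signup"}],
}
storage = {}
done = run_batch(sorted(fake_files), fake_files.__getitem__, storage, max_seconds=60)
```

Throughput comes from touching storage once per session per run instead of once per message.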
20. Gen 2 Storage Bottleneck
• Single DB for all sessions
• Reality check:
> MongoDB coarse-grained locking (lock per DB server)
> When batch process flushes, UIs are stuck (lock prefers writes)
> Dropping old data impossible
> Fragmentation caused excessive disk usage
21. Gen 2 Storage Re-Factoring
• Single DB → DB per day
> Drop DBs that are X days old
• Live sessions → Live DB
“Dead” sessions → per-day DB
> 0% fragmentation in per-day DBs
> Daily maintenance of Live DB (but it’s relatively small)
• DB locking not resolved (later MongoDB versions have lock per DB)
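A small sketch of the routing and retention logic this layout implies. The database names, the db_for and dbs_to_drop helpers, and the choice to file a dead session under the day it ended are all assumptions for illustration.

```python
from datetime import date, timedelta

LIVE_DB = "sessions_live"      # hypothetical name
PREFIX = "sessions_"           # hypothetical per-day naming scheme


def db_for(session_end):
    """Live sessions stay in the live DB; ended ones go to the DB of their end day."""
    if session_end is None:
        return LIVE_DB
    return f"{PREFIX}{session_end.isoformat()}"


def dbs_to_drop(db_names, today, retention_days):
    """Return the per-day DBs that are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in db_names:
        if name == LIVE_DB:
            continue  # the live DB is maintained, never dropped
        day = date.fromisoformat(name[len(PREFIX):])
        if day < cutoff:
            expired.append(name)
    return expired


existing = [LIVE_DB, "sessions_2014-09-01", "sessions_2014-09-08"]
expired = dbs_to_drop(existing, today=date(2014, 9, 9), retention_days=3)
```

Dropping a whole database is what makes removing old data cheap: no per-document deletes and no fragmentation left behind.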
22. Gen 2: Analysis
• Simple and scalable
• MongoDB is easy to get started with
> Over time TCO increases
• Reached batch processing limits
Processing
Read
Scalability
24. Gen 3: Code Name “Graceland”
• Main problems to solve:
> Faster, online processing
> Better search capabilities
• New approach:
> Multi-threaded Java-based processor:
- Faster protobuf library than Python's
- Keep objects in memory for longer periods of time and reduce flushes to storage
> Lucene for search
> A DB we can understand and control
25. Gen 3: Design
26. Gen 3: Multi-Threaded Java Processor
• One reader thread reads the files and distributes the data between the workers
• Workers process the data
> Load object from cache
> If not in cache, load from storage
> Update object
> Flush to storage
- Periodically
- On certain events
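The reader/worker structure can be sketched as follows. The real processor was written in Java; Python is used here only to show the shape of the design. Routing a session to a worker by session ID modulo the worker count, flushing after a fixed number of updates, and treating shutdown as the "certain event" are assumptions, as are all the names.

```python
import queue
import threading

NUM_WORKERS = 4
FLUSH_EVERY = 2          # assumption: flush after this many updates per worker
STOP = object()          # sentinel telling a worker to finish

storage = {}             # stand-in for the events store
storage_lock = threading.Lock()


def worker(inbox):
    cache = {}           # session id -> list of URLs, private to this worker
    dirty = set()        # sessions changed since the last flush
    updates = 0

    def flush():
        with storage_lock:
            for sid in dirty:
                storage[sid] = list(cache[sid])
        dirty.clear()

    while True:
        message = inbox.get()
        if message is STOP:
            flush()                               # flush on an event: shutdown
            return
        sid, url = message
        if sid not in cache:                      # cache miss: load from storage
            with storage_lock:
                cache[sid] = list(storage.get(sid, []))
        cache[sid].append(url)                    # update the object
        dirty.add(sid)
        updates += 1
        if updates % FLUSH_EVERY == 0:            # count-based "periodic" flush
            flush()


def reader(messages, inboxes):
    for sid, url in messages:
        inboxes[sid % len(inboxes)].put((sid, url))   # same session, same worker
    for inbox in inboxes:
        inbox.put(STOP)


def process(messages):
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    threads = [threading.Thread(target=worker, args=(inbox,)) for inbox in inboxes]
    threads.append(threading.Thread(target=reader, args=(messages, inboxes)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


process([(1, "/"), (2, "/cdn"), (1, "/ddos"), (5, "/signup")])
```

Because a given session always lands on the same worker, each worker can keep its cache private and only the shared storage needs a lock.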
27. Gen 3: Cache Design
• Design goal: large cache, but not all in JVM heap
• Layered LRU cache (extends LinkedHashMap)
• One layer is the map, backing layer on tmpfs or disk
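A layered LRU cache along these lines can be sketched with an OrderedDict in place of Java's LinkedHashMap. The class name, the pickle-file backing layer, and the assumption that keys are safe to use in file names are all illustrative choices, not details from the slides.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class LayeredLRUCache:
    """Small in-memory LRU layer that spills evicted entries to files."""

    def __init__(self, capacity, spill_dir):
        self.capacity = capacity
        self.spill_dir = spill_dir       # could sit on tmpfs or a regular disk
        self.memory = OrderedDict()      # least recently used entry first

    def _path(self, key):
        # Assumption: keys (e.g. integer session IDs) are safe in file names.
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            with open(self._path(old_key), "wb") as handle:
                pickle.dump(old_value, handle)   # evict to the backing layer

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):                 # hit in the backing layer
            with open(path, "rb") as handle:
                value = pickle.load(handle)
            os.remove(path)
            self.put(key, value)                 # promote back into memory
            return value
        return None                              # miss in both layers


with tempfile.TemporaryDirectory() as spill_dir:
    cache = LayeredLRUCache(capacity=2, spill_dir=spill_dir)
    cache.put(1, "session-1")
    cache.put(2, "session-2")
    cache.put(3, "session-3")                    # evicts key 1 to the backing layer
    spilled = sorted(os.listdir(spill_dir))
    restored = cache.get(1)                      # reloads key 1, spilling key 2
    in_memory = list(cache.memory)
```

Only the small top layer lives on the heap; everything else sits in files, which keeps the cache large without growing the JVM heap in the original design.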
28. Gen 3 Stats Storage (“Segmented Storage”)
• Binary file per day
• Keep recent files separate, archive older files
2014-02-03 2014-02-03.pbz 0 14325654845
2014-02-02 2014-02-02.pbz 0 14326542128
2014-02-01 2014-02-01.pbz 0 14325654845
2014-01-31 archive.pbz 76515 14325654845
...
2014-01-01 archive.pbz 0 14365428845
29. Gen 3 Stats Storage (Segmented Storage)
• Files are served via nginx
• Clients keep cache
30. Gen 3 Events Storage
• Tried different DBs:
> Lucene
- Storing the raw session data inside the Lucene index
- Index memory footprint grew (all the session data got memory-mapped)
> LevelDB, KyotoCabinet
- Couldn’t get these to work reliably
> Cassandra
- Rule of thumb: if your DB has its own conference, you need a DBA
- We felt it was easier to write our own than to read the docs
31. Gen 3 Events Storage (“Indexing Partition”)
• A partition (directory) per-day, containing:
> Lucene index of sessions
> Big file with sessions in it
• Same approach as in Gen 2 for live sessions:
> Live sessions → Live partition
> Dead sessions → per-day partitions
> 0% fragmentation
> Complicates searching a bit
> Live partitions require cleanup or re-building
32. Gen 3 Events Storage (“Indexing Partition”)
• Searches are more efficient:
> Search requests are served directly from index
> Session data is loaded only on-demand, and via nginx using HTTP Range header
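Loading one session out of the big per-day file with a Range header could look like this. The sketch assumes the search index yields a byte offset and length for each session, which the slides do not state; the URL layout, host name and session_request helper are hypothetical. The request is only built, not sent.

```python
from urllib.request import Request


def session_request(base_url, partition_day, offset, length):
    """Build a GET for `length` bytes starting at `offset` of a day's sessions file."""
    url = f"{base_url}/{partition_day}/sessions.bin"      # hypothetical layout
    last_byte = offset + length - 1                       # Range ends are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


request = session_request("http://data.example", "2014-09-09", offset=76515, length=2048)
range_header = request.get_header("Range")
```

A server that honors the header answers with 206 Partial Content and only the requested bytes, so a static file server is enough to serve individual sessions.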
33. Gen 3: Analysis
• Good processing throughput
• Good read performance
• Reaching JVM issues (big heap)
Processing
Read
Scalability
35. Gen 4: 2015
• Based on Gen 3
• Distribute work to more than one system
> One data server in each POP (> 20 POPs)
> Each POP processes and stores its own data
> Upload processed outputs to central servers or search on all POP servers
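The "search on all POP servers" option is a scatter-gather pattern, sketched below. The POP names, the make_pop stand-in for a data server's search endpoint, and the merge order (newest first) are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all_pops(pops, query):
    """Send the same query to every POP and merge the hits, newest first."""
    with ThreadPoolExecutor(max_workers=len(pops)) as pool:
        futures = {name: pool.submit(search, query) for name, search in pops.items()}
        hits = []
        for name, future in futures.items():
            for hit in future.result():
                hits.append({**hit, "pop": name})   # remember where the hit came from
    return sorted(hits, key=lambda hit: hit["start"], reverse=True)


def make_pop(sessions):
    # Stand-in for a POP data server's search endpoint: filter by country code.
    return lambda country: [s for s in sessions if s["cc"] == country]


pops = {
    "pop-a": make_pop([{"id": 1, "cc": "CA", "start": 100}, {"id": 2, "cc": "US", "start": 110}]),
    "pop-b": make_pop([{"id": 3, "cc": "CA", "start": 120}]),
}
results = search_all_pops(pops, "CA")
```

Each POP searches only its own data, so adding POPs adds capacity; the cost is that a query is as slow as the slowest POP.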
36. Summary
• It is equally important to understand how your system works as it is to understand every other aspect of your business
• At some point we realized it’s better for us to build our software from scratch than to use off-the-shelf products as black boxes:
> We need to find people who know the products
- Which is crazy since we tried tons of them over the last 4 years
> We usually have fewer requirements
- Who needs multi-DC replication from day 1?
> We prefer coding it to reading documentation and Stack Overflow
- Then we can hack it in the middle of the night if needed
- It’s way more fun (at least for the developers…)
38. Types of Data
Statistics – just numbers, used for charts, billing, etc.
39. Types of Data
Events – in-depth information, used for forensics and research