3. Slide 3
Our Use Case
Clickstreams of Events
(pageviews, impressions, clicks, etc.)
Events contain attributes
Aggregating Counts and Performance
Breakdowns by Several Dimensions
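To make the use case concrete, here is a minimal sketch of events with attributes, counted and broken down by one or more dimensions. The field names (`type`, `site`, `country`) and the click-through-rate metric are assumptions for illustration; the slides do not name specific attributes or performance measures.

```python
from collections import Counter

# Hypothetical clickstream events; the attribute names are illustrative only.
events = [
    {"type": "pageview", "site": "a.com", "country": "US"},
    {"type": "impression", "site": "a.com", "country": "US"},
    {"type": "click", "site": "a.com", "country": "US"},
    {"type": "impression", "site": "b.com", "country": "DE"},
]


def count_by(events, *dimensions):
    """Count events per combination of the given attribute names."""
    return Counter(tuple(e[d] for d in dimensions) for e in events)


def click_through_rate(events):
    """Clicks divided by impressions, or 0.0 when there are no impressions."""
    counts = count_by(events, "type")
    impressions = counts[("impression",)]
    return counts[("click",)] / impressions if impressions else 0.0


by_type = count_by(events, "type")
by_site_and_type = count_by(events, "site", "type")
ctr = click_through_rate(events)
```

Each key in the resulting counters is a tuple with one value per requested dimension, so adding a dimension only means passing another attribute name.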
4. Slide 4
Our Prior Approach
Two different types of systems
Two different access patterns
Limited query ability
Batch
(HBase)
Real-time
(Redis)
5. Slide 5
Kafka
Persisted event queue
Consumers keep track of offset
Horizontally scalable, topics can be partitioned, etc.
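The properties listed for Kafka can be illustrated with a toy in-memory model. This is a sketch of the idea only, not the Kafka client API: `ToyTopic` and `ToyConsumer` are hypothetical names, and real Kafka persists partitions to disk across brokers.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only event queue."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, event):
        # The same key always maps to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(event)
        return index


class ToyConsumer:
    """Remembers, per partition, the offset of the next event to read."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self, partition, max_events=10):
        start = self.offsets[partition]
        batch = self.topic.partitions[partition][start:start + max_events]
        self.offsets[partition] = start + len(batch)
        return batch

    def rewind(self, partition, offset):
        # Reading does not delete events, so replay is just moving the offset.
        self.offsets[partition] = offset


topic = ToyTopic(num_partitions=2)
p = topic.append("user-1", {"type": "click"})
topic.append("user-1", {"type": "pageview"})

consumer = ToyConsumer(topic)
first_batch = consumer.poll(p)
second_batch = consumer.poll(p)
consumer.rewind(p, 0)
replayed = consumer.poll(p, max_events=1)
```

Because the queue keeps events after they are read, a consumer that falls over can resume from its last offset, and a second consumer can read the same events independently.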
6. Slide 6
Real-time Layer of Lambda with ES
Daily Index of “raw” events – each event is a document
Elasticsearch Kafka River indexes events from Kafka
Real-time processing is trivial, just indexing events
Aggregation of real-time data is deferred to query time
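A sketch of what this could look like in practice: one helper that picks the daily index for an event, and one that builds a search body whose aggregation is computed at query time. The index prefix `events`, the `timestamp` field, and the function names are assumptions; the body uses the general shape of an Elasticsearch range query with a terms aggregation and is returned as a plain dictionary rather than sent to a cluster.

```python
from datetime import datetime, timezone


def daily_index_name(ts, prefix="events"):
    """Hypothetical naming scheme: one index per UTC day."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def counts_by_field_query(field, since):
    """Search body that counts raw event documents per value of `field`."""
    return {
        "size": 0,  # only the aggregation is wanted, not the matching documents
        "query": {"range": {"timestamp": {"gte": since.isoformat()}}},
        "aggs": {"by_" + field: {"terms": {"field": field}}},
    }


ts = datetime(2014, 6, 3, 23, 30, tzinfo=timezone.utc)
index_name = daily_index_name(ts)
body = counts_by_field_query("type", since=ts)
```

Nothing is pre-computed here: the indexing side only stores each event, and the breakdown is chosen when the query is written.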
7. Slide 7
Batch Layer of Lambda with ES
Monthly Index of Aggregated Data Documents
Hourly re-index of events from the archive, which covers real-time issues
Aggregate desired breakdowns into documents
When done, note most recent hour completed
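The batch steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the presenters' job: the dictionary `store` stands in for the aggregated index, `state` stands in for wherever the completed hour is recorded, and the deterministic document ID is one possible way to make a re-run overwrite earlier results instead of duplicating them.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_floor(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hour(events, hour, dimensions):
    """Build one aggregated document per dimension combination for one hour."""
    counts = Counter(
        tuple(e[d] for d in dimensions)
        for e in events
        if hour_floor(e["timestamp"]) == hour
    )
    docs = {}
    for values, count in counts.items():
        # Deterministic ID: re-running the same hour overwrites the same document.
        doc_id = "|".join([hour.isoformat(), *values])
        docs[doc_id] = {
            "hour": hour.isoformat(),
            **dict(zip(dimensions, values)),
            "count": count,
        }
    return docs


def run_batch(archived_events, hour, dimensions, store, state):
    store.update(aggregate_hour(archived_events, hour, dimensions))
    # Record the completed hour only after its documents are written.
    state["last_completed_hour"] = hour


utc = timezone.utc
hour = datetime(2014, 6, 3, 10, tzinfo=utc)
archived = [
    {"timestamp": datetime(2014, 6, 3, 10, 5, tzinfo=utc), "site": "a.com", "type": "click"},
    {"timestamp": datetime(2014, 6, 3, 10, 40, tzinfo=utc), "site": "a.com", "type": "click"},
    {"timestamp": datetime(2014, 6, 3, 11, 2, tzinfo=utc), "site": "a.com", "type": "click"},
]
store, state = {}, {}
run_batch(archived, hour, ("site", "type"), store, state)
run_batch(archived, hour, ("site", "type"), store, state)  # safe to repeat
```

Running the same hour twice leaves a single document per breakdown, which is what allows the batch pass to correct anything the real-time path missed.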
8. Slide 8
Serving Layer of Lambda with ES
Query Aggregated Data Documents as much as possible
Query Raw events from last aggregated available to present
Combine Aggregated and Raw query results together and return
We use Node.js, a natural fit
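The combine step can be sketched as below. The slides say the serving layer is written in Node.js; Python is used here only to show the logic. The function name, the document fields, and the assumption that `start` falls on an hour boundary are all illustrative, and in-memory lists stand in for the two Elasticsearch queries.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def serve_counts(aggregated_docs, raw_events, last_completed_hour, start, key):
    """Total counts per `key` from `start` to the present, using both layers."""
    # First instant that the batch layer has not aggregated yet.
    boundary = last_completed_hour + timedelta(hours=1)
    totals = Counter()
    for doc in aggregated_docs:
        if start <= doc["hour"] < boundary:
            totals[doc[key]] += doc["count"]
    for event in raw_events:
        # Raw events before the boundary are already inside aggregated documents.
        if event["timestamp"] >= max(boundary, start):
            totals[event[key]] += 1
    return totals


utc = timezone.utc
aggregated = [
    {"hour": datetime(2014, 6, 3, 9, tzinfo=utc), "site": "a.com", "count": 5},
    {"hour": datetime(2014, 6, 3, 10, tzinfo=utc), "site": "a.com", "count": 2},
    {"hour": datetime(2014, 6, 3, 10, tzinfo=utc), "site": "b.com", "count": 1},
]
raw = [
    {"timestamp": datetime(2014, 6, 3, 10, 40, tzinfo=utc), "site": "a.com"},
    {"timestamp": datetime(2014, 6, 3, 11, 2, tzinfo=utc), "site": "a.com"},
    {"timestamp": datetime(2014, 6, 3, 11, 15, tzinfo=utc), "site": "b.com"},
]
totals = serve_counts(
    aggregated,
    raw,
    last_completed_hour=datetime(2014, 6, 3, 10, tzinfo=utc),
    start=datetime(2014, 6, 3, 9, tzinfo=utc),
    key="site",
)
```

The recorded last completed hour is the only coordination point between the layers: everything up to it comes from aggregated documents, everything after it from raw events, so no event is counted twice.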
9. Slide 9
Why Elasticsearch?
Real-time
- calculations query-time and flexible
- real-time is simple
Batch
- some pre-calculation
- query-time ties it together
Serving
- queries are flexible
- batch and real-time query access patterns similar
10. Slide 10
More Elasticsearch Goodies
Kibana
- Mostly real-time events
- Aggregated documents useful too
Snapshotting for backups
Daily indexes of real-time data are optimized