Alooma's CTO, Yair Weinberger, discussed the false promise of "Exactly-Once Processing". In his talk, Yair explained why there is no such thing as "Exactly-Once Processing" and how idempotency can only get you close when implementing states in Apache Storm / Trident.
https://www.alooma.io/blog/tlv-data-plumbers-first-meetup
2. #TLVDataPlumbers
● Create a community of data plumbers
● "The best minds of my generation are deleting
commas from log files, and that makes me sad."
@medriscoll, http://adage.com/...
● “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle
to Insights” @mrogati, http://www.nytimes.com/2014/08/18/...
● “It’s time to take care of our clogged data plumbing”,
Tom Davenport, http://venturebeat.com/2014/09/18/...
3. A real-time platform that abstracts the data layer
[Diagram: Mobile, Servers, Sensors, Devices -> tons of plumbing to make it work (Unscalable, Rigid, Leaky, Slow) -> Analytics, Personalization, Monitoring, and more…]
4. A real-time platform that abstracts the data layer
[Diagram: Mobile, Servers, Sensors, Devices -> platform (Scalable, Flexible, Reliable, Fast) -> Analytics, Personalization, Monitoring, and more…]
14. Common myths
● We use Trident, so we guarantee “exactly once”
● If a tuple in a transaction failed, the whole
transaction will be repeated, and the computation
done on the transaction so far will be discarded
● Someone already put up a working transactional
state
16. Even if the backing store is transactional!
Theory:
- begin commit => begin transaction in the backing store
- update state => write to the backing store
- commit => commit in the backing store
Practice:
- This only works for single-threaded states!
- commit is called once per thread
Experience:
- Even in a single-threaded state, the thread can crash
between the commit to the backing store and the ack
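The "Experience" point can be illustrated with a toy simulation. This is only a sketch, not Storm or Trident code: `BackingStore`, `process_batch`, and `run_with_replay` are hypothetical names, and the "crash" is modelled as an exception raised after the store has committed but before the batch is acked. Because the batch is never acked, it is replayed, and a non-idempotent update is applied twice.

```python
class BackingStore:
    """Toy transactional store holding a single counter (hypothetical)."""

    def __init__(self):
        self.committed = 0
        self._pending = None

    def begin(self):
        # Start a transaction from the last committed value.
        self._pending = self.committed

    def add(self, n):
        # Non-idempotent update: applying it twice changes the result.
        self._pending += n

    def commit(self):
        self.committed = self._pending
        self._pending = None


class CrashBeforeAck(Exception):
    """Models the worker dying after the store commit but before the ack."""


def process_batch(store, batch, crash_before_ack=False):
    store.begin()
    for value in batch:
        store.add(value)
    store.commit()              # the backing store has durably committed
    if crash_before_ack:
        raise CrashBeforeAck()  # the ack never reaches the source
    return "ack"


def run_with_replay(store, batch):
    """Replay the batch until it is acked; the first attempt crashes."""
    attempts = 0
    while True:
        attempts += 1
        try:
            process_batch(store, batch, crash_before_ack=(attempts == 1))
            return attempts
        except CrashBeforeAck:
            continue  # no ack was seen, so the same batch is sent again


store = BackingStore()
attempts = run_with_replay(store, [1, 2, 3])
print(attempts, store.committed)  # 2 12 (the batch sums to 6 but was applied twice)
```

The store's own transaction succeeded both times; the duplication comes from the gap between the commit and the ack, which the store cannot see.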
18. Fake it ‘till you make it
Idempotency to the rescue
Simple (non distributed) example: TCP sequence numbers
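As a rough sketch of the sequence-number idea, the receiver below delivers a segment only when it carries the next expected sequence number, so a retransmitted segment has no further effect. This is heavily simplified and is not real TCP (real sequence numbers count bytes, and there are windows and reordering buffers); `Receiver` and its fields are hypothetical names.

```python
class Receiver:
    """Simplified in-order receiver that ignores duplicate segments."""

    def __init__(self):
        self.next_seq = 0     # sequence number we expect next
        self.delivered = []   # payloads handed to the application

    def receive(self, seq, payload):
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # Duplicates (seq < next_seq) and gaps (seq > next_seq) are not
        # delivered; either way we answer with a cumulative ack.
        return self.next_seq


receiver = Receiver()
# Segment 1 is retransmitted, and segment 0 shows up again late.
segments = [(0, "a"), (1, "b"), (1, "b"), (2, "c"), (0, "a")]
acks = [receiver.receive(seq, payload) for seq, payload in segments]

print(receiver.delivered)  # ['a', 'b', 'c']
print(acks)                # [1, 2, 2, 3, 3]
```

The sender is free to retransmit whenever an ack is lost, because receiving the same segment twice leaves the delivered stream unchanged.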
22. Fake it ‘till you make it
Idempotent distributed examples
- Database with primary key (INSERT … IGNORE)
- Kafka with log compaction
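The primary-key case can be sketched with Python's built-in `sqlite3` module. Note the assumptions: SQLite spells the statement `INSERT OR IGNORE` (MySQL uses `INSERT IGNORE`), and the `events` table, its columns, and `write_event` are hypothetical. The point is that each event carries a stable ID, so a replayed write hits the primary key and is silently skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")


def write_event(conn, event_id, payload):
    """Insert an event; a repeated event_id is ignored, not duplicated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
            (event_id, payload),
        )


# "e1" is delivered twice, as it would be after a replayed batch.
for event_id, payload in [("e1", "click"), ("e2", "view"), ("e1", "click")]:
    write_event(conn, event_id, payload)

rows = conn.execute(
    "SELECT event_id, payload FROM events ORDER BY event_id"
).fetchall()
print(rows)  # [('e1', 'click'), ('e2', 'view')]
```

This only works if the ID is derived deterministically from the event itself, so that the replayed copy carries the same key as the original.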
23. Kafka Log Compaction
● Retains at least the most recent message per key, so
rewriting the same key and value is idempotent.
● We can get effectively exactly-once results even without
transactional state!
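A minimal model of why a compacted, keyed log tolerates replays is sketched below. It does not use Kafka or any Kafka client: `compact` is a hypothetical helper that treats the log as a list of `(key, value)` pairs and keeps the last value per key. It assumes each message carries the full current value for its key (not a delta) and that a replay preserves the original order.

```python
def compact(log):
    """Return the latest value per key, as a fully compacted log would expose."""
    latest = {}
    for key, value in log:
        latest[key] = value  # later messages overwrite earlier ones
    return latest


log = [("user-1", 10), ("user-2", 7), ("user-1", 12)]

# After a failure, the last two messages are written a second time.
replayed = log + [("user-2", 7), ("user-1", 12)]

print(compact(log))       # {'user-1': 12, 'user-2': 7}
print(compact(replayed))  # {'user-1': 12, 'user-2': 7}
```

If the messages were increments such as "add 2 to user-1", the replay would change the result, which is why the state written per key needs to be the absolute value.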