Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
1. Building Real-Time Data Pipelines
with Kafka, Spark, and MemSQL
PHX Data Conference 29 Oct 2016
@garyorenstein @memsql
(c) Gary Orenstein and MemSQL
2. About Me: Gary Orenstein
• MemSQL - real-time database
• Fusion-io (SanDisk) - flash memory solutions
• Compellent (Dell) - enterprise storage
• experience in networking, caching, file systems
• co-author of two O'Reilly books
• Building Real-Time Data Pipelines (2015)
• The Path to Predictive Analytics and Machine Learning (2016)
3. Digital businesses' inexhaustible
demand for faster performance,
greater scalability and deeper real-time
insight is boosting the market for
IMC technologies, which is expected
to reach $13 billion by 2020.
- Gartner
7. Combine the power of a real-time
transformation tier
with the power of a real-time, distributed,
persistent, database
making Spark results more accessible to all
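As a rough illustration of the pattern on this slide, the sketch below runs records through a transformation step and lands them in a persistent SQL store that anyone can query. It is not the Spark or MemSQL API: Python's built-in sqlite3 stands in for the database, a plain function stands in for the transformation tier, and the table name, field names, and sample events are all hypothetical.

```python
import sqlite3


def transform(event):
    """Stand-in for the transformation tier: normalize one raw event into a row."""
    return (event["sensor"].strip().lower(), round(float(event["reading"]), 2))


def run_pipeline(events, conn):
    """Transform each event and persist it so results are queryable with SQL."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, reading REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        (transform(e) for e in events),
    )
    conn.commit()


def average_by_sensor(conn):
    """An analyst-style query over the persisted results."""
    cursor = conn.execute(
        "SELECT sensor, AVG(reading) FROM readings GROUP BY sensor ORDER BY sensor"
    )
    return cursor.fetchall()


# Hypothetical sample events; sqlite3 in-memory database is an assumption
# standing in for a distributed, persistent database.
conn = sqlite3.connect(":memory:")
run_pipeline(
    [
        {"sensor": " A ", "reading": "1.0"},
        {"sensor": "a", "reading": "3.0"},
        {"sensor": "B", "reading": "2.5"},
    ],
    conn,
)
```

Once the transformed rows are in a SQL store, a call such as average_by_sensor(conn) needs no knowledge of the transformation code, which is the accessibility point the slide makes.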
19. Everything We Know About Data
Movement Is Wrong
20. 1. We're finished with batch and the world is moving to streaming
and real-time
2. Topologies need to change
3. Messaging semantics need to improve
21. Familiar data integration patterns
centered on physical data
movement (bulk/batch data
movement, for example) are no
longer a sufficient solution for
enabling a digital business.
> Gartner
22. I hate batch processing so much that
I won't even use the dishwasher.
I just wash, dry, and put away in real
time.
> Ed Weissman (@edw519)
38. Germany Just Got Almost All of
Its Power From Renewable Energy
May 15, 2016
Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy
39. Investment in renewables
reached $286 billion worldwide
in 2015
BBC: http://www.bbc.com/news/science-environment-36420750
47. Enabling predictive analytics
• Use existing models from SAS
• Create models in Spark MLlib
• Predictive scoring as part of the pipeline
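To make "predictive scoring as part of the pipeline" concrete, here is a minimal sketch in plain Python. It assumes a tiny logistic model whose weights were trained elsewhere (for example, exported from a modeling tool); the feature names, weights, and bias are hypothetical, and the generator stands in for a streaming stage rather than using Spark MLlib itself.

```python
import math

# Hypothetical model parameters, assumed to have been trained offline.
WEIGHTS = {"temperature": 0.8, "vibration": 1.5}
BIAS = -2.0


def score(features):
    """Logistic score in (0, 1) for one record; missing features count as 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(records):
    """Pipeline stage: attach a score to each record as it flows through."""
    for record in records:
        yield {**record, "score": score(record)}


# Hypothetical incoming records.
scored = list(
    score_stream(
        [
            {"temperature": 0.0, "vibration": 0.0},
            {"temperature": 1.0, "vibration": 2.0},
        ]
    )
)
```

Because scoring happens per record inside the stream, each row reaches the database already carrying its prediction, instead of waiting for a later batch job.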
51. Business Intelligence
Details
• Natively connect to BI tools like Tableau
• Also Zoomdata, Looker,
MicroStrategy
• Business analysts inside your company
can use a tool they know and love