2. Agenda
15’ Intro (Peter)
35’ Azure Stream Analytics and ML (Jan)
5’ short break
35’ Google Cloud Dataflow (Alex)
35’ Amazon AWS ML (Nils)
3. Many thanks to
Microsoft Belux
Jan, Alex, Nils
@maasg, @svendfx
BigData.be, DataScience.be, AWS Belgium
you !
4. Next StreamProcessing.be Meetup
Thu, June 25, 2015, near Mechelen station
(looking for a location for about 50 people)
● Introduction to Apache Kafka (Svend)
● Akka Streams and Kinesis (Peter)
● Understanding Spark Streaming (Gerard)
5. whoami : Peter Vandenabeele @peter_v
All Things Data (my consultancy)
current clients:
Real Impact Analytics
Telecom Analytics (emerging markets)
“Green” start-up (stealth mode)
IoT project (see next Meetup)
8. E.g. collaborative research (2013)
UniProt (180 GB), monthly update
each consumer: fetch + load + index the FULL data set
update cost ≅ freq (1/month) * size (180 GB) * # consumers (5)
9. solution: Stream of updates (CDC)
Users table (3M entries), continuous updates
stream: 300k updates/month
each consumer: fetch + load ONLY updates
update cost ≅ Rate of Change (10% / month) * size * # consumers
(independent of consumer update frequency)
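As a rough worked example, the two cost estimates above can be compared side by side. This is a back-of-the-envelope sketch, not a measurement: the function names are hypothetical, and applying the 10% / month rate of change to the 180 GB UniProt data set is an assumption made only to compare the two formulas on the same numbers.

```python
# Back-of-the-envelope comparison of the two update strategies.
# Figures come from the slides unless marked as an assumption.

def full_reload_cost(freq_per_month, size_gb, consumers):
    """GB moved per month when every consumer re-fetches the full data set."""
    return freq_per_month * size_gb * consumers

def cdc_cost(rate_of_change, size_gb, consumers):
    """GB moved per month when consumers only receive the changed fraction."""
    return rate_of_change * size_gb * consumers

SIZE_GB = 180          # UniProt size (slide 8)
CONSUMERS = 5          # number of consumers (slide 8)
FREQ_PER_MONTH = 1     # one full update per month (slide 8)
RATE_OF_CHANGE = 0.10  # 10% / month (slide 9); applying it to UniProt is an assumption

full = full_reload_cost(FREQ_PER_MONTH, SIZE_GB, CONSUMERS)
cdc = cdc_cost(RATE_OF_CHANGE, SIZE_GB, CONSUMERS)
print(f"full reload: {full:.0f} GB/month")
print(f"CDC stream:  {cdc:.0f} GB/month")

# Users table from slide 9: 3M entries changing at 10% / month
updates_per_month = round(3_000_000 * RATE_OF_CHANGE)
print(f"updates/month: {updates_per_month}")
```

Under these assumptions the full reload moves about 900 GB per month against about 90 GB for the stream of updates, and the Users table yields the 300k updates/month quoted on the slide.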
10. Why Stream Processing ?
Real-time * Big Data * Distributed processing (“many collaborators”)
11. Stream becomes the “master data”
● see stream as the master data (not the DB)
● allows real-time, distributed processing
● allows unification between:
○ operational teams
○ analytics teams
○ security, ...
● e.g. Kafka at LinkedIn (Kappa architecture)
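To make the "stream as master data" idea concrete, here is a minimal sketch in which an ordered change log is the source of truth and each team rebuilds its own view by replaying it. The event shape `(op, key, value)` and the `replay` function are hypothetical illustrations; this does not use or represent the Kafka API.

```python
# Minimal sketch: the ordered change log is the source of truth, and any
# consumer can rebuild its own view by replaying it.
# The (op, key, value) event shape is hypothetical, not a Kafka API.

def replay(log, state=None):
    """Apply change events in order and return the resulting key/value view."""
    state = {} if state is None else dict(state)  # never mutate the caller's view
    for op, key, value in log:
        if op == "upsert":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return state

log = [
    ("upsert", "user:1", {"name": "An"}),
    ("upsert", "user:2", {"name": "Bo"}),
    ("upsert", "user:1", {"name": "Ann"}),
    ("delete", "user:2", None),
]

# Two independent consumers derive the same view from the same log.
operational_view = replay(log)
analytics_view = replay(log)

# A consumer that already holds a snapshot only needs the newer events.
snapshot = replay(log[:2])
caught_up = replay(log[2:], snapshot)
print(operational_view, caught_up == operational_view)
```

Because every view is derived from the same log, operational and analytics consumers stay consistent without coordinating through a shared database.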
12. Kafka (LinkedIn) : Martin Kleppmann
source: Martin Kleppmann at Strata Hadoop London
13. Kafka (LinkedIn) : Jay Kreps
source: Jay Kreps on SlideShare, “I ♥ Log: Real-time Data and Apache Kafka”
14. Why Stream Processing ?
Peter : real-time * (big data * distributed proc.)
Nathan Marz : recovery from human error + ...
Jay Kreps : organizational scalability + ...
Martin Kleppmann : data agility + ...
YOU : ??? let’s discuss over a beer ...
15. Speakers for today
● Jan Tielens (Microsoft) @jantielens
● Alex Van Boxel (Vente-Exclusive.com) @alexvb
● Nils De Moor (Woorank) @ndemoor