Log Everything!
@DC13
Stefan & Mike

Dr. Stefan Schadwinkel
Co-Founder / Analytics Engineer
stefan.schadwinkel@deck36.de

Mike Lohmann
Co-Founder / Software Engineer
mike.lohmann@deck36.de
ABOUT DECK36
Who We Are
–  DECK36 is a young spin-off from ICANS
–  Small team of 7 engineers
–  Longstanding expertise in designing, implementing and operating complex web systems
–  Developing our own data intelligence-focused tools and web services
–  Offering our expert knowledge in Automation & Operations, Architecture & Engineering, Analytics & Data Logistics
WHAT WE WILL TALK ABOUT
Topics
–  Log everything! – The Data Pipeline.
–  Tackling the Leviathan – Realtime Stream Processing with Storm.
–  JS Client DataCollector: Live Demo
–  Storm Processing with PHP: Live Demo
Log everything!
The Data Pipeline
THE DATA PIPELINE
Requirements
Background: Building and operating multiple education communities
Baseline: PokerStrategy.com KPIs
–  6M registered users, 700k posts/month, 2.8M page impressions/day, 7.6M requests/day
New products → new business models → new questions
–  Extendable generic solution
–  Storage and accessibility more important than specific, optimized applications
THE DATA PIPELINE
Requirements
Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
Producer
–  Monolog Plugin, JS Client
Transport
–  Flume 0.9.4 m( → RabbitMQ, Erlang consumer
–  Evaluated Apache Kafka
Storage
–  Hadoop HDFS (our very own) → Amazon S3
THE DATA PIPELINE
Logging Pipeline
Analytics 
-  Hadoop MapReduce → Amazon EMR, Python, R
-  Exports to Excel (CSV), QlikView → Amazon Redshift
Realtime Stream Processing
-  Twitter Storm
THE DATA PIPELINE
Unified Message Format

-  Fixed, guaranteed envelope

-  Processing driven by message content 
-  Single message gets compressed (LZOP) to about 70% of original size (1184 B → 817 B)
-  Message bulk gets compressed to about 12-14% of original size (@ 42k & 325k messages)
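The difference between single-message and bulk compression can be tried out with a small Node.js sketch. It is only an illustration: the envelope fields (`version`, `type`, `timestamp`, `payload`) are assumed names, not the actual Unified Message Format, and gzip from Node's `zlib` module stands in for LZOP, so the ratios will differ from the numbers on the slide.

```javascript
const zlib = require("zlib");

// Hypothetical envelope: field names are illustrative, not the real UMF.
function wrapMessage(type, payload, timestamp) {
  return {
    version: 1,           // envelope version (assumed field)
    type: type,           // event type; drives downstream processing
    timestamp: timestamp, // milliseconds since epoch
    payload: payload,     // free-form, event-specific content
  };
}

// Ratio of compressed size to original size (1.0 means no gain).
// gzip is used here as a stand-in for LZOP.
function compressionRatio(text) {
  const original = Buffer.byteLength(text, "utf8");
  const compressed = zlib.gzipSync(Buffer.from(text, "utf8")).length;
  return compressed / original;
}

const single = JSON.stringify(
  wrapMessage("icans.content", { url: "/forum/thread/1", userId: 1 }, 1349049600000)
);

// A bulk of similar messages, one JSON document per line.
const bulk = Array.from({ length: 1000 }, (_, i) =>
  JSON.stringify(
    wrapMessage("icans.content", { url: "/forum/thread/" + i, userId: i }, 1349049600000 + i)
  )
).join("\n");

console.log("single:", compressionRatio(single).toFixed(2));
console.log("bulk:  ", compressionRatio(bulk).toFixed(2));
```

Because every message repeats the same envelope keys, the compressor finds far more redundancy across a bulk than inside one message.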
THE DATA PIPELINE
Compaction
RabbitMQ consumer (Erlang) stores data to cloud 
-  Relatively large amount of files
-  Mixed messages
We want
-  A few files
-  Messages grouped by "Event Type" and "Time Partition"
-  Data transformation
Determined by message content

s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo



Hive partitioning!
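How a target location can be derived from message content is sketched below. This is a minimal illustration of the path layout shown above, not the Cascalog job itself; `partitionDir` and the message fields `website`, `type` and `timestamp` are assumed names.

```javascript
// Zero-pad a number to two digits, e.g. 1 -> "01".
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Build the target directory for one message from its content.
// The field names on `message` are assumptions for this sketch.
function partitionDir(baseUri, message) {
  const d = new Date(message.timestamp);
  return [
    baseUri,
    message.website,
    message.type,
    "year=" + d.getUTCFullYear(),
    "month=" + pad2(d.getUTCMonth() + 1), // getUTCMonth() is zero-based
    "day=" + pad2(d.getUTCDate()),
  ].join("/");
}

const dir = partitionDir("s3://[BUCKET]/icanslog", {
  website: "[WEBSITE]",
  type: "icans.content",
  timestamp: Date.UTC(2012, 9, 1, 12, 30),
});
console.log(dir);
// s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01
```

The `key=value` directory names are what lets Hive treat each directory as a partition of the table.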
THE DATA PIPELINE
Compaction
Using Cascalog
-  Based on Clojure (LISP) and Cascading
-  Provides a Datalog-like query language
-  Don't LISP? → JCascalog

Very handy features (unavailable in Hive or Pig)
-  Cascading Output Taps can be parameterized by data records
-  Trap location for corrupted records (job finishes for all the correct messages)
-  Runs within the JVM → large available codebase, arbitrary processing is simple
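The idea behind a trap location can be illustrated without Cascading: records that fail processing are set aside while the job finishes for all correct messages. The sketch below is plain JavaScript under assumed names (`compact`, a `type` field per message) and does not use the Cascading or Cascalog API.

```javascript
// Group raw lines by event type; unparseable lines go to a trap
// instead of aborting the whole run.
function compact(lines) {
  const groups = new Map(); // event type -> parsed messages
  const trapped = [];       // raw lines that could not be processed
  for (const line of lines) {
    try {
      const message = JSON.parse(line);
      if (!message || typeof message.type !== "string") {
        throw new Error("missing event type");
      }
      if (!groups.has(message.type)) groups.set(message.type, []);
      groups.get(message.type).push(message);
    } catch (err) {
      trapped.push(line); // keep the raw line for later inspection
    }
  }
  return { groups, trapped };
}

const result = compact([
  '{"type":"icans.content","payload":1}',
  '{"type":"icans.click","payload":2}',
  "not json at all",
]);
console.log(result.groups.size, result.trapped.length); // 2 1
```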
Cascalog Query Syntax

Cascalog is Clojure, Clojure is Lisp

(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))

-  ?<- : query operator
-  (stdout) : Cascading output tap
-  [?person] : columns of the dataset generated by the query
-  (age ?person ?age) : "generator"
-  (< ?age 30) : "predicate"
-  Generators and predicates: as many as you want
-  Both can be any Clojure function
-  Clojure can call anything that is available within a JVM
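For readers who do not write Clojure, the meaning of the query can be approximated in JavaScript. This is only an analogy with made-up sample data; it runs in memory and has nothing to do with how Cascalog plans or executes a query on Hadoop.

```javascript
// Generator: a dataset of [person, age] tuples (made-up sample data).
const age = [
  ["alice", 28],
  ["bob", 33],
  ["chris", 40],
  ["david", 25],
];

// Predicate: keep tuples whose age is below the limit.
const youngerThan = (limit) => ([, personAge]) => personAge < limit;

// Query: bind ?person and ?age from the generator, apply the predicate,
// and emit only the ?person column.
function peopleYoungerThan(dataset, limit) {
  return dataset.filter(youngerThan(limit)).map(([person]) => person);
}

console.log(peopleYoungerThan(age, 30)); // [ 'alice', 'david' ]
```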
Cascalog Query Syntax

Run the Cascalog processing on Amazon EMR:
./elastic-mapreduce [standard parameters omitted]
--jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar
--main-class icans.cascalogjobs.processing.compaction
--args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error
The Data Pipeline
Data Queries with Hive
Hive is table-based and provides SQL-like syntax
-  Assumes one storage location (directory) per table
-  Simple to use if you know SQL
-  Widely used, rapid development for "simple" queries
Hive @ Amazon
-  Table locations can be S3
-  "Cluster on demand" → requires rebuilding the Hive metadata
-  CREATE TABLE for source and target S3 locations
-  Import Table metadata (auto-discovery for partitions)
-  INSERT OVERWRITE to query source table(s) and store to target S3 location
Hive @ Amazon (1)
Hive @ Amazon (2)

We can now simply copy the data from S3 
and import into any local analytical tool
e.g. Excel, Redshift, QlikView, R, etc.
Further Reading

-  More details in the Log Everything! ebook
-  Available at Amazon and DeveloperPress
THE DATA PIPELINE
Still: It’s Batch Processing
-  While quite efficient in flight, the logistics
of getting the job started are significant.
-  Only cost-efficient for long distance
travel.
THE DATA PIPELINE

Instant Insight through Stream Processing
-  Often, only updates for the recent day,
week, or month are necessary
-  Time matters when direct
feedback or user interaction is desired
More Wind In The Sails
With Storm
REALTIME STREAM PROCESSING

Instant Insight through Stream Processing
-  Distributed realtime processing
framework
-  Battle-proven by Twitter
-  All *BINGO-Abilities fulfilled!
-  Hadoop = data batch processing; Storm
= realtime data processing 
-  More (and maybe new) *BINGO: DRPC,
ETL, RTET, Spouts, Bolts, Tuple,
Topology 
-  Easy to use (Really!)
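The vocabulary (spout, bolt, tuple, topology) can be made concrete with a toy, single-process sketch. It uses JavaScript generators and is not the Storm API; a real topology is distributed across workers, and all function names here are illustrative.

```javascript
// Spout: a source of tuples. Here it replays a fixed array.
function* clickSpout(events) {
  for (const event of events) yield event;
}

// Bolt: consumes a tuple and emits zero or more tuples.
function* domainBolt(stream) {
  for (const event of stream) {
    yield { domain: new URL(event.url).hostname };
  }
}

// Bolt with state: running count per domain.
function* countBolt(stream) {
  const counts = new Map();
  for (const { domain } of stream) {
    const n = (counts.get(domain) || 0) + 1;
    counts.set(domain, n);
    yield { domain, count: n };
  }
}

// Topology: the wiring spout -> bolt -> bolt.
function runTopology(events) {
  return Array.from(countBolt(domainBolt(clickSpout(events))));
}

const output = runTopology([
  { url: "https://a.example/x" },
  { url: "https://a.example/y" },
  { url: "https://b.example/" },
]);
console.log(output);
```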
Realtime Stream Processing Infrastructure with Storm

[Diagram: Apps & Server, NodeJS, Queue, Storm cluster (Nimbus master, Zookeeper, Supervisors with Workers), S3, DB, Zabbix and Graylog, arranged along the stages Producer, Transport, Analytics (Realtime Data Stream Analytics) and Storage]
REALTIME STREAM PROCESSING
JS Client Features
-  Event system
-  Master/Slave Tabs
-  Local queuing of data
-  Ability to use node modules
-  Easy to extend
-  Complete development suite
-  Deliver bundles with vendors or not
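"Local queuing of data" could look roughly like the following sketch: events are buffered while no connection exists and flushed in order once one is available. The class `LocalQueue` and its methods are hypothetical and not taken from the starlog client.

```javascript
// Buffers events while offline and flushes them in order once a
// transport is available. `send` is any function that delivers one event.
class LocalQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
    this.send = null; // set while connected
  }

  push(event) {
    if (this.send) {
      this.send(event);
      return;
    }
    // Drop the oldest entry when full so the buffer stays bounded.
    if (this.items.length >= this.maxSize) this.items.shift();
    this.items.push(event);
  }

  connect(send) {
    this.send = send;
    while (this.items.length > 0) send(this.items.shift());
  }

  disconnect() {
    this.send = null;
  }
}

const sent = [];
const queue = new LocalQueue(2);
queue.push("a");
queue.push("b");
queue.push("c"); // buffer is full, "a" is dropped
queue.connect((event) => sent.push(event));
queue.push("d"); // delivered immediately
console.log(sent); // [ 'b', 'c', 'd' ]
```

Dropping the oldest entry is one possible policy; a client could equally refuse new events or persist the buffer.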
Realtime Stream Processing - Loading the JS Client

<script .. src="https://cdn.tradimo.com/js/starlog-client.min.js?5193e1ba0325c756b78d87384d2f80e9"></script>

-  Browser requests https://../starlog-client.min.js
-  Server creates a signed cookie and returns starlog-client.min.js with Set-Cookie: UUID
-  Client requests /socket.io/1/websockets with Upgrade: websockets and Cookie: UUID
-  Server checks the cookie
-  Server answers HTTP 101 (Protocol Change) with Connection: Upgrade and Upgrade: websocket
-  Connection established
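The slides do not say how the cookie is signed. One common approach, shown here purely as an assumption, is an HMAC over the UUID that the server recomputes before accepting the WebSocket upgrade. The cookie layout `<uuid>.<signature>`, the function names and the sample values are all illustrative.

```javascript
const crypto = require("crypto");

// HMAC-SHA256 over the value, hex encoded.
function signature(value, secret) {
  return crypto.createHmac("sha256", secret).update(value).digest("hex");
}

// Cookie value in the form "<uuid>.<signature>" (assumed layout).
function signCookie(uuid, secret) {
  return uuid + "." + signature(uuid, secret);
}

// Returns the UUID if the signature matches, otherwise null.
function verifyCookie(cookie, secret) {
  const dot = cookie.lastIndexOf(".");
  if (dot < 0) return null;
  const uuid = cookie.slice(0, dot);
  const given = Buffer.from(cookie.slice(dot + 1), "utf8");
  const expected = Buffer.from(signature(uuid, secret), "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first.
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? uuid : null;
}

// Sample UUID and secret, made up for this example.
const cookie = signCookie("5193e1ba-0325-4756-b78d-87384d2f80e9", "server-secret");
console.log(verifyCookie(cookie, "server-secret") !== null); // true
console.log(verifyCookie(cookie, "other-secret"));           // null
```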
[Diagram: collecting data → sending data in UMF → NodeJS → Queue → "Backend Magic" → Queue (counts) → NodeJS → sending data to the client]
Realtime Stream Processing - JS Client in action

Use case: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge

[Diagram: ClickEvent collector registers onclick event → Clicked-Data → localstorage (observed) → Clicked-Data-UMF → SocketConnect → NodeJS]
Realtime Stream Processing - JS Client in action
function ClickFetcher()
{
    // Handler for the COLLECTINGDATA event; stores one record per click.
    this.collectData = function (callback)
    {
        var clicked = 1; // running click number for this page view
        logger.debug('ClickFetcher - collectData called!');
        window.onclick = function () {
            var collectedData = {
                // Key: host plus path of the current page
                key: window.location.host + window.location.pathname,
                value: {
                    payload: clicked,
                    timestamp: Date.now()
                }
            };
            // Store the record in the client's localstorage wrapper
            localstorage.set(collectedData, function (storageResult)
            {
                logger.debug("err = " + storageResult.hasError());
                logger.debug("storageResult = " + storageResult);
            }, false, true, true);
            clicked++;
        };
    };
}
var clickFetcher = new ClickFetcher();
starlogclient.on(starlogclient.COLLECTINGDATA, clickFetcher.collectData);
Client Live Demo 


https://localhost:3001/test/1-page-stub.html
REALTIME STREAM PROCESSING
Producer Libraries
-  LoggingComponent: Provides interfaces, filters and handlers
-  LoggingBundle: Glues all together for Symfony2
-  Drupal Logging Module: Using the LoggingComponent
-  JS Frontend Client: LogClient Framework for Browsers

https://github.com/ICANS/IcansLoggingComponent
https://github.com/ICANS/IcansLoggingBundle
https://github.com/ICANS/drupal-logging-module
https://github.com/DECK36/starlog-js-frontend-client
Realtime Stream Processing - PHP & Storm

Use case: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge
Using PHP for that!
https://github.com/Lazyshot/storm-php/blob/master/lib/storm.php

[Diagram: Clicked-Data-UMF → Queue → Event: "Star Trek Commander" badge]
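The counting rule of the use case is small enough to sketch directly. The talk implements it in PHP inside a Storm bolt; the version below is plain JavaScript with no Storm integration, and the class name `BadgeCounter` and the shape of the emitted event are assumptions.

```javascript
// Emits a badge event on every Nth click per domain (N = 10 in the use case).
class BadgeCounter {
  constructor(every) {
    this.every = every;
    this.clicks = new Map(); // domain -> click count
  }

  // Returns a badge event, or null if this click does not earn one.
  onClick(domain) {
    const count = (this.clicks.get(domain) || 0) + 1;
    this.clicks.set(domain, count);
    if (count % this.every !== 0) return null;
    return { event: "badge", name: "Star Trek Commander", domain, count };
  }
}

const counter = new BadgeCounter(10);
const badges = [];
for (let i = 0; i < 25; i++) {
  const badge = counter.onClick("example.org");
  if (badge) badges.push(badge.count);
}
console.log(badges); // [ 10, 20 ]
```

In a distributed topology the per-domain state only stays correct if all clicks for one domain reach the same bolt instance, which is what Storm's fields grouping is for.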
Storm & PHP Live Demo
REALTIME STREAM PROCESSING
Get Inspired!
Powered-by Storm: https://github.com/nathanmarz/storm/wiki/Powered-By
-  50+ companies (Twitter, Yahoo, Groupon, Ooyala, Baidu, Wayfair, …)
-  Ads & real-time bidding, Data-centric (Economic, Environmental, Health), User interactions
Language-agnostic backend systems (Operate Storm, Develop in PHP)
Streaming "counts": Sentiment Analysis, Frequent Items, Multi-armed Bandits, …
DRPC: Custom user feeds, Complex Queries (i.e. trace graph links)
Realtime, distributed ETL
-  Buffering / Retries
-  Integrate Data: Third-party API, Machine Learning
-  Store to DBs, Search engines, etc
Questions?
Thanks a lot!
You can find us:

github.com/DECK36

info@deck36.de

deck36.de

How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Log everything! @DC13

  • 3. Stefan & Mike
      - Dr. Stefan Schadwinkel, Co-Founder / Analytics Engineer, stefan.schadwinkel@deck36.de
      - Mike Lohmann, Co-Founder / Software Engineer, mike.lohmann@deck36.de
  • 4. ABOUT DECK36: Who We Are
      – DECK36 is a young spin-off from ICANS
      – Small team of 7 engineers
      – Longstanding expertise in designing, implementing and operating complex web systems
      – Developing our own data intelligence-focused tools and web services
      – Offering our expert knowledge in Automation & Operations, Architecture & Engineering, Analytics & Data Logistics
  • 5. WHAT WE WILL TALK ABOUT: Topics
      – Log everything! – The Data Pipeline.
      – Tackling the Leviathan – Realtime Stream Processing with Storm.
      – JS Client DataCollector: Live Demo
      – Storm Processing with PHP: Live Demo
  • 7. THE DATA PIPELINE: Requirements
      Background: Building and operating multiple education communities
      Baseline: PokerStrategy.com KPIs
      – 6M registered users, 700k posts/month, 2.8M page impressions/day, 7.6M requests/day
      New products → New business models → New questions
      – Extendable generic solution
      – Storage and accessibility more important than specific, optimized applications
  • 8. THE DATA PIPELINE: Requirements
      Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
      Producer
      – Monolog Plugin, JS Client
      Transport
      – Flume 0.9.4 m( → RabbitMQ, Erlang Consumer
      – Evaluated Apache Kafka
      Storage
      – Hadoop HDFS (our very own) → Amazon S3
  • 9. THE DATA PIPELINE: Logging Pipeline
      Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
      Analytics
      - Hadoop MapReduce → Amazon EMR, Python, R
      - Exports to Excel (CSV), Qlikview → Amazon Redshift
      Realtime Stream Processing
      - Twitter Storm
  • 10. THE DATA PIPELINE: Unified Message Format
      - Fixed, guaranteed envelope
      - Processing driven by message content
      - Single message gets compressed (LZOP) to about 70% of original size (1184 B → 817 B)
      - Message bulk gets compressed to about 12-14% of original size (@ 42k & 325k messages)
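
A minimal sketch of what "fixed envelope, processing driven by message content" could look like in code. The slides do not spell out the actual Unified Message Format, so the field names (`version`, `type`, `timestamp`, `payload`) and the handler table below are assumptions for illustration only.

```javascript
// Hypothetical envelope: these field names are illustrative, not the real UMF schema.
function wrapInEnvelope(type, payload, now = Date.now()) {
  return { version: 1, type: type, timestamp: now, payload: payload };
}

// Content-driven processing: the handler is chosen from the envelope's type,
// so a consumer can route a message without inspecting its payload.
function dispatch(message, handlers) {
  const handler = handlers.get(message.type);
  if (!handler) {
    return { handled: false, reason: 'no handler for ' + message.type };
  }
  return { handled: true, result: handler(message.payload) };
}

// Example handler table (hypothetical event type).
const handlers = new Map([
  ['click', (payload) => 'clicks=' + payload.count],
]);

const clickMessage = wrapInEnvelope('click', { count: 3 }, 1380000000000);
const outcome = dispatch(clickMessage, handlers);
const unknownOutcome = dispatch(wrapInEnvelope('unknown', {}), handlers);
```

Because every message carries the same outer fields, new event types only need a new handler entry; producers and transport stay unchanged.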
  • 12. THE DATA PIPELINE: Compaction
      RabbitMQ consumer (Erlang) stores data to cloud
      - Relatively large amount of files
      - Mixed messages
      We want
      - A few files
      - Messages grouped by "Event Type" and "Time Partition"
      - Data transformation
      Determined by message content:
      s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
      Hive partitioning!
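
The grouping by event type and time partition can be sketched as follows. This is not the authors' Cascalog job; it is a small in-memory illustration that derives a Hive-style partition directory from each message. The message fields (`website`, `type`, `timestamp`), the use of UTC, and the bucket name are assumptions.

```javascript
// Zero-pad to two digits, as in month=10/day=01.
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Build the Hive-style partition directory for one message.
// bucket, website and eventType are placeholders supplied by the caller.
function partitionPath(bucket, website, eventType, timestampMs) {
  const d = new Date(timestampMs);
  return 's3://' + bucket + '/icanslog/' + website + '/' + eventType +
    '/year=' + d.getUTCFullYear() +
    '/month=' + pad2(d.getUTCMonth() + 1) +
    '/day=' + pad2(d.getUTCDate());
}

// Group mixed messages so that each (event type, day) lands in one directory.
function groupByPartition(messages, bucket) {
  const groups = new Map();
  for (const m of messages) {
    const key = partitionPath(bucket, m.website, m.type, m.timestamp);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(m);
  }
  return groups;
}

const noon = Date.UTC(2012, 9, 1, 12, 0, 0); // 2012-10-01 12:00 UTC
const examplePath = partitionPath('my-bucket', 'example.com', 'icans.content', noon);

const groups = groupByPartition([
  { website: 'example.com', type: 'icans.content', timestamp: noon },
  { website: 'example.com', type: 'icans.content', timestamp: noon + 3600 * 1000 },
  { website: 'example.com', type: 'icans.content', timestamp: noon + 24 * 3600 * 1000 },
], 'my-bucket');
```

The `key=value` directory names are what lets Hive treat each directory as a table partition.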
  • 13. THE DATA PIPELINE: Compaction Using Cascalog
      - Based on Clojure (LISP) and Cascading
      - Provides a Datalog-like query language
      - Don't LISP? → JCascalog
      Very handy features (unavailable in Hive or Pig)
      - Cascading Output Taps can be parameterized by data records
      - Trap location for corrupted records (job finishes for all the correct messages)
      - Runs within the JVM → large available codebase, arbitrary processing is simple
  • 14. Cascalog Query Syntax
      Cascalog is Clojure, Clojure is Lisp
          (?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))
      - ?<- : Query Operator
      - (stdout) : Cascading Output Tap
      - [?person] : Columns of the dataset generated by the query
      - (age ?person ?age) : "Generator"
      - (< ?age 30) : "Predicate"
      Generators and predicates
      - as many as you want
      - both can be any Clojure function
      - Clojure can call anything that is available within a JVM
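
For readers who do not read Lisp, the query above can be mirrored in plain JavaScript. This only illustrates the semantics (bind tuples from a generator, keep those that satisfy the predicate, output the selected column); it runs in memory and is not how Cascalog executes. The sample data is made up.

```javascript
// Stand-in for the `age` generator: tuples of [person, age]. Sample data is invented.
const age = [['alice', 28], ['bob', 33], ['chris', 25]];

// Mirrors (?<- (stdout) [?person] (age ?person ?age) (< ?age 30)):
// the generator binds ?person and ?age, the predicate filters on ?age,
// and only the ?person column is emitted.
const result = age
  .filter(([, personAge]) => personAge < 30)
  .map(([person]) => [person]);

// Stand-in for the (stdout) output tap.
for (const row of result) {
  console.log(row.join('\t'));
}
```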
  • 15. Cascalog Query Syntax
      Run the Cascalog processing on Amazon EMR:
          ./elastic-mapreduce [standard parameters omitted] \
            --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
            --main-class icans.cascalogjobs.processing.compaction \
            --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error
  • 16. The Data Pipeline: Data Queries with Hive
      Hive is table-based and provides SQL-like syntax
      - Assumes one storage location (directory) per table
      - Simple to use if you know SQL
      - Widely used, rapid development for "simple" queries
      Hive @ Amazon
      - Table locations can be S3
      - "Cluster on demand" → requires rebuilding the Hive metadata
      - CREATE TABLE for source and target S3 locations
      - Import table metadata (auto-discovery for partitions)
      - INSERT OVERWRITE to query source table(s) and store to target S3 location
  • 18. Hive @ Amazon (2) We can now simply copy the data from S3 and import into any local analytical tool e.g. Excel, Redshift, QlikView, R, etc.
  • 19. Further Reading -  More details in the Log Everything! ebook -  Available at Amazon and DeveloperPress
  • 20. THE DATA PIPELINE Still: It’s Batch Processing -  While quite efficient in flight, the logistics of getting the job started are significant. -  Only cost-efficient for long distance travel.
  • 21. THE DATA PIPELINE Instant Insight through Stream Processing -  Often, only updates for the recent day, week, or month are necessary -  Time is of importance when direct feedback or user interaction is desired
  • 22. More Wind In The Sails With Storm
  • 23. REALTIME STREAM PROCESSING: Instant Insight through Stream Processing
      - Distributed realtime processing framework
      - Battle-proven by Twitter
      - All *BINGO-Abilities fulfilled!
      - Hadoop = data batch processing; Storm = realtime data processing
      - More (and maybe new) *BINGO: DRPC, ETL, RTET, Spouts, Bolts, Tuple, Topology
      - Easy to use (Really!)
  • 24. Realtime Stream Processing: Infrastructure with Storm
      [Architecture diagram. Stages: Producer, Transport, Analytics, Storage, Realtime Data Stream Analytics. Components: Apps & Server, NodeJS, Queue, S3, DB, Zabbix, Graylog, and a Storm cluster with Nimbus (Master), Zookeeper, Supervisors and Workers.]
  • 25. REALTIME STREAM PROCESSING: JS Client Features
      - Event system
      - Master/Slave Tabs
      - Local queuing of data
      - Ability to use node modules
      - Easy to extend
      - Complete development suite
      - Deliver bundles with vendors or not
  • 26. Realtime Stream Processing - Loading the JS Client
      <script .. src="https://cdn.tradimo.com/js/starlog-client.min.js?5193e1ba0325c756b78d87384d2f80e9"></script>
      [Sequence diagram. Labels, in slide order:]
      - https://../starlog-client.min.js
      - Create signed cookie
      - starlog-client.min.js, Set-Cookie: UUID
      - /socket.io/1/websockets, Upgrade: websockets, Cookie: UUID
      - Established connection
      - Check cookie
      - HTTP 101 – Protocol Change, Connection: Upgrade, Upgrade: websocket
      - Collecting Data
      - Sending data in UMF
      - Sending data to the client
      - Components: UMF, NodeJS, Counts, Queue, Backend Magic, Queue
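
The "create signed cookie" and "check cookie" steps could be implemented along these lines on the NodeJS side. The slides do not say how the cookie is signed, so HMAC-SHA256 over the UUID, the `<uuid>.<signature>` layout, and the `SECRET` value are assumptions; only Node's built-in `crypto` module is used.

```javascript
const crypto = require('crypto');

// Hypothetical server secret; a real deployment would load it from configuration.
const SECRET = 'replace-me';

// Create a cookie value of the form "<uuid>.<hex hmac>".
function signUuid(uuid, secret) {
  const mac = crypto.createHmac('sha256', secret).update(uuid).digest('hex');
  return uuid + '.' + mac;
}

// Return the UUID if the signature matches, otherwise null.
function verifyCookie(cookieValue, secret) {
  const dot = cookieValue.lastIndexOf('.');
  if (dot < 0) return null;
  const uuid = cookieValue.slice(0, dot);
  const given = Buffer.from(cookieValue.slice(dot + 1), 'hex');
  const expected = crypto.createHmac('sha256', secret).update(uuid).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? uuid : null;
}

// Issued when the client script is delivered (Set-Cookie: UUID).
const uuid = crypto.randomUUID();
const cookie = signUuid(uuid, SECRET);

// Checked again when the websocket upgrade request arrives.
const accepted = verifyCookie(cookie, SECRET);
const rejected = verifyCookie('x' + cookie, SECRET);
```

Signing lets the websocket endpoint reject connections that did not first load the client script, without keeping per-client state on the server.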
  • 27. Realtime Stream Processing - JS Client in action
      Use case: If the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge
      [Flow diagram. Labels: ClickEvent collector, register onclick event, Clicked-Data, observe localstorage, Clicked-Data, Clicked-Data-UMF, SocketConnect, NodeJS]
  • 28. Realtime Stream Processing - JS Client in action
      // logger, localstorage and starlogclient are expected to come from the surrounding client code.
      function ClickFetcher() {
        this.collectData = function (callback) {
          var clicked = 1; // running click count for this page
          logger.debug('ClickFetcher - collectData called!');
          window.onclick = function () {
            var collectedData = {
              // key: host plus path of the current page
              key: window.location.host.toString() + window.location.pathname.toString(),
              value: {
                payload: clicked,
                timestamp: +new Date()
              }
            };
            localstorage.set(collectedData, function (storageResult) {
              logger.debug("err = " + storageResult.hasError());
              logger.debug("storageResult = " + storageResult);
            }, false, true, true);
            clicked++;
          };
        };
      }

      var clickFetcher = new ClickFetcher();
      // Register the collector for the client's COLLECTINGDATA event.
      starlogclient.on(starlogclient.COLLECTINGDATA, clickFetcher.collectData);
  • 29. Client Live Demo https://localhost:3001/test/1-page-stub.html
  • 30. REALTIME STREAM PROCESSING: Producer Libraries
      - LoggingComponent: Provides interfaces, filters and handlers (https://github.com/ICANS/IcansLoggingComponent)
      - LoggingBundle: Glues all together for Symfony2 (https://github.com/ICANS/IcansLoggingBundle)
      - Drupal Logging Module: Using the LoggingComponent (https://github.com/ICANS/drupal-logging-module)
      - JS Frontend Client: LogClient Framework for Browsers (https://github.com/DECK36/starlog-js-frontend-client)
  • 31. Realtime Stream Processing - PHP & Storm
      Use case: If the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge
      Using PHP for that! https://github.com/Lazyshot/storm-php/blob/master/lib/storm.php
      [Flow diagram. Labels: Clicked-Data-UMF, Queue, Event: "Star Trek Commander" Badge]
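
The counting logic of this use case is small enough to sketch. The talk implements it as a PHP bolt; the version below is plain JavaScript, leaves out Storm's multi-language protocol entirely, and uses a hypothetical class name and event shape. It only shows the per-domain modulo check.

```javascript
// Per-domain click counter that emits a badge event on every 10th click.
// Class name and event shape are illustrative, not taken from the talk's code.
class BadgeCounter {
  constructor(every = 10) {
    this.every = every;
    this.counts = new Map(); // domain -> number of clicks seen so far
  }

  // Process one click; returns a badge event or null.
  process(domain) {
    const count = (this.counts.get(domain) || 0) + 1;
    this.counts.set(domain, count);
    if (count % this.every === 0) {
      return { badge: 'Star Trek Commander', domain: domain, clicks: count };
    }
    return null;
  }
}

// Feed 25 clicks for one domain and 9 for another.
const counter = new BadgeCounter();
const badges = [];
for (let i = 0; i < 25; i++) {
  const event = counter.process('a.example');
  if (event) badges.push(event);
}
for (let i = 0; i < 9; i++) {
  const event = counter.process('b.example');
  if (event) badges.push(event);
}
```

Since the counts live in the memory of one instance, a Storm topology would need to route all clicks for the same domain to the same bolt task, for example with a fields grouping on the domain.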
  • 32. Storm & PHP Live Demo
  • 33. REALTIME STREAM PROCESSING: Get Inspired!
      Powered-by Storm: https://github.com/nathanmarz/storm/wiki/Powered-By
      - 50+ companies (Twitter, Yahoo, Groupon, Ooyala, Baidu, Wayfair, …)
      - Ads & real-time bidding, Data-centric (Economic, Environmental, Health), User interactions
      Language-agnostic backend systems (Operate Storm, Develop in PHP)
      Streaming "counts": Sentiment Analysis, Frequent Items, Multi-armed Bandits, …
      DRPC: Custom user feeds, Complex Queries (i.e. trace graph links)
      Realtime, distributed ETL
      - Buffering / Retries
      - Integrate Data: Third-party API, Machine Learning
      - Store to DBs, Search engines, etc
  • 36. You can find us: github.com/DECK36 info@deck36.de deck36.de