AMIDST Toolbox – Flink Forward
A Java Toolbox for Scalable Probabilistic Machine Learning
12-14 SEP 2016, Berlin
Ana M. Martínez
Aalborg University
ana@cs.aau.dk
Who are we?
THE AMIDST CONSORTIUM
Running Use Case
RUNNING USE CASE
Predicting Defaulting Clients
Predicts the probability that a customer will default within 2 years
RUNNING USE CASE
• Daily data for millions of clients.
• Tons of missing data.
• Odd distributions.
Toolbox presentation
GENERAL DESCRIPTION
AMIDST APPROACH
Data Knowledge
Openbox Models
Blackbox Inference Engine (powered by Flink)
Main Features
PGMS
Probabilistic graphical models (PGMs)
Specify your model using probabilistic graphical models with
latent variables and temporal dependencies
RUNNING USE CASE
PGMS
Custom Gaussian Mixture Model
H_ij defines a local mixture.
H_i defines a global mixture.
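As a rough illustration of what a hidden mixture variable does, the sketch below evaluates a one-dimensional Gaussian mixture density and the posterior probability of each hidden component. It is a toy example only: the class MixtureDensitySketch, its methods and all parameter values are hypothetical and are not part of the toolbox or of CustomGaussianMixture.

```java
/** Toy one-dimensional Gaussian mixture; hypothetical, not the toolbox's model class. */
final class MixtureDensitySketch {

    /** Density of a normal distribution with the given mean and standard deviation at x. */
    static double normalPdf(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    /** Mixture density: sum over components k of weights[k] * N(x; means[k], sds[k]). */
    static double mixturePdf(double x, double[] weights, double[] means, double[] sds) {
        double density = 0.0;
        for (int k = 0; k < weights.length; k++) {
            density += weights[k] * normalPdf(x, means[k], sds[k]);
        }
        return density;
    }

    /** Posterior probability that the hidden component is k given x (Bayes' rule). */
    static double responsibility(int k, double x, double[] weights, double[] means, double[] sds) {
        return weights[k] * normalPdf(x, means[k], sds[k]) / mixturePdf(x, weights, means, sds);
    }

    public static void main(String[] args) {
        // Hypothetical parameters chosen only for illustration.
        double[] weights = {0.5, 0.5};
        double[] means = {-1.0, 1.0};
        double[] sds = {1.0, 1.0};
        System.out.println("p(x = 0.8)     = " + mixturePdf(0.8, weights, means, sds));
        System.out.println("P(H = 1 | 0.8) = " + responsibility(1, 0.8, weights, means, sds));
    }
}
```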
PGMS
RUNNING CODE EXAMPLE
// Set up the Flink session.
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the data stream.
String filename = "hdfs://dataFlink_month0.arff";
DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);

// Build the model.
Model model = new CustomGaussianMixture(data.getAttributes());
SCALABLE INFERENCE
Scalable Learning
Perform Bayesian inference on your probabilistic models with
powerful approximate and scalable algorithms.
SCALABLE INFERENCE
d-VMP Algorithm - Coded as an iterative map-reduce task
A state-of-the-art distributed variational message passing algorithm.
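The iterative map-reduce pattern can be sketched in plain Java, without Flink. The example below is not d-VMP: it swaps in a much simpler technique (EM for the two means of an equal-weight, unit-variance Gaussian mixture) purely to show the shape of the loop. Each iteration maps every data partition to local sufficient statistics, reduces them by summation, and updates the global parameters. The class IterativeMapReduceSketch and the data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Toy iterative map-reduce (EM for two means); hypothetical, not the d-VMP implementation. */
final class IterativeMapReduceSketch {

    /** Map step: statistics {sum r0, sum r0*x, sum r1, sum r1*x} for one partition. */
    static double[] localStats(double[] partition, double[] means) {
        double[] stats = new double[4];
        for (double x : partition) {
            double p0 = Math.exp(-0.5 * (x - means[0]) * (x - means[0]));
            double p1 = Math.exp(-0.5 * (x - means[1]) * (x - means[1]));
            double r0 = p0 / (p0 + p1); // responsibility of component 0
            stats[0] += r0;
            stats[1] += r0 * x;
            stats[2] += 1.0 - r0;
            stats[3] += (1.0 - r0) * x;
        }
        return stats;
    }

    /** Reduce step: element-wise sum that returns a fresh array. */
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    /** Repeats map, reduce and a global parameter update for a fixed number of iterations. */
    static double[] fit(List<double[]> partitions, double[] initialMeans, int iterations) {
        double[] means = initialMeans.clone();
        for (int it = 0; it < iterations; it++) {
            final double[] current = means;
            double[] total = partitions.stream()
                    .map(p -> localStats(p, current))
                    .reduce(new double[4], IterativeMapReduceSketch::add);
            means = new double[] {total[1] / total[0], total[3] / total[2]};
        }
        return means;
    }

    public static void main(String[] args) {
        // Hypothetical data, split into three partitions as a cluster would hold it.
        List<double[]> partitions = Arrays.asList(
                new double[] {-2.1, -1.9, -2.0},
                new double[] {1.9, 2.1, 2.0},
                new double[] {-2.0, 2.0});
        System.out.println(Arrays.toString(fit(partitions, new double[] {-1.0, 1.0}, 20)));
    }
}
```

Only the small statistics array crosses partition boundaries in each iteration, which is what makes this pattern a natural fit for a distributed engine.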
SCALABLE INFERENCE
RUNNING CODE EXAMPLE
// Set up the Flink session.
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the data stream.
String filename = "hdfs://dataFlink_month0.arff";
DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);

// Build the model.
Model model = new CustomGaussianMixture(data.getAttributes());

// Learn the model.
model.updateModel(data);
DATA STREAMS
Data Streams
Update your models when new data is available. This makes our
toolbox appropriate for learning from data streams.
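The idea of updating a model as new batches arrive can be illustrated with the simplest conjugate case: a Gaussian prior on an unknown mean with known observation noise. After each batch the posterior becomes the prior for the next one, and the result matches a single update on all the data. This is a minimal sketch under those assumptions, not the toolbox's update rule; the class StreamingPosteriorSketch and the numbers are hypothetical.

```java
/** Toy sequential Bayesian update of a Gaussian mean; hypothetical, not the toolbox API. */
final class StreamingPosteriorSketch {
    final double mean;
    final double variance;

    StreamingPosteriorSketch(double mean, double variance) {
        this.mean = mean;
        this.variance = variance;
    }

    /** Returns the posterior after one batch, assuming a known observation noise variance. */
    StreamingPosteriorSketch update(double[] batch, double noiseVariance) {
        double sum = 0.0;
        for (double x : batch) {
            sum += x;
        }
        double precision = 1.0 / variance + batch.length / noiseVariance;
        double newMean = (mean / variance + sum / noiseVariance) / precision;
        return new StreamingPosteriorSketch(newMean, 1.0 / precision);
    }

    public static void main(String[] args) {
        // Hypothetical monthly batches.
        double[][] months = {{1.0, 2.0}, {3.0}, {2.5, 1.5}};
        StreamingPosteriorSketch belief = new StreamingPosteriorSketch(0.0, 100.0);
        for (double[] month : months) {
            belief = belief.update(month, 1.0); // yesterday's posterior is today's prior
            System.out.println("mean = " + belief.mean + ", variance = " + belief.variance);
        }
    }
}
```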
DATA STREAMS
RUNNING CODE EXAMPLE
// Set up the Flink session.
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the data stream.
String filename = "hdfs://dataFlink_month0.arff";
DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);

// Build the model.
Model model = new CustomGaussianMixture(data.getAttributes());

// Learn the model.
model.updateModel(data);

// Update your model with each new month of data.
for (int i = 1; i < 12; i++) {
    filename = "dataFlink_month" + i + ".arff";
    data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
    model.updateModel(data);
}
RUNNING USE CASE
Predicting Defaulting Clients
• BCC's old models, based on logistic regression, achieved an AUC of 0.816.
• AMIDST's models achieve an AUC of 0.952.
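For readers unfamiliar with the metric, AUC is the probability that a randomly chosen defaulting client receives a higher score than a randomly chosen non-defaulting client. The sketch below computes it by direct pairwise comparison, counting ties as one half. The class AucSketch and the scores are hypothetical and unrelated to the figures reported above.

```java
/** Toy AUC computation by pairwise comparison; hypothetical helper, not part of the toolbox. */
final class AucSketch {

    /** Fraction of (defaulter, non-defaulter) pairs ranked correctly; ties count as one half. */
    static double auc(double[] scores, boolean[] defaulted) {
        double concordant = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!defaulted[i]) {
                continue;
            }
            for (int j = 0; j < scores.length; j++) {
                if (defaulted[j]) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    concordant += 1.0;
                } else if (scores[i] == scores[j]) {
                    concordant += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one client of each class");
        }
        return concordant / pairs;
    }

    public static void main(String[] args) {
        // Hypothetical predicted default probabilities and observed outcomes.
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        boolean[] defaulted = {true, false, true, false};
        System.out.println(auc(scores, defaulted)); // 3 of 4 pairs are ordered correctly: 0.75
    }
}
```

The pairwise loop is quadratic in the number of clients, so it suits small checks only; a rank-based computation would be preferable on large data sets.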
SCALABILITY ANALYSIS
Scalability analysis
Use your defined models to process massive data sets in a
distributed computer cluster using Flink.
SCALABILITY ANALYSIS
One-billion-node probabilistic model
Experiment on a Flink cluster with 16 nodes on AWS.
SCALABILITY ANALYSIS
• Speedup (with respect to 2 nodes)
[Figure: speedup plotted for 4, 8 and 16 nodes; vertical axis from 0 to 8]
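Speedup relative to a 2-node baseline is the 2-node running time divided by the n-node running time, so ideal linear scaling gives 2, 4 and 8 for 4, 8 and 16 nodes. A minimal sketch of that arithmetic follows; the class SpeedupSketch and the timings in main are hypothetical and are not the measurements from the experiment.

```java
/** Toy speedup arithmetic relative to a baseline cluster size; hypothetical helper. */
final class SpeedupSketch {

    /** Observed speedup: baseline running time divided by the running time being compared. */
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    /** Ideal (linear) speedup when going from baselineNodes to nodes. */
    static double idealSpeedup(int baselineNodes, int nodes) {
        return (double) nodes / baselineNodes;
    }

    /** Parallel efficiency: observed speedup as a fraction of the ideal one. */
    static double efficiency(double baselineSeconds, double seconds, int baselineNodes, int nodes) {
        return speedup(baselineSeconds, seconds) / idealSpeedup(baselineNodes, nodes);
    }

    public static void main(String[] args) {
        // Hypothetical running times in seconds, for illustration only.
        double twoNodes = 1000.0;
        double sixteenNodes = 150.0;
        System.out.println("speedup    = " + speedup(twoNodes, sixteenNodes));
        System.out.println("ideal      = " + idealSpeedup(2, 16));
        System.out.println("efficiency = " + efficiency(twoNodes, sixteenNodes, 2, 16));
    }
}
```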
MODULAR DESIGN
Modular Design
The AMIDST Toolbox follows a modular design. This makes it easier to:
• Maintain and enhance the software.
• Integrate with external software: HUGIN, MOA, Weka, R.
MODULAR DESIGN
Running Use Case II
CONCEPT DRIFT DETECTION
Tracking Concept Drift
Detects changes in customer profiles during the Spanish financial crisis
CONCEPT DRIFT DETECTION
Hidden Variables are used to capture changes in customer profile
MODEL
CONCEPT DRIFT DETECTION
RUNNING CODE
// Set up the Flink session.
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Load the data stream.
String filename = "hdfs://dataFlink_month0.arff";
DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);

// Build the model.
Model model = new ConceptDriftDetector(data.getAttributes());

// Learn the model.
model.updateModel(data);

// Update your model and print the hidden variable's posterior after each month.
for (int i = 1; i < 12; i++) {
    filename = "dataFlink_month" + i + ".arff";
    data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
    model.updateModel(data);
    System.out.println(model.getPosteriorDistribution("hiddenVar").toString());
}
CONCEPT DRIFT DETECTION
Hidden Variable Captures Concept Drift
Drift Pattern: Seasonal + Global trend
RESULTS
CONCEPT DRIFT DETECTION
Unemployment Rate main driver of Concept Drift
Hidden Variable correlates with unemployment rate (rho = 0.961)
RESULTS
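A correlation of this kind can be computed as sketched below, assuming rho denotes Pearson's correlation coefficient (the slide does not say which coefficient was used). The class CorrelationSketch and both series in main are hypothetical; they are not the hidden-variable or unemployment data behind the reported value.

```java
/** Toy Pearson correlation between two equally long series; hypothetical helper. */
final class CorrelationSketch {

    /** Pearson's r: covariance divided by the product of the standard deviations. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length of at least 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical monthly series, for illustration only.
        double[] hiddenVariableMean = {0.10, 0.25, 0.30, 0.55, 0.70};
        double[] unemploymentRate = {8.5, 11.0, 13.5, 19.0, 21.5};
        System.out.println("rho = " + pearson(hiddenVariableMean, unemploymentRate));
    }
}
```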
COLLABORATE
www.amidsttoolbox.com github.com/amidst/toolbox
Apache License 2.0
Thanks for your attention
contact@amidsttoolbox.com
@AmidstToolbox
www.amidsttoolbox.com

More Related Content

Viewers also liked

Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkFlink Forward
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsFlink Forward
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkFlink Forward
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Flink Forward
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkFlink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Flink Forward
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Flink Forward
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereFlink Forward
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Flink Forward
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateFlink Forward
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: AmadeusFlink Forward
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...Flink Forward
 
Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)AMIDST Toolbox
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkFlink Forward
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Flink Forward
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Flink Forward
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingFlink Forward
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriFlink Forward
 

Viewers also liked (20)

Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Eron Wright - Flink Security Enhancements
Eron Wright - Flink Security EnhancementsEron Wright - Flink Security Enhancements
Eron Wright - Flink Security Enhancements
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs

 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL an...
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Stephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink EverywhereStephan Ewen - Running Flink Everywhere
Stephan Ewen - Running Flink Everywhere
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink-
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Flink Case Study: Amadeus
Flink Case Study: AmadeusFlink Case Study: Amadeus
Flink Case Study: Amadeus
 
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...Thomas Lamirault_Mohamed Amine Abdessemed  -A brief history of time with Apac...
Thomas Lamirault_Mohamed Amine Abdessemed -A brief history of time with Apac...
 
Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)Analysis of massive data using R (CAEPIA2015)
Analysis of massive data using R (CAEPIA2015)
 
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache FlinkGábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
Gábor Horváth - Code Generation in Serializers and Comparators of Apache Flink
 
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink...
 
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
Matthias Kricke_Martin Grimmer_Michael Schmeißer - Building a real time Tweet...
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
 
Automatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia KalavriAutomatic Detection of Web Trackers by Vasia Kalavri
Automatic Detection of Web Trackers by Vasia Kalavri
 

Similar to Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink

Elastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudElastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudLucas Bremgartner
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormMike Lohmann
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringMohamed MEJDOUBI
 
Chapter9 network managment-3ed
Chapter9 network managment-3edChapter9 network managment-3ed
Chapter9 network managment-3edKhánh Ghẻ
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...Paul Hofmann
 
OpenDaylight app development tutorial
OpenDaylight app development tutorialOpenDaylight app development tutorial
OpenDaylight app development tutorialSDN Hub
 
Projects on Cloud Computing
Projects on Cloud ComputingProjects on Cloud Computing
Projects on Cloud ComputingPhdtopiccom
 
What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1Precisely
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Bringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherBringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherDavid La Motta
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingJaime Martin Losa
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionLudovico Caldara
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01AislanSoares
 
How to use source control with apex?
How to use source control with apex?How to use source control with apex?
How to use source control with apex?Oliver Lemm
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Parisit-novum
 

Similar to Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink (20)

Elastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application CloudElastic Stack @ Swisscom Application Cloud
Elastic Stack @ Swisscom Application Cloud
 
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with StormDECK36 - Log everything! and Realtime Datastream Analytics with Storm
DECK36 - Log everything! and Realtime Datastream Analytics with Storm
 
Smartblitzmerker
SmartblitzmerkerSmartblitzmerker
Smartblitzmerker
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
Chapter9 network managment-3ed
Chapter9 network managment-3edChapter9 network managment-3ed
Chapter9 network managment-3ed
 
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
New Business Applications Powered by In-Memory Technology @MIT Forum for Supp...
 
OpenDaylight app development tutorial
OpenDaylight app development tutorialOpenDaylight app development tutorial
OpenDaylight app development tutorial
 
Projects on Cloud Computing
Projects on Cloud ComputingProjects on Cloud Computing
Projects on Cloud Computing
 
What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1What’s New in Syncsort Ironstream 2.1
What’s New in Syncsort Ironstream 2.1
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Bringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack TogetherBringing NetApp Data ONTAP & Apache CloudStack Together
Bringing NetApp Data ONTAP & Apache CloudStack Together
 
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin MeetingDDS Advanced Tutorial - OMG June 2013 Berlin Meeting
DDS Advanced Tutorial - OMG June 2013 Berlin Meeting
 
Get the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW versionGet the most out of Oracle Data Guard - OOW version
Get the most out of Oracle Data Guard - OOW version
 
E-GEN iCAN
E-GEN iCANE-GEN iCAN
E-GEN iCAN
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01Computernetworkingkurosech9 091011003335-phpapp01
Computernetworkingkurosech9 091011003335-phpapp01
 
How to use source control with apex?
How to use source control with apex?How to use source control with apex?
How to use source control with apex?
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Paris
 

More from Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

More from Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with Flink

  • 1. AMIDST Toolbox – Flink Forward 1 A Java Toolbox for Scalable Probabilistic Machine Learning 12-14 SEP 2016, Berlin Ana M. Martínez Aalborg University ana@cs.aau.dk
  • 2. Who are we?
  • 3. THE AMIDST CONSORTIUM
  • 5. RUNNING USE CASE: Predicting Defaulting Clients. Predicts the probability that a customer will default within 2 years.
  • 6. RUNNING USE CASE: Daily data for millions of clients. Tons of missing data. Odd distributions.
  • 9. AMIDST APPROACH: Data, Knowledge, Openbox Models, Blackbox Inference Engine (Powered by Flink).
  • 11. PGMS: Probabilistic graphical models (PGMs). Specify your model using probabilistic graphical models with latent variables and temporal dependencies.
  • 12. RUNNING USE CASE
  • 13. PGMS: Custom Gaussian Mixture Model. Hij defines a local mixture. Hi defines a global mixture.
  • 14. PGMS: RUNNING CODE EXAMPLE
 //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = "hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes());
  • 15. SCALABLE INFERENCE: Scalable Learning. Perform Bayesian inference on your probabilistic models with powerful approximate and scalable algorithms.
  • 16. SCALABLE INFERENCE: d-VMP Algorithm, coded as an iterative map-reduce task. A state-of-the-art distributed variational message passing algorithm.
  • 17. SCALABLE INFERENCE: RUNNING CODE EXAMPLE
 //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = "hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes());
 
 //Learn the model
 model.updateModel(data);

  • 18. DATA STREAMS: Update your models when new data is available. This makes our toolbox appropriate for learning from data streams.
  • 19. DATA STREAMS: RUNNING CODE EXAMPLE
 //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = "hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new CustomGaussianMixture(data.getAttributes());
 
 //Learn the model
 model.updateModel(data);
 
 //Update your model with each new month of data
 for (int i = 1; i < 12; i++) {
     filename = "dataFlink_month" + i + ".arff";
     data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
     model.updateModel(data);
 }
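
The month-by-month loop on slide 19 follows the usual Bayesian streaming pattern: the posterior learned from one batch acts as the prior for the next. The sketch below illustrates that idea in plain Java for the simplest conjugate case, a Gaussian mean with known observation variance. It is not the AMIDST implementation (which the slides describe as variational message passing); the class `StreamingGaussianMean` and its parameters are hypothetical names chosen for illustration.

```java
/**
 * Illustrative sketch only: sequential Bayesian updating of an unknown mean,
 * assuming Gaussian observations with a known noise variance.
 * This is NOT part of the AMIDST Toolbox API.
 */
public class StreamingGaussianMean {
    private double mean;                 // current posterior mean
    private double variance;             // current posterior variance of the mean
    private final double noiseVariance;  // assumed known observation variance

    public StreamingGaussianMean(double priorMean, double priorVariance, double noiseVariance) {
        this.mean = priorMean;
        this.variance = priorVariance;
        this.noiseVariance = noiseVariance;
    }

    /** Folds one batch into the posterior; the result is the prior for the next batch. */
    public void update(double[] batch) {
        double sum = 0.0;
        for (double x : batch) {
            sum += x;
        }
        // Precisions (inverse variances) add up under the conjugate Gaussian model.
        double precision = 1.0 / variance + batch.length / noiseVariance;
        mean = (mean / variance + sum / noiseVariance) / precision;
        variance = 1.0 / precision;
    }

    public double getMean() {
        return mean;
    }

    public double getVariance() {
        return variance;
    }

    public static void main(String[] args) {
        // Hypothetical monthly batches, made up for this example.
        double[][] months = { {1.0, 2.0}, {3.0}, {2.5, 2.0, 1.5} };
        StreamingGaussianMean model = new StreamingGaussianMean(0.0, 1.0, 1.0);
        for (double[] month : months) {
            model.update(month);
            System.out.println("mean=" + model.getMean() + " variance=" + model.getVariance());
        }
    }
}
```

Under this model, processing the batches one after another gives the same posterior as processing all the data at once, which is why old batches do not need to be kept around.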
  • 20. RUNNING USE CASE: Predicting Defaulting Clients. BCC’s old models, based on logistic regression, got an AUC of 0.816. AMIDST’s models get an AUC of 0.952.
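
For readers unfamiliar with the AUC figures quoted on slide 20: AUC is the probability that a randomly chosen defaulting client receives a higher predicted score than a randomly chosen non-defaulting one. A minimal sketch of that pairwise definition follows. The class name `AucSketch` and the scores in `main` are made up for illustration and are unrelated to the BCC data.

```java
/**
 * Illustrative sketch only: AUC computed from its pairwise definition
 * (fraction of positive/negative pairs ranked correctly, ties count half).
 * Runs in O(n^2), which is fine for small examples.
 */
public class AucSketch {

    public static double auc(double[] scores, boolean[] positive) {
        if (scores.length != positive.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!positive[i]) {
                continue; // i ranges over positive examples only
            }
            for (int j = 0; j < scores.length; j++) {
                if (positive[j]) {
                    continue; // j ranges over negative examples only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative example");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Hypothetical predicted default probabilities and true outcomes.
        double[] scores = {0.9, 0.8, 0.3, 0.1};
        boolean[] defaulted = {true, false, true, false};
        System.out.println("AUC = " + auc(scores, defaulted));
    }
}
```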
  • 21. SCALABILITY ANALYSIS: Use your defined models to process massive data sets in a distributed computer cluster using Flink.
  • 22. SCALABILITY ANALYSIS: One-billion-node probabilistic model. Experiment on a Flink cluster with 16 nodes on AWS.
  • 23. SCALABILITY ANALYSIS: Speedup (with respect to 2 nodes). [Bar chart: speedup for 4, 8, and 16 nodes; vertical axis from 0 to 8]
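
Speedup "with respect to 2 nodes" means the running time on the 2-node baseline divided by the running time on the larger cluster, so the ideal value for n nodes is n/2. The sketch below shows the arithmetic. The timings in `main` are hypothetical placeholders, since the measured values from the chart are not recoverable from this transcript.

```java
/**
 * Illustrative sketch only: speedup relative to a baseline cluster size.
 * All timings used in main are invented for the example.
 */
public class SpeedupSketch {

    /** Observed speedup: baseline running time divided by the new running time. */
    public static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    /** Ideal (linear) speedup when going from baselineNodes to nodes. */
    public static double idealSpeedup(int baselineNodes, int nodes) {
        return (double) nodes / baselineNodes;
    }

    public static void main(String[] args) {
        int baselineNodes = 2;
        double baselineSeconds = 1000.0;              // hypothetical 2-node running time
        int[] nodes = {4, 8, 16};
        double[] seconds = {520.0, 270.0, 150.0};     // hypothetical running times
        for (int k = 0; k < nodes.length; k++) {
            System.out.println(nodes[k] + " nodes: speedup "
                    + speedup(baselineSeconds, seconds[k])
                    + " (ideal " + idealSpeedup(baselineNodes, nodes[k]) + ")");
        }
    }
}
```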
  • 24. MODULAR DESIGN: The AMIDST Toolbox has been designed following a modular structure. This makes the following easier: the maintenance and enhancement of the software; the integration with external software (HUGIN, MOA, Weka, R).
  • 25. MODULAR DESIGN
  • 26. Running Use Case II
  • 27. CONCEPT DRIFT DETECTION: Tracking Concept Drift. Detects changes in customer profiles during the Spanish financial crisis.
  • 28. CONCEPT DRIFT DETECTION: MODEL. Hidden variables are used to capture changes in customer profile.
  • 29. CONCEPT DRIFT DETECTION: RUNNING CODE
 //Set-up Flink session.
 final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 
 //Load the data stream
 String filename = "hdfs://dataFlink_month0.arff";
 DataFlink<DataInstance> data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
 
 //Build the model
 Model model = new ConceptDriftDetector(data.getAttributes());
 
 //Learn the model
 model.updateModel(data);
 
 //Update your model and print the hidden variable's posterior after each month
 for (int i = 1; i < 12; i++) {
     filename = "dataFlink_month" + i + ".arff";
     data = DataFlinkLoader.loadDataFromFolder(env, filename, false);
     model.updateModel(data);
     System.out.println(model.getPosteriorDistribution("hiddenVar").toString());
 }
  • 30. CONCEPT DRIFT DETECTION: RESULTS. Hidden variable captures concept drift. Drift pattern: seasonal + global trend.
  • 31. CONCEPT DRIFT DETECTION: RESULTS. Unemployment rate is the main driver of concept drift. The hidden variable correlates with the unemployment rate (rho = 0.961).
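
The slide reports the correlation as rho = 0.961 without naming the coefficient. Assuming it is the Pearson correlation between the hidden variable's monthly value and the unemployment rate, it could be computed as sketched below. The class `PearsonSketch` and the two series in `main` are invented for illustration and are not the data behind the slide.

```java
/**
 * Illustrative sketch only: Pearson correlation coefficient between two
 * equally long series. The data in main is hypothetical.
 */
public class PearsonSketch {

    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length of at least 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double cov = 0.0;   // sum of products of deviations
        double varX = 0.0;  // sum of squared deviations of x
        double varY = 0.0;  // sum of squared deviations of y
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical monthly values: hidden-variable mean vs. unemployment rate (%).
        double[] hidden = {0.10, 0.15, 0.22, 0.30, 0.33};
        double[] unemployment = {8.5, 9.8, 11.9, 13.8, 15.0};
        System.out.println("rho = " + pearson(hidden, unemployment));
    }
}
```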
  • 33. Thanks for your attention. Contact: contact@amidsttoolbox.com, @AmidstToolbox, www.amidsttoolbox.com