SlideShare una empresa de Scribd logo
1 de 44
Distributed Machine and
Deep Learning at Scale with
Apache Ignite
Yuriy Babak
Yuriy Babak
Head of ML/DL framework
development at GridGain
Apache Ignite committer
2019 © GridGain Systems
Agenda
● Introduction of distributed ML/DL
● Small math background
● Apache Ignite ML overview
● External integrations
2019 © GridGain Systems
Apache Ignite
Distributed ML
with Apache Ignite
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Why do we need distributed ML?
● Data preprocessing
○ Big amount of data on different machines
○ High costs of ETL
● Model training
○ We cannot store all data on single server
○ Data irregularity
● Inference
○ Intensive data flow
○ Data co-located inference
○ Network latency
2019 © GridGain Systems
Common approach
Local calculations Result aggregationModel
Model update
2019 © GridGain Systems
Ignite partition-based distributed dataset
Partition Data Dataset Context Dataset Data
Upstream Cache Context Cache On-Heap
Learning Env
On-Heap
Durable Stateless Durable Recoverable
double[][] x = …
double[] y = ...
double[][] x = …
double[] y = ...
Partition Based Dataset StructuresSource Data
2019 © GridGain Systems
Let`s do the math!
2019 © GridGain Systems
Example: Gradient descent
• Model: Linear regression
f(x)
x
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function:
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function: MSE
• Gradient:
2019 © GridGain Systems
Example: Gradient Descent
2019 © GridGain Systems
• Model: Linear regression
• Loss function:
• Gradient:
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Loss function: min Gini coefficient
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist (what shall we do?)
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do: sum histograms
2019 © GridGain Systems
=+
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do:
– One can sum histograms
– Summarizing histograms - commutative and associative operation
– So you can count the histograms using map-reduce!
2019 © GridGain Systems
2019 © GridGain Systems
Example: Random Forest
• Model: Random forest
• Objective function: min Gini coefficient for each sub-tree
• Gradient: doesn’t exist
• What to do: approximation using histogram partitioning
• Code: RandomForestTrainer, RandomForestClassifierTrainer
2019 © GridGain Systems
Distributed ML
with Apache Ignite
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
2019 © GridGain Systems
2019 © GridGain Systems
Ignite ML API
Vectorizer<Integer, BinaryObject, String, Double> vectorizer =
new BinaryObjectVectorizer<Integer>(
"feature1",
“feature2”
).labeled("label");
LogisticRegressionSGDTrainer trainer = new LogisticRegressionSGDTrainer();
LogisticRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Фреймворк для ансамблей
• Интеграция с Tensorflow
– как источник данных
– как менеджмент кластера
2019 © GridGain Systems
2019 © GridGain Systems
Preprocessing and Pipelines
Vectorizer<Integer, Vector, Integer, Double> vectorizer
= new DummyVectorizer<Integer>(0, 3, 4, 5, 6, 8, 10).labeled(1);
PipelineMdl<Integer, Vector> mdl =
new Pipeline<Integer, Vector, Integer, Double>()
.addVectorizer(vectorizer)
.addPreprocessingTrainer(new EncoderTrainer<Integer, Vector>()
.withEncoderType(EncoderType.STRING_ENCODER)
.withEncodedFeature(1)
.withEncodedFeature(6))
.addPreprocessingTrainer(new ImputerTrainer<Integer, Vector>())
.addPreprocessingTrainer(new MinMaxScalerTrainer<Integer, Vector>())
.addPreprocessingTrainer(new NormalizationTrainer<Integer, Vector>()
.withP(1))
.addTrainer(new DecisionTreeClassificationTrainer(5, 0))
.fit(ignite, dataCache);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
2019 © GridGain Systems
2019 © GridGain Systems
Bagging, Boosting and Stacking
DatasetTrainer<LogisticRegressionModel, Double> trainer =
new LogisticRegressionSGDTrainer(...)...;
BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged(trainer,
// ensemble size, subsample ration, feature vector size, features subspace dim
7, 0.7, 2, 2,
new onMajorityPredictionsAggregator());
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
• Continuous learning
2019 © GridGain Systems
2019 © GridGain Systems
Online learning
SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer();
SVMLinearClassificationModel mdl1 = trainer.fit(ignite, dataCache1, vectorizer);
SVMLinearClassificationModel mdl2 = trainer.update(mdl1, ignite, dataCache2,
vectorizer);
2019 © GridGain Systems
What does Apache Ignite support?
• Classical ML
– classification
– regression
– clustering
• Preprocessing
– binarization, imputing
– normalization, scaling
– string encoding
• Pipelines
• Ensemble Framework
• Continuous learning
• SQL in-place inference
2019 © GridGain Systems
2019 © GridGain Systems
IMDB with built-in ML
IgniteModelStorageUtil.saveModel(ignite, model, “titanik_model_tree”);
QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery("select " +
"survived as truth, " +
"predict('titanik_model_tree', pclass, age, sibsp, parch, fare, case
sex when 'male' then 1 else 0 end) as prediction " +
"from titanik_train"))
2019 © GridGain Systems
External integrations
2019 © GridGain Systems
Model import
2019 © GridGain Systems
Inference in Ignite ML
2019 © GridGain Systems
Request queue
Response queue
Inference
Service
Inference
Service
Inference
Gateway
2019 © GridGain Systems
Import / Inference API
AsyncModelBuilder mdlBuilder = new IgniteDistributedModelBuilder(ignite, 4, 4);
XGModelParser parser = new XGModelParser();
try (Model<NamedVector, Future<Double>> mdl = mdlBuilder.build(reader, parser);
...
double prediction = mdl.predict(testObj).get();
}
2019 © GridGain Systems
TensorFlow on Apache Ignite
• Ignite Dataset
• IGFS Plugin
• Distributed Training
• More info here
#UnifiedAnalytics #SparkAISummit
>>> import tensorflow as tf
>>> from tensorflow.contrib.ignite import IgniteDataset
>>>
>>> dataset = IgniteDataset(cache_name="SQL_PUBLIC_KITTEN_CACHE")
>>> iterator = dataset.make_one_shot_iterator()
>>> next_obj = iterator.get_next()
>>>
>>> with tf.Session() as sess:
>>> for _ in range(3):
>>> print(sess.run(next_obj))
{'key': 1, 'val': {'NAME': b'WARM KITTY'}}
{'key': 2, 'val': {'NAME': b'SOFT KITTY'}}
{'key': 3, 'val': {'NAME': b'LITTLE BALL OF FUR'}}
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
GridGain ML client library provides user applications the ability to work
with GridGain ML functionality using Py4J as an integration mechanism.
If you want to use ggml in your project, you may install it from PyPI:
$ pip install ggml
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
GridGain ML client library provides user applications the ability to work
with GridGain ML functionality using Py4J as an integration mechanism.
If you want to use ggml in your project, you may install it from PyPI:
$ pip install ggml
NB: available only for Apache Ignite master and for GG 8.8
2019 © GridGain Systems
Apache Ignite with GridGain ML Python API
from ggml.regression import LinearRegressionTrainer
trainer = LinearRegressionTrainer()
model = trainer.fit(x_train, y_train)
r2_score(y_test, model.predict(x_test))
2019 © GridGain Systems
Apache Ignite ML Tutorial
https://github.com/apache/ignite/
org.apache.ignite.examples.ml.tutorial
2019 © GridGain Systems
2019 © GridGain Systems
Distributed Machine and Deep Learning
at Scale with Apache Ignite
Links:
• http://ignite.apache.org/
• https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb
• https://github.com/gridgain/ml-python-api
• http://bit.ly/DataNativesOslo
Email:
● user@ignite.apache.org
● dev@ignite.apache.org
● ybabak@gridgain.com
2019 © GridGain Systems
Preprocessing and Pipelines

Más contenido relacionado

La actualidad más candente

Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Amazon Web Services
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaEdureka!
 
Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference  Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference Hagay Lupesko
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBig Data Week
 
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Aditya Yadav
 
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Amazon Web Services
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
 
Easy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSEasy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSAmazon Web Services
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the CloudAmazon Web Services
 
Optimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentOptimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentAmazon Web Services
 
Engineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleEngineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleAmazon Web Services
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAwaretwilmes
 
Top 13 best security practices for Azure
Top 13 best security practices for AzureTop 13 best security practices for Azure
Top 13 best security practices for AzureRadu Vunvulea
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Amazon Web Services
 
PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni   PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni Neotys
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningDataWorks Summit/Hadoop Summit
 

La actualidad más candente (20)

Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
Tape Is a Four Letter Word: Back Up to the Cloud in Under an Hour (STG201) - ...
 
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | EdurekaGoogle Cloud Platform Tutorial | GCP Fundamentals | Edureka
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka
 
Data Science on Google Cloud Platform
Data Science on Google Cloud PlatformData Science on Google Cloud Platform
Data Science on Google Cloud Platform
 
Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference  Deep learning acceleration with Amazon Elastic Inference
Deep learning acceleration with Amazon Elastic Inference
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
 
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
AWS CodeStar 및 Cloud9을 통한 서버리스(Serverless) 앱 개발 길잡이 - 윤석찬 (AWS 테크에반젤리스트)
 
Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0Special Purpose Quantum Annealing Quantum Computer v1.0
Special Purpose Quantum Annealing Quantum Computer v1.0
 
Amazon Aurora 深度探討
Amazon Aurora 深度探討Amazon Aurora 深度探討
Amazon Aurora 深度探討
 
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
Monitor the World: Meaningful Metrics for Containerized Apps and Clusters (CO...
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Easy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWSEasy and Efficient Batch Computing on AWS
Easy and Efficient Batch Computing on AWS
 
(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud(CMP202) Engineering Simulation and Analysis in the Cloud
(CMP202) Engineering Simulation and Analysis in the Cloud
 
Optimize Your Amazon ECS Environment
Optimize Your Amazon ECS EnvironmentOptimize Your Amazon ECS Environment
Optimize Your Amazon ECS Environment
 
Engineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global ScaleEngineering Genomic Big Data Analytics at A Global Scale
Engineering Genomic Big Data Analytics at A Global Scale
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAware
 
Top 13 best security practices for Azure
Top 13 best security practices for AzureTop 13 best security practices for Azure
Top 13 best security practices for Azure
 
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
Working with Scalable Machine Learning Algorithms in Amazon SageMaker - AWS O...
 
PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni   PAC 2019 virtual Stefano Doni
PAC 2019 virtual Stefano Doni
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
Google Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine LearningGoogle Cloud Platform Empowers TensorFlow and Machine Learning
Google Cloud Platform Empowers TensorFlow and Machine Learning
 

Similar a Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak

Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteDenis Magda
 
Python tutorial for ML
Python tutorial for MLPython tutorial for ML
Python tutorial for MLBin Han
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersDenis Magda
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity ManagementEDB
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformMariaDB plc
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Ippon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationIppon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationDataStax Academy
 
Multi criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationMulti criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationIppon
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)Jasjeet Thind
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodLin Yuan
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...HostedbyConfluent
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalabilitymabuhr
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET Journal
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIMESlides
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLDESMOND YUEN
 

Similar a Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak (20)

Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Python tutorial for ML
Python tutorial for MLPython tutorial for ML
Python tutorial for ML
 
In-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software EngineersIn-Memory Computing Essentials for Software Engineers
In-Memory Computing Essentials for Software Engineers
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
 
Deploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud PlatformDeploying MariaDB for HA on Google Cloud Platform
Deploying MariaDB for HA on Google Cloud Platform
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Ippon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra applicationIppon Technologies: Multi-criteria queries on a Cassandra application
Ippon Technologies: Multi-criteria queries on a Cassandra application
 
Multi criteria queries on a cassandra application
Multi criteria queries on a cassandra applicationMulti criteria queries on a cassandra application
Multi criteria queries on a cassandra application
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation FrameworkMongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
MongoDB World 2019: Unleash the Power of the MongoDB Aggregation Framework
 
Distributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with HorovodDistributed Model Training using MXNet with Horovod
Distributed Model Training using MXNet with Horovod
 
Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...Comparing three data ingestion approaches where Apache Kafka integrates with ...
Comparing three data ingestion approaches where Apache Kafka integrates with ...
 
Using Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High ScalabilityUsing Grid Technologies in the Cloud for High Scalability
Using Grid Technologies in the Cloud for High Scalability
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...IRJET-  	  A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
IRJET- A Review on K-Means++ Clustering Algorithm and Cloud Computing wit...
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
KNIME Data Science Learnathon: From Raw Data To Deployment - Paris - November...
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Very large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDLVery large scale distributed deep learning on BigDL
Very large scale distributed deep learning on BigDL
 

Más de Dataconomy Media

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Dataconomy Media
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Dataconomy Media
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...Dataconomy Media
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...Dataconomy Media
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Dataconomy Media
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Dataconomy Media
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Dataconomy Media
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Dataconomy Media
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Dataconomy Media
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Dataconomy Media
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
 
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiBig Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiDataconomy Media
 

Más de Dataconomy Media (20)

Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & 	David An...
Data Natives Paris v 10.0 | "Blockchain in Healthcare" - Lea Dias & David An...
 
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...
 
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...
 
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...
 
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...Data Natives meets DataRobot |  "Build and deploy an anti-money laundering mo...
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...
 
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...
 
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...Data Natives Vienna v 7.0  | "Building Kubernetes Operators with KUDO for Dat...
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...
 
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...
 
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...Data Natives Cologne v 4.0  | "The Data Lorax: Planting the Seeds of Fairness...
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...
 
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...
 
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...
 
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...
 
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...
 
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
Data Natives Hamburg v 6.0 | "About Surfing, Failing & Scaling" - Florian Sch...
 
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
Data NativesBerlin v 20.0 | "Serving A/B experimentation platform end-to-end"...
 
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...
 
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...
 
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...
 
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...
 
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas TomperiBig Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
Big Data Helsinki v 3 | "What you should know about PSD2 APIs?" - Joonas Tomperi
 

Último

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with Apache Ignite" - Yuriy Babak

  • 1. Distributed Machine and Deep Learning at Scale with Apache Ignite Yuriy Babak
  • 2. Yuriy Babak Head of ML/DL framework development at GridGain Apache Ignite committer
  • 3. 2019 © GridGain Systems Agenda ● Introduction of distributed ML/DL ● Small math background ● Apache Ignite ML overview ● External integrations
  • 4. 2019 © GridGain Systems Apache Ignite
  • 6. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 7. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 8. 2019 © GridGain Systems Why do we need distributed ML? ● Data preprocessing ○ Big amount of data on different machines ○ High costs of ETL ● Model training ○ We cannot store all data on single server ○ Data irregularity ● Inference ○ Intensive data flow ○ Data co-located inference ○ Network latency
  • 9. 2019 © GridGain Systems Common approach Local calculations Result aggregationModel Model update
  • 10. 2019 © GridGain Systems Ignite partition-based distributed dataset Partition Data Dataset Context Dataset Data Upstream Cache Context Cache On-Heap Learning Env On-Heap Durable Stateless Durable Recoverable double[][] x = … double[] y = ... double[][] x = … double[] y = ... Partition Based Dataset StructuresSource Data
  • 11. 2019 © GridGain Systems Let`s do the math!
  • 12. 2019 © GridGain Systems Example: Gradient descent • Model: Linear regression f(x) x
  • 13. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function:
  • 14. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function: MSE • Gradient:
  • 15. 2019 © GridGain Systems Example: Gradient Descent 2019 © GridGain Systems • Model: Linear regression • Loss function: • Gradient:
  • 16. 2019 © GridGain Systems Example: Random Forest • Model: Random forest 2019 © GridGain Systems
  • 17. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Loss function: min Gini coefficient 2019 © GridGain Systems
  • 18. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist (what shall we do?) 2019 © GridGain Systems
  • 19. 2019 © GridGain Systems Example: Random Forest 2019 © GridGain Systems
  • 20. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: sum histograms 2019 © GridGain Systems =+
  • 21. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: – One can sum histograms – Summarizing histograms - commutative and associative operation – So you can count the histograms using map-reduce! 2019 © GridGain Systems
  • 22. 2019 © GridGain Systems Example: Random Forest • Model: Random forest • Objective function: min Gini coefficient for each sub-tree • Gradient: doesn’t exist • What to do: approximation using histogram partitioning • Code: RandomForestTrainer, RandomForestClassifierTrainer 2019 © GridGain Systems
  • 24. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering 2019 © GridGain Systems
  • 25. 2019 © GridGain Systems Ignite ML API Vectorizer<Integer, BinaryObject, String, Double> vectorizer = new BinaryObjectVectorizer<Integer>( "feature1", “feature2” ).labeled("label"); LogisticRegressionSGDTrainer trainer = new LogisticRegressionSGDTrainer(); LogisticRegressionModel mdl = trainer.fit(ignite, dataCache, vectorizer);
  • 26. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Фреймворк для ансамблей • Интеграция с Tensorflow – как источник данных – как менеджмент кластера 2019 © GridGain Systems
  • 27. 2019 © GridGain Systems Preprocessing and Pipelines Vectorizer<Integer, Vector, Integer, Double> vectorizer = new DummyVectorizer<Integer>(0, 3, 4, 5, 6, 8, 10).labeled(1); PipelineMdl<Integer, Vector> mdl = new Pipeline<Integer, Vector, Integer, Double>() .addVectorizer(vectorizer) .addPreprocessingTrainer(new EncoderTrainer<Integer, Vector>() .withEncoderType(EncoderType.STRING_ENCODER) .withEncodedFeature(1) .withEncodedFeature(6)) .addPreprocessingTrainer(new ImputerTrainer<Integer, Vector>()) .addPreprocessingTrainer(new MinMaxScalerTrainer<Integer, Vector>()) .addPreprocessingTrainer(new NormalizationTrainer<Integer, Vector>() .withP(1)) .addTrainer(new DecisionTreeClassificationTrainer(5, 0)) .fit(ignite, dataCache);
  • 28. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework 2019 © GridGain Systems
  • 29. 2019 © GridGain Systems Bagging, Boosting and Stacking DatasetTrainer<LogisticRegressionModel, Double> trainer = new LogisticRegressionSGDTrainer(...)...; BaggedTrainer<Double> baggedTrainer = TrainerTransformers.makeBagged(trainer, // ensemble size, subsample ration, feature vector size, features subspace dim 7, 0.7, 2, 2, new onMajorityPredictionsAggregator());
  • 30. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework • Continuous learning 2019 © GridGain Systems
  • 31. 2019 © GridGain Systems Online learning SVMLinearClassificationTrainer trainer = new SVMLinearClassificationTrainer(); SVMLinearClassificationModel mdl1 = trainer.fit(ignite, dataCache1, vectorizer); SVMLinearClassificationModel mdl2 = trainer.update(mdl1, ignite, dataCache2, vectorizer);
  • 32. 2019 © GridGain Systems What does Apache Ignite support? • Classical ML – classification – regression – clustering • Preprocessing – binarization, imputing – normalization, scaling – string encoding • Pipelines • Ensemble Framework • Continuous learning • SQL in-place inference 2019 © GridGain Systems
  • 33. 2019 © GridGain Systems IMDB with built-in ML IgniteModelStorageUtil.saveModel(ignite, model, “titanik_model_tree”); QueryCursor<List<?>> cursor = cache.query(new SqlFieldsQuery("select " + "survived as truth, " + "predict('titanik_model_tree', pclass, age, sibsp, parch, fare, case sex when 'male' then 1 else 0 end) as prediction " + "from titanik_train")) 2019 © GridGain Systems
  • 35. 2019 © GridGain Systems Model import
  • 36. 2019 © GridGain Systems Inference in Ignite ML 2019 © GridGain Systems Request queue Response queue Inference Service Inference Service Inference Gateway
  • 37. 2019 © GridGain Systems Import / Inference API AsyncModelBuilder mdlBuilder = new IgniteDistributedModelBuilder(ignite, 4, 4); XGModelParser parser = new XGModelParser(); try (Model<NamedVector, Future<Double>> mdl = mdlBuilder.build(reader, parser); ... double prediction = mdl.predict(testObj).get(); }
  • 38. 2019 © GridGain Systems TensorFlow on Apache Ignite • Ignite Dataset • IGFS Plugin • Distributed Training • More info here #UnifiedAnalytics #SparkAISummit >>> import tensorflow as tf >>> from tensorflow.contrib.ignite import IgniteDataset >>> >>> dataset = IgniteDataset(cache_name="SQL_PUBLIC_KITTEN_CACHE") >>> iterator = dataset.make_one_shot_iterator() >>> next_obj = iterator.get_next() >>> >>> with tf.Session() as sess: >>> for _ in range(3): >>> print(sess.run(next_obj)) {'key': 1, 'val': {'NAME': b'WARM KITTY'}} {'key': 2, 'val': {'NAME': b'SOFT KITTY'}} {'key': 3, 'val': {'NAME': b'LITTLE BALL OF FUR'}}
  • 39. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API GridGain ML client library provides user applications the ability to work with GridGain ML functionality using Py4J as an integration mechanism. If you want to use ggml in your project, you may install it from PyPI: $ pip install ggml
  • 40. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API GridGain ML client library provides user applications the ability to work with GridGain ML functionality using Py4J as an integration mechanism. If you want to use ggml in your project, you may install it from PyPI: $ pip install ggml NB: available only for Apache Ignite master and for GG 8.8
  • 41. 2019 © GridGain Systems Apache Ignite with GridGain ML Python API from ggml.regression import LinearRegressionTrainer trainer = LinearRegressionTrainer() model = trainer.fit(x_train, y_train) r2_score(y_test, model.predict(x_test))
  • 42. 2019 © GridGain Systems Apache Ignite ML Tutorial https://github.com/apache/ignite/ org.apache.ignite.examples.ml.tutorial 2019 © GridGain Systems
  • 43. 2019 © GridGain Systems Distributed Machine and Deep Learning at Scale with Apache Ignite Links: • http://ignite.apache.org/ • https://medium.com/tensorflow/tensorflow-on-apache-ignite-99f1fc60efeb • https://github.com/gridgain/ml-python-api • http://bit.ly/DataNativesOslo Email: ● user@ignite.apache.org ● dev@ignite.apache.org ● ybabak@gridgain.com
  • 44. 2019 © GridGain Systems Preprocessing and Pipelines