End-to-End Platform Support for
Distributed Deep Learning in Finance
Jim Dowling
CEO, Logical Clocks AB
Assoc Prof, KTH Stockholm
Senior Researcher, RISE SICS
jim_dowling
Deep Learning in Finance
•Financial modelling problems are typically complex and
non-linear.
•If you’re lucky, you have lots of labelled data
-Deep learning models can learn non-linear relationships and
recurrent structures that generalize beyond the training data.
•Potential areas in finance: pricing, portfolio construction,
risk management and HFT*
2/33
* https://towardsdatascience.com/deep-learning-in-finance-9e088cb17c03
More Data means Better Predictions
[Figure: Prediction Performance vs. Amount of Labelled Data, comparing Traditional ML and Deep Neural Nets. Annotation: hand-crafted can outperform. Timeline: 1980s, 1990s, 2000s, 2010s, 2020s?]
3/33
Do we need more Compute?
“Methods that scale with computation
are the future of AI”*
- Rich Sutton (A Founding Father of Reinforcement Learning)
* https://www.youtube.com/watch?v=EeMCEQa85tw
4/33
Reduce DNN Training Time
In 2017, Facebook reduced training time on ImageNet for a CNN from 2 weeks to 1 hour by scaling out to 256 GPUs using Ring-AllReduce on Caffe2.
https://arxiv.org/abs/1706.02677
5/33
Reduce Experiment Time with Parallel Experiments
•Hyper-parameter optimization is parallelizable
•Neural Architecture Search (Google)
-450 GPU / 7 days
-900 TPU / 5 days
-New SOTA on CIFAR10 (2.13% top 1)
-New SOTA on ImageNet (3.8% top 5)
https://arxiv.org/abs/1802.01548
6/33
Training Time and ML Practitioner Productivity
•Distributed Deep Learning
-Interactive analysis!
-Instant gratification!
[Image caption: “My Model’s Training.”]
7/33
More Compute should mean Faster Training
[Figure: Training Performance vs. Available Compute, comparing Single-Host and Distributed training. Timeline: 2015, 2016, 2017, 2018?]
8/33
Distributed Training: Theory and Practice
9/33
Image from @hardmaru on Twitter.
Distributed Training Algorithms not all Equal
[Figure: Training Performance vs. Available Compute, comparing Parameter Servers and AllReduce.]
10/33
Ring-AllReduce vs Parameter Server
[Figure: Ring-AllReduce, where GPU 0 to GPU 3 each send to and receive from a neighbour in a ring, next to a parameter-server setup, where GPU 1 to GPU 4 communicate with the Param Server(s).]
Network Bandwidth is the Bottleneck for Distributed Training
11/33
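To make the ring pattern concrete, here is a minimal single-process sketch of Ring-AllReduce (sum) in plain Python. It is an illustration only, not Horovod or Caffe2 code: the function names are hypothetical, the "workers" are just lists in memory, and the sketch assumes the gradient vector splits evenly into one chunk per worker.

```python
from typing import List


def ring_allreduce(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a ring of workers (illustrative sketch)."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    assert size % n == 0, "sketch assumes the vector splits evenly into n chunks"
    chunk = size // n
    data = [list(g) for g in worker_grads]  # each worker's local buffer

    def span(c: int) -> slice:
        # Index range of chunk c inside a buffer.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: in step s, worker r sends chunk (r - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][span((r - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Worker r now holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: in step s, worker r forwards chunk (r + 1 - s) % n
    # to its right neighbour, which overwrites its own copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][span((r + 1 - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][span(c)] = payload
    return data


def ring_elements_sent_per_worker(n: int, size: int) -> int:
    """Each worker sends 2 * (n - 1) chunks of size / n elements."""
    return 2 * (n - 1) * (size // n)


# Four simulated GPUs, each with an 8-element gradient.
grads = [[float(10 * r + i) for i in range(8)] for r in range(4)]
reduced = ring_allreduce(grads)
```

Each worker sends roughly twice the gradient size regardless of the number of workers, whereas a single parameter server has to receive a full gradient from every worker. That is one way to read the bandwidth argument on this slide.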
AllReduce outperforms Parameter Servers
12/33
*https://github.com/uber/horovod
16 servers, each with 4 P100 GPUs (64 GPUs in total), connected by a RoCE-capable 25 Gbit/s network (synthetic data). Speed is measured in images processed per second.*
For Bigger Models, Parameter Servers don’t scale
Infiniband for Training to overcome the Network Bottleneck
RDMA/Infiniband
Read Input Files, Write Model Checkpoints to Network FS
Aggregate Gradients
Separate Gradient sharing/aggregation network traffic from I/O traffic.
13/33
Horovod on Hops
import horovod.tensorflow as hvd

def conv_model(feature, target, mode):
    ...

def main(_):
    hvd.init()  # initialise Horovod
    # Wrap the existing optimizer so gradients are averaged across workers.
    opt = hvd.DistributedOptimizer(opt)
    if hvd.local_rank() == 0:
        # Broadcast the initial variables from rank 0 to all other workers.
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        ...
    else:
        hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
        ...

from hops import allreduce
allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb')
“Pure” TensorFlow code
14/33
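The key step in the code above is the wrapped optimizer, which averages gradients across workers before they are applied. The sketch below shows, in plain Python and without Horovod, why that averaging gives the same update as one large batch. The 1-D linear model, the function names and the data are all hypothetical, and the equality holds only under the assumption that every worker gets a shard of the same size.

```python
def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_workers):
    """Average of per-worker gradients, each computed on an equal shard."""
    shard = len(xs) // num_workers  # assumes len(xs) divides evenly
    local = [
        mse_gradient(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(local) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]
full_batch = mse_gradient(1.5, xs, ys)
two_workers = data_parallel_gradient(1.5, xs, ys, num_workers=2)
```

With equal shards, `two_workers` matches `full_batch`, so adding workers changes the effective batch size rather than the training rule.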
Parallel Experiments
Parallel Experiments on Hops
def model_fn(learning_rate, dropout):
    import tensorflow as tf
    from hops import tensorboard, hdfs, devices
    # [TensorFlow Code here]

from hops import experiment
args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}
experiment.launch(spark, model_fn, args_dict)
Launch TF jobs in Spark Executors
17/33
Launches 6 Spark Executors, each with a different hyperparameter combination. Each Executor can have 1 to N GPUs.
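The count of 6 is consistent with taking every combination of the 3 learning rates and 2 dropout values. The slides do not show how `experiment.launch` expands `args_dict` internally, so the sketch below is an assumption: a hypothetical helper, `expand_grid`, that builds the Cartesian product with the standard library.

```python
from itertools import product


def expand_grid(args_dict):
    """Return one dict of keyword arguments per hyperparameter combination."""
    names = list(args_dict)
    value_lists = [args_dict[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


args_dict = {'learning_rate': [0.001, 0.005, 0.01],
             'dropout': [0.5, 0.6]}
combos = expand_grid(args_dict)

for kwargs in combos:
    # Each dict could be passed to a training function as model_fn(**kwargs).
    print(kwargs)
```

Because the combinations are independent of one another, each can run in its own executor without any communication between them.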
Parallel Experiments Visualization on TensorBoard
18/33
Parallel Experiment Results Visualization
Lots of good GPUs > A few great GPUs
100 x Nvidia 1080Ti (DeepLearning11) vs. 8 x Nvidia V100 (DGX-1)
Both options (100 GPUs and 8 GPUs) cost the same: $150K (March 2018).
19/33
Share GPUs to Maximize Utilization
GPU Resource Management (Hops, Mesos)
20/33
4 GPUs on any host
10 GPUs on 1 host
100 GPUs on 10 hosts with ‘Infiniband’
20 GPUs on 2 hosts with ‘Infiniband_P100’
DeepLearning11 Server $15K (10 x 1080Ti)
21/33
Economics of GPUs and the Cloud
[Figure: GPU Utilization over Time, comparing On-Premise GPU and Cloud.]
DeepLearning11 (10x1080Tis) will pay for itself in 11 weeks,
compared to using a p3.8xlarge in AWS
22/33
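The payback claim can be checked with simple arithmetic. The sketch below uses only the figures from the slides ($15K for a DeepLearning11 server, 11 weeks) and works backwards to the cloud hourly rate they imply. It assumes the server is busy around the clock and ignores power, hosting and staff costs. The function names are hypothetical, and the resulting rate is derived from the slide, not taken from an AWS price list.

```python
HOURS_PER_WEEK = 7 * 24


def implied_hourly_rate(hardware_cost_usd, payback_weeks, utilization=1.0):
    """Cloud rate (USD per hour) at which the hardware pays for itself in payback_weeks."""
    if payback_weeks <= 0 or not 0 < utilization <= 1:
        raise ValueError("payback_weeks must be positive and utilization in (0, 1]")
    return hardware_cost_usd / (payback_weeks * HOURS_PER_WEEK * utilization)


def payback_weeks(hardware_cost_usd, cloud_rate_usd_per_hour, utilization=1.0):
    """Weeks of cloud spending that add up to the hardware cost."""
    if cloud_rate_usd_per_hour <= 0 or not 0 < utilization <= 1:
        raise ValueError("rate must be positive and utilization in (0, 1]")
    return hardware_cost_usd / (cloud_rate_usd_per_hour * HOURS_PER_WEEK * utilization)


rate = implied_hourly_rate(15_000, 11)           # about 8.12 USD per hour
weeks_at_half_use = payback_weeks(15_000, rate, utilization=0.5)
```

At 50% utilization the same hardware takes twice as long to pay for itself, which is why sharing GPUs to keep utilization high matters for the on-premise case.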
Distributed Deep Learning for Finance
•Platform for Hyperscale Data Science
•Controlled* access to datasets
*GDPR-compliance, Sarbanes-Oxley, etc
23/33
Hopsworks
Hops: Next Generation Hadoop*
Faster: 16x Throughput
Bigger: 37x Number of files
Scale Challenge Winner (2017)
GPUs in YARN
*https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
25/33
Hopsworks Data Platform
[Figure: Hopsworks architecture. Pipeline stages: Develop, Train, Test, Serve. Components: MySQL Cluster, Hive, InfluxDB, ElasticSearch, Kafka; Projects, Datasets, Users; HopsFS / YARN; Spark, Flink, Tensorflow; Jupyter, Zeppelin; Jobs, Kibana, Grafana; Hopsworks REST API.]
26/33
A Project is a Grouping of Users and Data
[Figure: projects Proj-42, Proj-X and Proj-All. Projects sandbox private data. Labels: Shared Topic, Topic, /Projs/My/Data, CompanyDB.]
Ismail et al, Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
27/33
How are Projects used?
[Figure: labels are Engineering, Kafka Topic, FX Project, FX Topic, FX DB, FX Data Stream, Shared Interactive Analytics, FX team.]
28/33
Per-Project Python Envs with Conda
Python libraries are usable by Spark/Tensorflow
29/33
TensorFlow in Hopsworks
[Figure: components are Experiments, Tensorboard, Tensorflow Serving, FeatureStore, Kafka, Hive, HopsFS and YARN, running on Public Cloud or On-Premise.]
30/33
One Click Deployment of TensorFlow Models
31/33
Hops API
•Python/Java/Scala library
-Manage tensorboard, Load/save models in HDFS
-Horovod, TensorFlowOnSpark
-Parameter sweeps for parallel experiments
-Neural Architecture Search with Genetic Algorithms
-Secure Streaming Analytics with Kafka/Spark/Flink
• SSL/TLS certs, Avro Schema, Endpoints for Kafka/Hopsworks/etc
32/33
Deep Learning Hierarchy of Scale
[Figure: levels of scale, largest first]
DDL AllReduce on GPU Servers
DDL with GPU Servers and Parameter Servers
Parallel Experiments on GPU Servers
Many GPUs on a Single GPU Server
Single GPU
33/33
Summary
•Distribution can make Deep Learning practitioners more
productive.
https://www.oreilly.com/ideas/distributed-tensorflow
•Hopsworks is a new Data Platform built on HopsFS with
first-class support for Python / Deep Learning / ML /
Strong Data Governance
The Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi.
www.hops.io
@hopshadoop

End-to-End Platform for Distributed Deep Learning in Finance

  • 1. End-to-End Platform Support for Distributed Deep Learning in Finance Jim Dowling CEO, Logical Clocks AB Assoc Prof, KTH Stockholm Senior Researcher, RISE SICS jim_dowling
  • 2. Deep Learning in Finance • Financial modelling problems are typically complex and non-linear. • If you’re lucky, you have lots of labelled data - Deep learning models can learn non-linear relationships and recurrent structures that generalize beyond the training data. • Potential areas in finance: pricing, portfolio construction, risk management and HFT* * https://towardsdatascience.com/deep-learning-in-finance-9e088cb17c03
  • 3. More Data means Better Predictions (Chart: Prediction Performance vs. Amount of Labelled Data; curves: Traditional ML, Deep Neural Nets; annotation: “Hand-crafted can outperform”; timeline: 1980s, 1990s, 2000s, 2010s, 2020s?)
  • 4. Do we need more Compute? “Methods that scale with computation are the future of AI”* - Rich Sutton (A Founding Father of Reinforcement Learning) * https://www.youtube.com/watch?v=EeMCEQa85tw
  • 5. Reduce DNN Training Time In 2017, Facebook reduced training time on ImageNet for a CNN from 2 weeks to 1 hour by scaling out to 256 GPUs using Ring-AllReduce on Caffe2. https://arxiv.org/abs/1706.02677
  • 6. Reduce Experiment Time with Parallel Experiments • Hyper-parameter optimization is parallelizable • Neural Architecture Search (Google) - 450 GPU / 7 days - 900 TPU / 5 days - New SOTA on CIFAR10 (2.13% top 1) - New SOTA on ImageNet (3.8% top 5) https://arxiv.org/abs/1802.01548
  • 7. Training Time and ML Practitioner Productivity • Distributed Deep Learning - Interactive analysis! - Instant gratification! (Image caption: “My Model’s Training.”)
  • 8. More Compute should mean Faster Training (Chart: Training Performance vs. Available Compute; curves: Single-Host, Distributed; timeline: 2015, 2016, 2017, 2018?)
  • 9. Distributed Training: Theory and Practice (Image from @hardmaru on Twitter.)
  • 10. Distributed Training Algorithms not all Equal (Chart: Training Performance vs. Available Compute; curves: Parameter Servers, AllReduce)
  • 11. Ring-AllReduce vs Parameter Server (Diagram: a ring of GPU 0, GPU 1, GPU 2 and GPU 3, each with a send and a recv link; GPU 1, GPU 2, GPU 3 and GPU 4 connected to Param Server(s).) Network Bandwidth is the Bottleneck for Distributed Training
  • 12. AllReduce outperforms Parameter Servers. Benchmark setup: 16 servers with 4 P100 GPUs each (64 GPUs), connected by a RoCE-capable 25 Gbit/s network (synthetic data). Speed is measured in images processed per second.* For Bigger Models, Parameter Servers don’t scale. * https://github.com/uber/horovod
  • 13. Infiniband for Training to overcome N/W Bottleneck. Separate Gradient sharing/aggregation network traffic from I/O traffic. (Diagram labels: RDMA/Infiniband; Aggregate Gradients; Read Input Files, Write Model Checkpoints to Network FS.)
  • 14. Horovod on Hops. “Pure” TensorFlow code:

        import horovod.tensorflow as hvd

        def conv_model(feature, target, mode):
            …..

        def main(_):
            hvd.init()
            opt = hvd.DistributedOptimizer(opt)
            if hvd.local_rank()==0:
                hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
                …..
            else:
                hooks = [hvd.BroadcastGlobalVariablesHook(0), ..]
                …..

        from hops import allreduce
        allreduce.launch(spark, 'hdfs:///Projects/…/all_reduce.ipynb')
  • 16. Parallel Experiments on Hops. Launch TF jobs in Spark Executors:

        def model_fn(learning_rate, dropout):
            import tensorflow as tf
            from hops import tensorboard, hdfs, devices
            [TensorFlow Code here]

        from hops import experiment
        args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
        experiment.launch(spark, model_fn, args_dict)

    Launches 6 Spark Executors, each with a different hyperparameter combination. Each Executor can have 1-N GPUs.
  • 17. Parallel Experiments Visualization on TensorBoard (Image: Parallel Experiment Results Visualization)
  • 18. Lots of good GPUs > A few great GPUs. 100 x Nvidia 1080Ti (DeepLearning11) vs. 8 x Nvidia V100 (DGX-1). Both top (100 GPUs) and bottom (8 GPUs) cost the same: $150K (March 2018).
  • 19. Share GPUs to Maximize Utilization. GPU Resource Management (Hops, Mesos): 4 GPUs on any host; 10 GPUs on 1 host; 100 GPUs on 10 hosts with ‘Infiniband’; 20 GPUs on 2 hosts with ‘Infiniband_P100’.
  • 20. DeepLearning11 Server $15K (10 x 1080Ti)
  • 21. Economics of GPUs and the Cloud (Chart: GPU Utilization over Time; On-Premise GPU vs. Cloud.) DeepLearning11 (10 x 1080Ti) will pay for itself in 11 weeks, compared to using a p3.8xlarge in AWS.
  • 22. Distributed Deep Learning for Finance • Platform for Hyperscale Data Science • Controlled* access to datasets (* GDPR-compliance, Sarbanes-Oxley, etc.)
  • 24. Hops: Next Generation Hadoop* Faster: 16x Throughput. Bigger: 37x Number of files. Scale Challenge Winner (2017). GPUs in YARN. * https://www.usenix.org/conference/fast17/technical-sessions/presentation/niazi
  • 25. Hopsworks Data Platform: Develop, Train, Test, Serve. (Diagram labels: MySQL Cluster, Hive, InfluxDB, ElasticSearch, Kafka; Projects, Datasets, Users; HopsFS / YARN; Spark, Flink, Tensorflow; Jupyter, Zeppelin; Jobs, Kibana, Grafana; REST API; Hopsworks.)
  • 26. A Project is a Grouping of Users and Data. Projects sandbox Private Data. (Diagram labels: Proj-42, Proj-X, Proj-AllCompanyDB, Shared Topic, Topic, /Projs/My/Data.) Ismail et al, Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata, ICDCS 2017
  • 27. How are Projects used? (Diagram labels: Engineering, Kafka Topic, FX Project, FX Topic, FX DB, FX Data Stream, Shared Interactive Analytics, FX team.)
  • 28. Per-Project Python Envs with Conda. Python libraries are usable by Spark/Tensorflow.
  • 29. TensorFlow in Hopsworks. (Diagram labels: HopsFS, YARN, FeatureStore, Tensorflow Serving, Tensorboard, Experiments, Kafka, Hive; Public Cloud or On-Premise.)
  • 30. One Click Deployment of TensorFlow Models
  • 31. Hops API • Python/Java/Scala library - Manage tensorboard, Load/save models in HDFS - Horovod, TensorFlowOnSpark - Parameter sweeps for parallel experiments - Neural Architecture Search with Genetic Algorithms - Secure Streaming Analytics with Kafka/Spark/Flink • SSL/TLS certs, Avro Schema, Endpoints for Kafka/Hopsworks/etc.
  • 32. Deep Learning Hierarchy of Scale: DDL AllReduce on GPU Servers; DDL with GPU Servers and Parameter Servers; Parallel Experiments on GPU Servers; Single GPU; Many GPUs on a Single GPU Server.
  • 33. Summary •Distribution can make Deep Learning practitioners more productive. https://www.oreilly.com/ideas/distributed-tensorflow •Hopsworks is a new Data Platform built on HopsFS with first-class support for Python / Deep Learning / ML / Strong Data Governance
  • 34. The Team. Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds, Filotas Siskos, Mahmoud Hamed. Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson, Roberto Bampi. www.hops.io @hopshadoop
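
Editor's sketch for slide 11 (Ring-AllReduce): the ring pattern named on that slide can be illustrated with a small single-process simulation. This is not Horovod's or Hops' implementation; the function name `ring_allreduce` and the plain Python lists standing in for GPU buffers are assumptions made for illustration. Each worker's gradient vector is split into n chunks, a scatter-reduce phase sums chunks around the ring, and an allgather phase circulates the finished chunks.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring-allreduce over n workers (hypothetical helper).

    grads[i] is worker i's gradient vector. For simplicity this sketch
    requires the vector length to be divisible by the number of workers.
    """
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "sketch assumes length divisible by worker count"
    chunk = size // n
    data = [list(g) for g in grads]  # each worker's local buffer

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, scatter-reduce: in step s, worker i sends chunk (i - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][span((i - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            data[dst][span(c)] = [a + b for a, b in zip(data[dst][span(c)], payload)]

    # Worker i now holds the fully summed chunk (i + 1) % n.
    # Phase 2, allgather: finished chunks travel around the ring and
    # overwrite the stale copies on each neighbour.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][span((i + 1 - s) % n)]) for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][span(c)] = payload

    return data


# Four simulated workers, each with an 8-element gradient.
grads = [[float(w * 10 + k) for k in range(8)] for w in range(4)]
reduced = ring_allreduce(grads)
```

In this scheme each worker sends 2(n-1) chunks of size 1/n of the vector, so per-worker traffic stays roughly constant as workers are added, whereas a single parameter server has to receive every worker's full gradient.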
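
Editor's sketch for slide 16 (Parallel Experiments): the slide states that the `args_dict` shown there results in 6 executors with different hyperparameter combinations, which matches the Cartesian product of 3 learning rates and 2 dropout values. The snippet below only illustrates that expansion; `expand_grid` is a hypothetical helper, and how the hops `experiment.launch` call builds its combinations internally is not shown in the slides.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(args_dict: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one kwargs dict per hyperparameter combination (hypothetical helper)."""
    names = list(args_dict)
    value_lists = [args_dict[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Same grid as on the slide: 3 learning rates x 2 dropout values.
args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]}
combos = expand_grid(args_dict)

for kwargs in combos:
    # Each dict could be passed to a training function as model_fn(**kwargs).
    print(kwargs)
```

Each resulting dict corresponds to one independent trial, which is why such experiments parallelize without any communication between trials.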