Project "Deep Water"

This is my Deep Water talk for the TensorFlow Paris meetup.

Deep Water is H2O's integration with multiple open source deep learning libraries such as TensorFlow, MXNet and Caffe. On top of the performance gains from GPU backends, Deep Water naturally inherits all H2O properties in scalability, ease of use and deployment.

  1. Project “Deep Water”: H2O’s Integration with TensorFlow • Jo-fai (Joe) Chow, Data Scientist • joe@h2o.ai • @matlabulous • TensorFlow Paris Meetup, 30th November, 2016
  2. About Me • Civil (Water) Engineer • 2010 – 2015 • Consultant (UK) • Utilities • Asset Management • Constrained Optimization • Industrial PhD (UK) • Infrastructure Design Optimization • Machine Learning + Water Engineering • Discovered H2O in 2014 • Data Scientist • From 2015 • Virgin Media (UK) • Domino Data Lab (Silicon Valley) • H2O.ai (Silicon Valley)
  3. Agenda • Introduction • About TensorFlow • TensorFlow Use Cases • About H2O.ai • Project Deep Water • Motivation • Benefits • H2O + TensorFlow Live Demo • Conclusions
  4. About TensorFlow
  5. About TensorFlow • Open source machine learning framework by Google • Python / C++ API • TensorBoard • Data Flow Graph Visualization • Multi CPU / GPU • v0.8+ distributed machines support • Multi-device support • desktop, server and Android devices • Image, audio and NLP applications • HUGE Community • Support for Spark, Windows …
  6. TensorFlow Wrappers • TFLearn – Simplified interface • Keras – TensorFlow + Theano • tensorflow.rb – Ruby wrapper • TensorFlow.jl – Julia wrapper • TensorFlow for R – R wrapper • … and many more! • See: github.com/jtoy/awesome-tensorflow
  7. TensorFlow Use Cases • Very Cool TensorFlow Use Cases • Some of the cool things you can do with TensorFlow
  8. playground.tensorflow.org
  9. Neural Style Transfer in TensorFlow • Neural Style • “… a technique to train a deep neural network to separate artistic style from image structure, and combine the style of one image with the structure of another” • Original Paper • “A Neural Algorithm of Artistic Style” • TensorFlow Implementation • [Link]
  10. Sorting Cucumbers • Problem • Sorting cucumbers is a laborious process. • On a Japanese farm, the farmer’s wife can spend up to eight hours a day sorting cucumbers during the peak harvesting period. • Solution • The farmer’s son (Makoto Koike) used TensorFlow, Arduino and Raspberry Pi to create an automatic cucumber sorting system.
  11. Sorting Cucumbers • Classification Problem • Input: cucumber photos (side, top, bottom) • Output: one of nine classes • Google’s Blog Post [Link] • YouTube Video [Link]
  15. Of course there are more TensorFlow use cases. The key message here is …
  16. TensorFlow democratizes the power of deep learning
  17. About H2O.ai: What exactly is H2O?
  18. Company Overview
      • Founded: 2011. Venture-backed, debuted in 2012
      • Products: H2O Open Source In-Memory AI Prediction Engine; Sparkling Water; Steam
      • Mission: Operationalize Data Science, and provide a platform for users to build beautiful data products
      • Team: 70 employees; Distributed Systems Engineers doing Machine Learning; World-class visualization designers
      • Headquarters: Mountain View, CA
  19. Bring AI To Business, Empower Transformation • Financial Services, Insurance and Healthcare as Our Vertical Focus • Community as Our Foundation • H2O is an open source platform empowering business transformation
  20. Users In Various Verticals Adore H2O • Financial • Insurance • Marketing • Telecom • Healthcare
  21. www.h2o.ai/customers
  22. H2O.ai Makes A Difference as an AI Platform
      • Open Source • Flexible Interface • Scalability and Performance • GPU Enablement • Rapid Model Deployment • Smart and Fast Algorithms • Cloud Integration • Big Data Ecosystem • H2O Flow
      • 100% open source
      • Highly portable models deployed in Java (POJO) and Model Object Optimized (MOJO)
      • Automated and streamlined scoring service deployment with REST API
      • Distributed In-Memory Computing Platform
      • Distributed Algorithms
      • Fine-Grain MapReduce
  23. H2O Community Growth: Tremendous Momentum Globally
      • [Chart: # H2O Users, 1-Jan-15 to 1-Oct-16] 65,000+ users globally (Sept 2016), +127%
      • [Chart: # Companies Using H2O, 1-Jan-15 to 1-Oct-16] ~8,000+ companies (Sept 2016), +60%
      • Large User Circle: 65,000+ users from ~8,000 companies in 140 countries. Top 5 from:
      • * Data from Google Analytics embedded in the end user product
  24. H2O Community Support • Google forum – h2osteam • community.h2o.ai • Please try
  25. H2O for Kaggle Competitions
  26. H2O for Academic Research • http://www.sciencedirect.com/science/article/pii/S0377221716308657 • https://arxiv.org/abs/1509.01199
  27. H2O democratizes artificial intelligence & big data science
  28. Our Open Source Products: 100% Open Source. Big Data Science for Everyone!
  29. H2O.ai Offers AI Open Source Platform: Product Suite to Operationalize Data Science with Visual Intelligence (100% Open Source)
      • In-Memory, Distributed Machine Learning Algorithms with Speed and Accuracy
      • H2O Integration with Spark. Best Machine Learning on Spark.
      • Visual Intelligence and UX Framework For Data Interpretation and Story Telling on top of Beautiful Data Products
      • Operationalize and Streamline Model Building, Training and Deployment Automatically and Elastically
      • Deep Water: State-of-the-art Deep Learning on GPUs with TensorFlow, MXNet or Caffe with the ease of use of H2O
  31. High Level Architecture
      • HDFS, S3, NFS, Local, SQL
      • Load Data: Distributed In-Memory, Loss-less Compression
      • H2O Compute Engine: Exploratory & Descriptive Analysis • Feature Engineering & Selection • Supervised & Unsupervised Modeling • Model Evaluation & Selection • Predict
      • Data & Model Storage
      • Production Scoring Environment • Model Export: Plain Old Java Object • Data Prep Export: Plain Old Java Object
      • Your Imagination
  32. Algorithms Overview
      • Supervised Learning
        Statistical Analysis: Generalized Linear Models (Binomial, Gaussian, Gamma, Poisson and Tweedie); Naïve Bayes
        Ensembles: Distributed Random Forest (classification or regression models); Gradient Boosting Machine (produces an ensemble of decision trees with increasingly refined approximations)
        Deep Neural Networks: Deep learning (creates multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations)
      • Unsupervised Learning
        Clustering: K-means (partitions observations into k clusters/groups of the same spatial size; automatically detects optimal k)
        Dimensionality Reduction: Principal Component Analysis (linearly transforms correlated variables to uncorrelated components); Generalized Low Rank Models (extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data)
        Anomaly Detection: Autoencoders (find outliers using nonlinear dimensionality reduction with deep learning)
  33. Distributed Algorithms
      • Advantageous Foundation
        Foundation for In-Memory Distributed Algorithm Calculation: Distributed Data Frames and columnar compression
        All algorithms are distributed in H2O: GBM, GLM, DRF, Deep Learning and more. Fine-grained map-reduce iterations.
        Only enterprise-grade, open-source distributed algorithms in the market
      • User Benefits
        “Out-of-box” functionalities for all algorithms (NO MORE SCRIPTING) and uniform interface across all languages: R, Python, Java
        Designed for all sizes of data sets, especially large data
        Highly optimized Java code for model exports
        In-house expertise for all algorithms
      • Foundation for Distributed Algorithms • Parallel Parse into Distributed Rows • Fine Grain Map Reduce Illustration: Scalable Distributed Histogram Calculation for GBM
  34. H2O Deep Learning in Action
  35. H2O in Action: Quick Demo (5 minutes)
  36. Simple Demo – Iris
  37. H2O + R • Please try
  38. H2O Flow (Web)
  39. H2O Flow
  40. Key Learning Resources • Help Documentation • docs.h2o.ai • Meetups • bit.ly/h2o_meetups • YouTube Channel • bit.ly/h2o_youtube
  42. Both TensorFlow and H2O are widely used
  43. TensorFlow democratizes the power of deep learning. H2O democratizes artificial intelligence & big data science. There are other open source libraries like MXNet and Caffe too. Let’s have a party, this will be fun!
  44. Deep Water: Next-Gen Distributed Deep Learning with H2O
      • H2O integrates with existing GPU backends for significant performance gains
      • One Interface - GPU Enabled - Significant Performance Gains
      • Inherits All H2O Properties in Scalability, Ease of Use and Deployment
      • Recurrent Neural Networks enabling natural language processing, sequences, time series, and more
      • Convolutional Neural Networks enabling image, video, speech recognition
      • Hybrid Neural Network Architectures enabling speech to text translation, image captioning, scene parsing and more
  45. Deep Water Architecture
      • Client side: R/Py/Flow/Scala client • REST API • Web server
      • Node 1 … Node N, each showing: Scala • Spark • H2O Java Execution Engine • TensorFlow/mxnet/Caffe (C++) • GPU • CPU
      • Connection labels: RPC • grpc/MPI/RDMA
  46. Using H2O Flow to train a Deep Water model
  47. Same H2O R/Python Interface
  48. H2O + TensorFlow Live Demo
  49. Deep Water H2O + TensorFlow Demo • H2O + TensorFlow • Dataset – Cat/Dog/Mouse • TensorFlow as GPU backend • Train a LeNet (CNN) model • Interfaces • Python (Jupyter Notebook) • Web (H2O Flow) • Code and Data • github.com/h2oai/deepwater
  50. Data – Cat/Dog/Mouse Images
  51. Data – CSV
  52. Available Networks in Deep Water • LeNet (This Demo) • AlexNet • VGGNet • Inception (GoogLeNet) • ResNet (Deep Residual Learning) • Build Your Own • [Figure: ResNet]
  53. Deep Water in Action: Quick Demo (5 minutes)
  58. Using GPU for training
  60. Saving the custom network structure as a file
  61. Creating the custom network structure with size = 28x28 and channels = 3
  62. Specifying the custom network structure for training
  63. H2O + TensorFlow: Now try it with H2O Flow
  64. Want to try Deep Water? • Build it • github.com/h2oai/deepwater • Ubuntu 16.04 • CUDA 8 • cuDNN 5 • … • Pre-built Amazon Machine Images (AMIs) • Info to be confirmed
  66. Want to find out more about Sparkling Water and Steam? • R Addicts Paris • Tomorrow 6:30pm • Three H2O Talks: • Introduction to H2O • Demos: H2O + R + Steam • Auto Machine Learning using H2O • Sparkling Water 2.0
  67. Conclusions
  68. Project “Deep Water” • H2O + TensorFlow • a powerful combination of two widely used open source machine learning libraries. • All Goodies from H2O • inherits all H2O properties in scalability, ease of use and deployment. • Unified Interface • allows users to build, stack and deploy deep learning models from different libraries efficiently. • 100% Open Source • the party will get bigger!
  69. Deep Water Roadmap (Q4 2016)
  70. H2O’s Mission: Making Machine Learning Accessible to Everyone (Photo credit: Virgin Media)
  71. Deep Water – Current Contributors
  72. Merci beaucoup! (Thank you very much!) • Organizers & Sponsors • Jiqiong, Natalia & Renat • Dailymotion • Code, Slides & Documents • bit.ly/h2o_meetups • bit.ly/h2o_deepwater • docs.h2o.ai • Contact • joe@h2o.ai • @matlabulous • github.com/woobe
  73. Extra Slides (Iris Demo)
  84. Extra Slides (Deep Water Demo)

Editor's notes

  • Open Source: 100% open source. Deep learning is in the public domain, and users should have full access to it in open source.
    Seamless integration with Big Data frameworks, such as Apache Spark: we have Sparkling Water. H2O can also be co-located in the company’s Hadoop cluster, so analysts can discover insights in the data without extracting it or taking samples.
    Flexible interface: R, Python, Scala and Java.
    Smart Algorithms: deep learning implementations differ in the depth of features they support. The capabilities listed below save the data scientist time and manual programming, which leads to better models.
    • Automatic standardization of the predictors
    • Automatic initialization of model parameters (weights & biases)
    • Automatic adaptive learning rates
    • Ability to specify manual learning rate with annealing and momentum
    • Automatic handling of categorical and missing data
    • Regularization techniques to manage complexity
    • Automatic parameter optimization with grid search
    • Early stopping based on validation datasets
    • Automatic cross-validation
    Distributed Computing Platform
    Highly scalable
    Superior performance
    Rapid Model Deployment: POJO is very portable, and makes model deployment very fast. Steam has a streamlined scoring server deployment mechanism, which speeds up model deployment further.
    GPU Enablement: Deep Learning algorithms can be much faster on GPUs than on CPUs. This enablement will give H2O implementations a boost in performance over other implementations.
    Cloud integration: Amazon Web Services, Microsoft Azure, Google Cloud
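
To make the first capability in the list above concrete, here is a minimal sketch of what standardizing a predictor usually means: rescaling a numeric column to zero mean and unit standard deviation. This is a generic illustration in plain Python, not H2O's own code; the function name `standardize` and the sample values are assumptions made up for the example.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores (zero mean, unit population std) for one numeric column.

    Generic illustration only; this is not taken from H2O's implementation.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical predictor values, invented for this example.
sepal_lengths = [5.1, 4.9, 6.3, 5.8, 7.1]
z_scores = standardize(sepal_lengths)
print([round(z, 3) for z in z_scores])
```

Doing this by hand for every numeric column is the kind of manual step the note says the platform takes care of automatically.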
  • Date # Users
    1-Jan-15 11000
    1-Jul-15 25000
    1-Jan-16 29000
    1-Oct-16 65858

    Date # Companies
    1-Jan-15 1800
    1-Jul-15 3810
    1-Jan-16 4890
    1-Oct-16 7803
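
The +127% and +60% figures on the community growth slide can be reproduced from the two tables above. The short sketch below assumes the percentages compare 1-Jan-16 with 1-Oct-16; that baseline is an inference from the arithmetic and is not stated in the slides. The helper name `pct_growth` is made up for the example.

```python
# Figures copied from the editor's notes tables.
users = {"1-Jan-15": 11000, "1-Jul-15": 25000, "1-Jan-16": 29000, "1-Oct-16": 65858}
companies = {"1-Jan-15": 1800, "1-Jul-15": 3810, "1-Jan-16": 4890, "1-Oct-16": 7803}


def pct_growth(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Assumed baseline: 1-Jan-16 (inferred, not stated in the deck).
user_growth = pct_growth(users["1-Jan-16"], users["1-Oct-16"])
company_growth = pct_growth(companies["1-Jan-16"], companies["1-Oct-16"])

print(f"Users: +{user_growth:.0f}%")         # Users: +127%
print(f"Companies: +{company_growth:.0f}%")  # Companies: +60%
```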
  • Works on Hadoop & Spark
    Key capabilities
    Distributed in-memory algorithms
    Columnar compression
    Feature Engineering
    EDA
    Data Munging
    Supervised Learning Algorithms
    Unsupervised Learning Algorithms
    Model deployment via POJO

  • GBM authors from Stanford on Scientific Advisory Council
