
Innovation with ai at scale on the edge vt sept 2019 v0


  1. 1. Innovating with AI at Scale: Tools and Tips for Training and Inference. Presenter: Clarisse Taaffe-Hedglin (clarisse@us.ibm.com), Executive AI Architect, IBM Systems
  2. 2. 1) Drivers of the AI explosion 2) Implementing use cases at scale 3) Deploying models to the edge
  3. 3. Why AI models now?
  4. 4. USE CASES ARE EVERYWHERE. IBM Skills Academy / © Copyright 2018 IBM Corporation
  5. 5. Artificial Intelligence brings new Cognitive Capabilities: • Computers can be trained to “See” (example: airport security inspecting luggage) • Computers can be trained to “Hear” (example: maintenance crew listening to railcars) • Computers can be trained to “Do”, that is, mimic an expert (example: mobile phone provider predicting customer churn)
  6. 6. The key triggers rapidly advancing AI: Data + Algorithms (open source software) + Compute (CPU, GPU, FPGA)
  7. 7. Addressable market:
      • MEDIA/ENTERTAINMENT, RETAIL: recommendation engines, precision marketing
      • OTHERS: agriculture, remote sensing
      • LIFE SCIENCES: sequence analysis, radiology
      • UTILITIES: smart meter analysis, capacity planning
      • FINANCIAL SERVICES: risk analysis, fraud detection
      • CUSTOMER SERVICE: chatbots, helpdesk, automated expenses
      • LAW & DEFENSE: threat analysis, social media monitoring
      • RESEARCH: physics modeling
      • HEALTH CARE: patient sensors, monitoring, EHRs
      • TRANSPORTATION: optimal traffic flows, route planning
      • CONSUMER GOODS: sentiment analysis, advertising effectiveness
      • OIL & GAS: exploration, sensor analysis
      • AUTOMOTIVE: ADAS, maintenance
      • MANUFACTURING: line inspection, defect analysis
      Cognitive Systems / February 26 / © 2019 IBM Corporation
  8. 8. The scenarios AI can best solve for today: big, complex systems; personalization; automation; simulating relationships; visual recognition; patterns
  9. 9. ML Framework Landscape. Which ML frameworks have you used the most over the last 5 years? (Source: Kaggle Data Science Survey 2018.) scikit-learn is, by far, the most widely used ML framework. Why? • Wide variety of ML models • Good documentation • Standardized API. Some downsides of scikit-learn are: (1) lack of support for deep learning (DL); (2) slow performance for large datasets. Problem (1) is addressed by the DL frameworks in PowerAI (TensorFlow, PyTorch), recently rebranded as Watson Machine Learning Accelerator. Problem (2) is addressed by Snap ML.
  10. 10. Watson Machine Learning Community Edition: a curated, tested and pre-compiled binary software distribution that enables enterprises to quickly and easily deploy deep learning for their data science and analytics development. Including all of the following frameworks: TensorFlow, TensorFlow Probability, TensorBoard, TensorFlow-Keras, BVLC Caffe, IBM Enhanced Caffe, Caffe2, OpenBLAS, HDF5, Nvidia RAPIDS
  11. 11. IBM adds value to curated, tested, and pre-compiled frameworks with Watson Machine Learning Community Edition. • Distributed Deep Learning: simplifies the process of training deep learning models across a cluster for faster time to results. • Software Libraries: WML CE software and the accelerated Power servers support a host of accelerator libraries like SnapML and Nvidia RAPIDS. • Large Model Support: use system memory with GPUs to support more complex models and higher resolution data.
  12. 12. Evolving from compute systems to Cognitive Systems (P8, P9, P10). Not just about hardware design: hardware + software. It’s about co-optimization and open innovation which just work for ML, DL, and AI. Diagram labels: Open Frameworks, Partnerships, Industry Alignment, Dev Ecosystem, Accelerator Roadmaps, Open Accelerator Interfaces, IBM Software.
  13. 13. How to get to AI at scale?
  14. 14. 14
  15. 15. Top 5 Error Rate
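The slide carries only the metric's name. As a reminder of what it measures, here is a minimal sketch of the usual top-k error definition: a prediction counts as a miss when the true label is not among the k highest-scoring classes. The function name `top_k_error` and the toy scores are illustrative assumptions, not taken from the deck.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k best-scored classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 samples (made-up scores).
row = [0.10, 0.20, 0.30, 0.15, 0.05, 0.20]
print(top_k_error([row, row], labels=[4, 0], k=5))
print(top_k_error([row, row], labels=[4, 0], k=1))
```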
  16. 16. Distributed Deep Learning (DDL). Deep learning training takes days to weeks. Limited scaling to multiple x86 servers. PowerAI with DDL enables scaling to 100s of GPUs. • 1 system vs. 64 systems: 16 days down to 7 hours, 58x faster (ResNet-101, ImageNet-22K). • Near ideal scaling to 256 GPUs: 95% scaling with 256 GPUs (ResNet-50, ImageNet-1K; Caffe with PowerAI DDL, running on Minsky (S822LC) Power System). [Chart: speedup vs. number of GPUs, 1 to 256, compared with ideal scaling.] © 2018 IBM Corporation
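The speedup and scaling figures on this slide can be sanity-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted on the slide; the helper names are illustrative. The rounded inputs give a slightly lower ratio than the quoted 58x, which presumably reflects unrounded measurements (an assumption, since the source does not say).

```python
def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_hours / accelerated_hours


def scaling_efficiency(measured_speedup: float, ideal_speedup: float) -> float:
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return measured_speedup / ideal_speedup


# Rounded slide figures: 16 days on 1 system, 7 hours on 64 systems.
baseline_hours = 16 * 24
ddl_hours = 7
s = speedup(baseline_hours, ddl_hours)
print(f"speedup from rounded figures: {s:.1f}x")
print(f"efficiency vs. the 64x ideal: {scaling_efficiency(s, 64):.0%}")

# 95% scaling efficiency at 256 GPUs corresponds to this effective speedup.
effective = 0.95 * 256
print(f"effective speedup at 256 GPUs: {effective:.1f}x")
```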
  17. 17. 17
  18. 18. Train larger, more complex models. • Traditional Model Support: limited memory on GPU forces a tradeoff in model size / data resolution (CPU with DDR4 connected to GPU graphics memory over PCIe; system bottleneck here). • Large Model Support: use system memory and GPU to support more complex and higher resolution data (POWER CPU with DDR4 connected to GPU graphics memory over NVLink; POWER NVLink data pipe).
  19. 19. Large AI models train ~4 times faster: POWER9 servers with NVLink to GPUs vs. x86 servers with PCIe to GPUs. Caffe with LMS (Large Model Support), runtime of 1000 iterations, GoogleNet model on enlarged ImageNet dataset (2240x2240): Xeon x86 2640v4 w/ 4x V100 GPUs, 3.1 hours; Power AC922 w/ 4x V100 GPUs, 49 mins; 3.8x faster. [Chart: time (secs), axis from 0 to 12000.]
  20. 20. TensorFlow Large Model Support: NVLink2 advantage. 3DUnet segmentation models with higher resolution images allow for learning and labeling finer details and structures of brain tumors. https://developer.ibm.com/linuxonpower/2018/07/27/tensorflow-large-model-support-case-study-3d-image-segmentation/
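The 3.8x figure follows directly from the two runtimes quoted for 1000 iterations. A small check, using only the slide's numbers (variable names are illustrative):

```python
ITERATIONS = 1000

x86_seconds = 3.1 * 3600      # 3.1 hours: Xeon x86 2640v4 with 4x V100 over PCIe
power_seconds = 49 * 60       # 49 minutes: Power AC922 with 4x V100 over NVLink

ratio = x86_seconds / power_seconds
print(f"speedup: {ratio:.1f}x")
print(f"x86:   {x86_seconds / ITERATIONS:.2f} s per iteration")
print(f"Power: {power_seconds / ITERATIONS:.2f} s per iteration")
```

Both totals (about 11,160 s and 2,940 s) fall inside the 0 to 12,000 second range of the chart axis, so the quoted runtimes and the chart agree.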
  21. 21. Accelerating Machine Learning. Snap ML is a framework for training Machine Learning (ML) models. It is characterized by: • high performance • scalability to very large datasets • high resource efficiency. Why fast? Speed is crucial in many cases: • online re-training of models • model selection and hyper-parameter tuning • fast adaptability to changes. Why large-scale? Large datasets arise in numerous business-critical applications: recommendation, credit fraud, advertising, space exploration, weather, etc. Why resource-savvy? Not everyone can afford on-prem computing. Renting computing in the cloud is billed by usage. Less usage means savings and a higher profit margin. [Diagram: Artificial Intelligence, Machine Learning, Deep Learning (Neural Networks).]
  22. 22. Which models are supported? Snap ML (PowerAI 1.6.0) currently supports: • Generalized Linear Models: Logistic Regression, Ridge Regression, Lasso Regression, Support Vector Machines (SVMs) • Tree-based models: Decision Trees, Random Forest. With more to come… [Chart: “Which data science methods are used at work?” (source: Kaggle Data Science Survey 2017), marking the methods supported by Snap ML.]
  23. 23. Decision Tree performance results: on average 6.5x faster than sklearn (CPU-only). Random Forest performance results: on average 3.8x faster than sklearn (CPU-only). [Chart labels: 5.2x, 4.5x.] Project www: https://www.zurich.ibm.com/snapml/ Core publication: https://arxiv.org/abs/1803.06333
  24. 24. Nvidia RAPIDS: RAPIDS is a set of open source libraries for GPU-accelerating data preparation and machine learning. OSS website: rapids.ai
  25. 25. Nvidia RAPIDS cuDF (GPU DataFrames) is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. It provides a pandas-like API that will be familiar to data engineers and data scientists. Current version is 0.6. • The cuDF tech preview included in PowerAI 1.6.0 is backlevel (0.2). • Work is in progress to get the latest version into Conda; alternatively, build it yourself (open source). Examples of data manipulation in cuDF, such as object creation, viewing, selection, merge, and concat, can be found here: https://rapidsai.github.io/projects/cudf/en/latest/10min.html
  26. 26. Simple cuDF example: downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations. (The code and its output appeared on the slide and are not part of this transcript.)
  27. 27. Nvidia RAPIDS cuML (GPU Machine Learning) is a suite of libraries that implement machine learning algorithms and mathematical primitive functions. It enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs. Current version is 0.6. • The cuML tech preview included in PowerAI 1.6.0 is backlevel (0.2). • Work is in progress to get the latest version into Conda; alternatively, build it yourself (open source). Documentation on supported algorithms such as KMeans, tSVD, PCA, and DBSCAN can be found here: https://docs.rapids.ai/api/cuml/stable/
  28. 28. Simple cuML example: loads input and computes DBSCAN clusters, all on GPU. (The code and its output appeared on the slide and are not part of this transcript.)
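To show what a DBSCAN clustering call computes, here is a minimal CPU sketch in plain Python. It is not the cuML API and does not run on a GPU; it is a brute-force illustration of the algorithm only. The parameter names `eps` and `min_samples` follow the convention used by scikit-learn-style estimators, and the sample points are made up.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force region query: every point within eps of point i (i included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        labels[i] = cluster              # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)       # j is also a core point, keep expanding
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))
```

The region query here is quadratic in the number of points; GPU libraries are attractive for exactly this kind of distance-heavy work on large tabular data.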
  29. 29. How to deploy at the edge?
  30. 30. The AI Ladder: a prescriptive approach to accelerating the journey to AI. • COLLECT: make data simple and accessible (data of every type, regardless of where it lives) • ORGANIZE: create a trusted analytics foundation • ANALYZE: scale AI everywhere with trust & transparency • INFUSE: operationalize AI across business processes • MODERNIZE your data estate for an AI and multicloud world. AI-optimized systems infrastructure.
  31. 31. AI Open Source Frameworks
  32. 32. Introduction to Nvidia TensorRT NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. Nvidia website: https://developer.nvidia.com/tensorrt
  33. 33. Tensorflow and TensorRT inference TensorFlow™ integration with TensorRT™ (TF-TRT) optimizes and executes compatible subgraphs, allowing TensorFlow to execute the remaining graph. While you can still use TensorFlow's wide and flexible feature set, TensorRT will parse the model and apply optimizations to the portions of the graph wherever possible.
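The partitioning idea described on this slide can be illustrated with a toy sketch: walk the model's operations, group consecutive supported ones into segments for the optimized engine, and leave the rest to the framework. Everything here is a simplifying assumption for illustration: real TensorFlow graphs are DAGs rather than linear lists, the `SUPPORTED` set and op names are hypothetical, and this is not the TF-TRT API.

```python
from itertools import groupby

# Hypothetical set of ops the optimized engine can run (not TensorRT's real support list).
SUPPORTED = {"conv2d", "relu", "matmul", "add"}


def partition(ops, supported=SUPPORTED):
    """Split a linear op sequence into maximal runs that are all supported or all unsupported."""
    segments = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        engine = "accelerated" if is_supported else "framework"
        segments.append((engine, list(run)))
    return segments


# Made-up model: two unsupported ops force the graph to be split into four segments.
model = ["conv2d", "relu", "custom_nms", "matmul", "add", "py_func"]
for engine, run in partition(model):
    print(engine, run)
```

The fewer unsupported ops a model contains, the longer the accelerated segments become, which is why the slide stresses that optimization is applied "wherever possible".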
  34. 34. Note: TensorRT engines are optimized for the currently available GPUs, so conversions should take place on the machine that will be running inference.
  35. 35. Calibrating for lower precision with a minimal loss of accuracy reduces the requirements on bandwidth and allows for faster computation. It also allows for the use of Tensor Cores, which multiply two 4×4 FP16 matrices and add a 4×4 FP16 or FP32 matrix to the result.
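The Tensor Core operation described above is a fused multiply-add, D = A × B + C, on 4×4 matrices. The following CPU sketch emulates the numerics only: inputs A and B are rounded to FP16 using the standard library's half-precision `struct` format, and Python floats stand in for the higher-precision accumulator. The function names are illustrative, and no GPU or CUDA API is involved.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def tensor_core_fma(a, b, c):
    """Compute D = A x B + C for 4x4 matrices, rounding A and B to FP16 first.

    Products and sums stay in Python floats, standing in for the
    higher-precision accumulator.
    """
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    return [[c[i][j] + sum(a16[i][k] * b16[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[0.5 * (i + j) for j in range(4)] for i in range(4)]
c = [[1.0] * 4 for _ in range(4)]
d = tensor_core_fma(identity, b, c)
print(d[0])

# FP16 cannot represent 0.1 exactly; this rounding is what calibration has to tolerate.
print(to_fp16(0.1))
```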
  36. 36. https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/
  37. 37. Nvidia TensorRT current version: • Version 6, announced on September 16th (current): https://news.developer.nvidia.com/tensorrt6-breaks-bert-record/ • Version 5.1.3.6, added as a tech preview to WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
  38. 38. Resources
      • https://developer.ibm.com/linuxonpower/deep-learning-powerai#tab_education
      • Nvidia TensorRT: https://developer.nvidia.com/tensorrt
      • WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
      • TF-TRT Documentation: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/
      • IBM TensorRT introduction blog: https://developer.ibm.com/linuxonpower/2019/07/29/introducing-tensorflow-with-tensorrt-tf-trt/
      • IBM TensorFlow Serving blog (includes TensorRT example): https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-with-tensorflow-serving-on-wml-ce/
      • Image classification and object detection: github.com/tensorflow/tensorrt
      • Nvidia forum: https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/
      • Mixed precision and accuracy: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9143-mixed-precision-training-of-deep-neural-networks.pdf
      • Demo: https://github.com/cheeyauk/tf_to_tensorrt
  39. 39. IBM Systems WW Client Experience Centers. IBM Internal Use Only. Search Center Offerings in ISCEP: https://ibm.biz/client-experience-portal Contact Center via IBM Systems Worldwide Client Experience Centers maximize IBM Systems competitive advantage in the Cloud and Cognitive era by providing access to world class technical experts and infrastructure services to assist Clients with the transformation of their IT implementations. Center offerings enable IBM Sellers and Business Partners to progress and expedite System Sales opportunities. 9 Worldwide Locations (* also Infrastructure Hubs): Austin TX, *Poughkeepsie NY, Rochester MN, Tucson AZ, *Beijing CHINA, Boeblingen GERMANY, Guadalajara MEXICO, *Montpellier FRANCE, Tokyo JAPAN. Client Experience: Tailored, in-depth technology Innovation Exchange Events, Relationship building, Demonstrations, Meetups, Solution workshops, Remote options (Inbound & Outbound). Infrastructure Solutions: Benchmarks, MVP & Proof of Technology, “Test Drives”, Demonstrations. Infrastructure Services: Certify ISV solutions, Hosting, Cloud Environment (Inbound to Centers). Architecture & Design: Advise clients, Enable Sellers, “Art of the Possible” Discovery & Design Workshops, Consulting, Showcases, Reference Architectures, Co-Creation of assets, Included CSSC (Inbound & Outbound). Content: Content Development, IBM Redbooks, Training Courses, Video courses, “Test Drives”, Demonstrations. NEW: Co-Creation Lab; CEC Cloud; IBM Systems Center of Competency for Red Hat
  40. 40. Please note: IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  41. 41. Notices and disclaimers. © 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights: use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply. Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
  42. 42. Notices and disclaimers continued. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Editor's notes

  • So what is triggering the rapid advancements in AI? It comes from major innovation in three critical categories:
    1) Digitization of society is creating an abundance of interesting datasets. Inside and outside the enterprise. And that continues to grow about 40% per year.
    2) Algorithm innovation in supervised & unsupervised learning techniques. Especially Deep Learning. Most of which is advancing in open source.
    3) Ability to run those algorithms on distributed compute and especially on GPUs.

    So together, the developments here have allowed us to employ AI on any problem where a human can get a task done in less than 1 second of thought. It’s in this scope of problems where AI is being applied, and it’s being wielded to create a flywheel: Data -> Products -> Users. This is why competing on algorithms alone is not a defensible model.

    REFERENCE NOTES:
    Top trends:
    99% of commercial value associated with A->B: 0s or 1s. This is called supervised learning.
    Speech Recognition: Audio -> Text
    Image Recognition

    Types of Deep Learning:
    Supervised Learning: Learn from labeled datasets. Most economic value is here and drops off quickly through below.
    Transfer Learning: Learn about one topic. Apply to another domain.
    Unsupervised Learning. Learning without labeled data
    Reinforcement Learning.

    The rise of the internet via analogy:
    Shopping mall + internet doesn’t make an internet/ecommerce company
    What defines whether you are truly an internet company? A) architect the organizational design to take advantage of the internet. For instance, A/B tests, short cycle times, push decision making down to PM/dev,

    The rise of the AI era:
    Traditional tech company + deep learning doesn’t make it an AI company.
    Although only some patterns exist, Google & Baidu are good examples.
    Other patterns: a) strategic data acquisition, b) unified data ‘warehouse’, c) persuasive automation, d) new job descriptions.

    Building an AI company, centrally build an AI group and matrix them into your AI.
  • When working with clients, these are the top AI scenarios to look for as you explore their potential AI use cases.
  • The genesis of IBM PowerAI (now known as Watson Machine Learning Community Edition - WML CE) was to make it simple for data scientists to be more productive, more quickly, by greatly simplifying the tasks necessary to get up and running. WML CE is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers to take your deep learning projects to the next level.

    For a fee, IBM offers formal support for WML CE components as long as their versions are consistent with the release configuration (NOTE that WML CE is a no charge offering but we do offer support for a fee). If you choose to use a different version of any of the components, no formal support will be available. However, in keeping with industry norms, specific questions can be posted on the WML CE space on DeveloperWorks Answers: https://developer.ibm.com/answers/topics/powerai/. This forum is monitored by the IBM technical team and technical support is provided on a best effort basis.

    There are several ways for you to get WML CE.
    Order it. WML CE is available as a no charge orderable part number from IBM (called PowerAI until 2H2019).
    Download it from here: http://ibm.biz/download-powerai
    Get the Docker container from here: https://hub.docker.com/r/ibmcom/powerai/

    As of WML CE (PowerAI) 1.5.4, the following frameworks are included in WML CE:
    (Make sure to check the Knowledge Center for the latest versions as they change rapidly
    https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_software_pkgs.html):
    DDL 1.2.0 - Distributed Deep Learning (with support for up to 4 nodes in WML CE)
    TensorFlow 1.12.0
    TensorFlow Probability 0.5.0 - a library for probabilistic reasoning and statistical analysis.
    TensorBoard 1.12.0 - a suite of visualization tools for TensorFlow
    TensorFlow Keras – NOTE that Keras is supported as part of the TensorFlow core library and as such we can support Keras through TensorFlow
    IBM enhanced Caffe 1.0.0
    BVLC Caffe 1.0.0 - The Berkeley Vision and Learning Center (BVLC)
    Caffe2 1.0rc1 – in technology preview
    PyTorch 1.0rc1
    Snap ML 1.0.0
    Spectrum MPI 10.2
    Bazel 0.15.0
    OpenBLAS 0.3.3
    HDF5 1.10.1
    Protobuf 3.6.1
    ONNX 1.3.0 – in technology preview

  • There are three additional capabilities on top of the open source frameworks (and in addition to the performance advantage that Power brings to the table); Large Model Support (LMS), Distributed Deep Learning (DDL), and support by IBM.

    Large Model Support
    WML CE addresses a fundamental limitation for deep learning; the size of memory available within GPUs. When training complex models or training with high definition images, the memory available on a GPU can be prohibitively restrictive. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models with Large Model Support.

    With Large Model Support, enabled by IBM’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. Customers can now address bigger challenges and get much more work done within a cluster of WML CE servers, increasing organizational efficiency. We will cover more details on LMS later in this deck.

    Distributed Deep Learning
    To accelerate the time dedicated to training a model, the WML CE stack includes function for distributing a single training job across a cluster of servers. IBM’s Distributed Deep Learning brings intelligence about the structure and layout of the underlying hardware cluster (topology). The impact of this is significant! WML CE and WML-A with Distributed Deep Learning can scale jobs across large numbers of cluster resources with very little loss due to communications overhead. There will be more details later in the presentation. WML CE allows for the use of DDL with up to a 4 node cluster. If a client wants to scale beyond 4 nodes, they must purchase WML-A.

    Supported by IBM
    Although WML CE is available free to download and use, IBM also provides a “for fee” support offering for those clients that want enterprise level support for the features and capabilities within the base offering.
  • We would normally focus on the HW optimization, starting with the processor, the I/O interfaces enabled by this processor, and then the accelerators we would align to those interfaces for optimal performance. We are doing that today; however, it is not just about the HW. As I mentioned on the previous slide, we co-optimized the SW. We took the open source deep learning frameworks and optimized them around this advanced design, and added enhancements such as spark conductor for DDL and large model support, while supporting everything from the HW to the SW in the solution. Not only do we have differentiated HW in the AC922, with many industry-only innovations, but we also have a full SW offering on top of it that is equally rich in differentiated innovation found only with Power Systems.
  • It’s estimated that 1.2 trillion photos will be taken in 2017. Even if each photo only took someone 1 second to organize, tag and annotate, it would still take over 38,000 years to classify them all!

    There is a competition every year, known as ImageNet.
    Roughly 500,000 images (low resolution) and 200 categories for which to classify them.
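The "over 38,000 years" figure in the photo-tagging note above can be reproduced from the numbers it quotes. The only assumption in this short check is a 365-day year.

```python
PHOTOS = 1.2e12                         # photos taken in a year, per the note
SECONDS_PER_PHOTO = 1                   # time to organize, tag and annotate one photo
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes a 365-day year

years = PHOTOS * SECONDS_PER_PHOTO / SECONDS_PER_YEAR
print(f"{years:,.0f} years of continuous tagging")
```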
  • We talked about this earlier – it’s all about maximizing accuracy (or minimizing error/loss)
    One way to get more accurate models is to simply add more layers
    The more layers, the more complex the model and the more difficult (computationally) it becomes to train
  • Distributed deep learning (DDL) is IBM’s high performance approach to training single models across an entire cluster of compute nodes. Unlike native model parallelism (such as Google’s gRPC method for TensorFlow) or Spark-based approaches, the DDL library distributes the model, training data set, and parameter serving across the defined cluster, and it uses a novel algorithm to improve communication over a very low latency fabric.

    The result is extremely efficient performance scaling, losing less than 5% of ideal efficiency when moving from 4 GPUs to 64 GPUs.

    This was available as a technology preview within PowerAI, but is now supported in PowerAI Enterprise.

    The outcome of this capability is that data science teams can run larger, more complex models while still reducing training time… allowing more iterations faster… and faster time to accurate results.
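To make the idea of training one model across several workers concrete, here is a minimal single-process sketch of synchronous data-parallel training: each simulated worker computes a gradient on its own data shard, the gradients are averaged (the role an all-reduce collective plays on a real cluster), and every worker applies the same update. This is a generic illustration under stated assumptions, not IBM's DDL algorithm or API; the model (y = w·x), the data, and all function names are made up.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for a collective all-reduce: every worker receives the mean."""
    return sum(values) / len(values)


def train(shards, w=0.0, lr=0.01, steps=200):
    for _ in range(steps):
        # On real hardware these gradients are computed in parallel, one per worker.
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * allreduce_mean(grads)
    return w


data = [(x, 3.0 * x) for x in range(1, 9)]   # ground truth: w = 3
shards = [data[i::4] for i in range(4)]      # 4 simulated workers, 2 samples each
print(train(shards))
```

The communication step is the part that limits scaling, which is why the note above emphasizes efficient communication over a low latency fabric.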
  • https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_training_mldl.pdf
    https://vimeo.com/307071617

    Junqi Yin Advanced Data and Workflows Group
  • Watson Machine Learning Accelerator addresses memory constraints within Deep Learning

    Large Model Support
    Watson Machine Learning Accelerator (WML-A) addresses a very big deep learning scaling challenge: the size of memory available within GPUs. When data scientists develop a deep learning workload, the structure of matrices in the neural model, and the data elements which train the model (in a batch), must sit within the memory on the GPUs. As models grow in complexity and data sets increase in size, data scientists are forced to make tradeoffs to stay within the constrained 32GB (or even 16GB on older GPU cards) memory limits. Instead of training on web-scale images, WML-A users can train on high definition video. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models for better inference capability.

    With Large Model Support, enabled by WML-A’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. IBM’s capabilities, with the co-optimized WML-A software on the Power Systems servers, have enabled increased model size (more layers, larger matrices), increased data element sizes (higher definition images), and larger batch sizes (for faster time to convergence). With Large Model Support, data scientists can load models which span nearly an entire terabyte of system memory across the GPUs. The final impact? Customers can now address bigger challenges and get much more work done within a cluster of WML-A servers, increasing organizational efficiency.
  • Large models not only allow data scientists to work with more complex data. For certain models, which rely on pulling a significantly larger number of data elements into the training cycle, large model support also allows training jobs to complete faster. By using the entire system memory resource that is available, data scientists are able to operate much more efficiently within each single server. Being able to use larger data and train faster is a significant advantage for PowerAI Enterprise, and operating at this scale is only possible because of the architectural choices IBM and Nvidia have made in developing this accelerated architecture.
  • When you need to retrain models frequently – multiple times per day:
    Cybersecurity threats on your critical infra (e.g. energy grid), credit card fraud detection models
    Online retraining: e.g. anomaly detection on your compute or storage infrastructure, where you want to constantly learn from new events, to improve model

  • These are all POWER9 results, CPU-only.
    Datasets:
    Epsilon: 300K x 2000
    Higgs: 8M x 28
    Creditcard: 200K x 28
    Susy: 3.75M x 18
  • This is our prescriptive approach to helping clients accelerate their journey to AI which connects their data and AI capabilities within a unified data and AI lifecycle (or platform). This is also a way to help our clients identify where they are and where to focus based upon their maturity on the journey to AI. Furthermore, it is an organizing construct to the Data and AI products and services offered by IBM and our business partners, and it is the technology foundation to unify how those products and services work together. 
     
    What we have learned from AI pioneers is that every step of the ladder is critical. AI is not magic and requires a thoughtful and well-architected approach. For example, the vast majority of AI failures are due to data preparation and organization, not the AI models themselves. Success with AI models is dependent on achieving success first with how you COLLECT and ORGANIZE data.

    Therefore, we believe clients must:

    COLLECT -- Establish a strong foundation of data, making it simple and accessible, regardless where that data resides. Since data used in AI is often very dynamic and fluid with ever-expanding sources, virtualizing how data is collected is critical for clients.  
    ORGANIZE – Create a trusted, business-ready analytics foundation that ensures your data is ready for AI. Just because you can access your data doesn’t mean that it’s prepared for AI use cases. Bad data is paralyzing to AI. So clients must integrate, cleanse, catalog, and govern the full lifecycle of their AI data.
    ANALYZE – Once your data is accessible and AI-ready, then you are better prepared to apply advanced analytics and AI models. This rung provides the business and planning analytics capabilities that are key for success with AI. It also provides the capabilities needed to build, deploy, and manage AI models within an integrated portfolio of technology. 
    INFUSE – Many businesses create highly useful AI models but then encounter challenges in operationalizing them to attain broader business value. This rung of the ladder infuses AI to achieve trust and transparency in model-recommended decisions, decision explainability, bias detection, decision audits, etc. For clients with common use cases, the INFUSE rung operationalizes those AI use cases with pre-built application services, speeding time to value.
    MODERNIZE – Given the dynamic nature of AI, your data estate needs a highly elastic and extensible multi-cloud infrastructure to unify the aforementioned capabilities within a fully governed team-platform. Clients are also looking to automate their AI lifecycles across an array of contributors through collaborative workflows. Essentially, MODERNIZE means building an information architecture for AI that provides choice and flexibility across your enterprise.  As clients modernize their data estates for an AI and multicloud world, they will find that there is less "assembly required" in expanding the impact of AI across the organization. 
  • This is the IBM Cloud Architecture Center high level reference architecture. A data-centric and AI reference architecture needs to support capabilities that address the Collect, Organize, Analyze and Infuse activities.
    This architecture diagram illustrates the need for strong data management capabilities inside a 'multi cloud data platform' (dark blue area), into which AI capabilities are plugged to support the analysis done by data scientists (machine learning workbench and business analytics).
    The data platform addresses data collection and transformation to move data to a local, highly scalable store. Sometimes it is better to avoid moving data, for example when no transformation is needed or when adding readers has no performance impact on the original data sources, so a virtualization capability is necessary to open a view on remote data sources without moving data.
    On the AI side, data scientists need to perform data analysis, which includes making sense of the data using data visualization. To build a model they need to define features, and the AI environment supports feature engineering. Then, to build the model, the development environment helps to select and combine the different algorithms and to tune the hyperparameters. The execution can be done on a local cluster or, at big data scale, on a machine learning cluster.
    Once the model provides an acceptable accuracy level, it can be published as a service. The model management capability supports the metadata definition and the life cycle management of the model. When the model is deployed, a monitoring capability ensures the model is still accurate and not biased.
    The intelligent application, represented as a combination of capabilities at the top of the diagram (business process, core application, CRM, and so on), can run on cloud, fog, or mist. It accesses the deployed model, accesses data using APIs, and can consume pre-built models and cognitive services, such as speech to text and text to speech, image recognition, tone analysis, Natural Language Understanding (NLU), and chatbot services.

  • ISCEP Link
