Artificial Intelligence brings
new Cognitive Capabilities
• Computers can be trained to “See”
Example: Airport security inspecting luggage
• Computers can be trained to “Hear”
Example: Maintenance crew listening to railcars
• Computers can be trained to “do”: mimic an expert
Example: Mobile phone provider predicting customer churn
Data + Algorithms + Compute
The key triggers rapidly advancing AI
Open Source Software
ML Framework Landscape
Which ML frameworks have you used
the most over the last 5 years?
Source: Kaggle Data Science Survey 2018
scikit-learn is, by far, the most widely-used
• Wide variety of ML models
• Good documentation
• Standardized API
Some downsides of scikit-learn are:
1. Lack of support of deep learning (DL)
2. Slow performance for large datasets
Problem (1) is addressed by DL frameworks in
PowerAI (TensorFlow, PyTorch), recently rebranded
as Watson Machine Learning Accelerator
Problem (2) is addressed by Snap ML
Watson Machine Learning Community Edition
IBM Enhanced Caffe
Curated, tested and pre-compiled binary
software distribution that enables enterprises
to quickly and easily deploy deep learning for
their data science and analytics development
Including all of the following frameworks:
Distributed Deep Learning
Simplifies the process of training
deep learning models across a
cluster for faster time to results.
WML CE software and the
accelerated Power servers
support a host of accelerator
libraries like SnapML, Nvidia
Large Model Support
Use system memory with GPUs
to support more complex models
and higher resolution data.
IBM adds value to curated, tested, and
pre-compiled frameworks with
Watson Machine Learning Community Edition
Evolving from compute systems to Cognitive Systems
P8 P9 P10
Not Just About Hardware Design
It’s about co-optimization and open
which just work for ML, DL, and AI
Train larger more complex models
Traditional Model Support
Limited memory on GPU forces tradeoff
in model size / data resolution
Large Model Support
Use system memory and GPU to support more
complex models and higher resolution data
Large AI Models Train
~4 Times Faster
POWER9 Servers with
NVLink to GPUs
x86 Servers with PCIe to
Xeon x86 2640v4 w/
4x V100 GPUs
Power AC922 w/ 4x
Caffe with LMS (Large Model Support)
Runtime of 1000 Iterations
GoogleNet model on Enlarged
ImageNet Dataset (2240x2240)
TensorFlow Large Model Support NVLINK2 Advantage
3DUnet segmentation models with
higher resolution images allow for
learning and labeling finer details
and structures of brain tumors.
Accelerating Machine Learning
Speed is important/crucial in many cases:
• online re-training of models
• model selection and hyper-parameter tuning
• fast adaptability to changes
Large datasets arise in numerous business-critical
applications: recommendation, credit fraud, advertising,
space exploration, weather, etc.
Not everyone can afford on-prem computing.
Renting computing in the cloud is billed by usage.
Less usage means savings, higher profit margin.
Snap ML is a framework for training
Machine Learning (ML) Models
It is characterized by:
scalability to very large datasets
high resource efficiency
Which models are supported?
Snap ML (PowerAI 1.6.0) currently supports:
• Generalized Linear Models:
- Logistic Regression
- Ridge Regression
- Lasso Regression
- Support Vector Machines (SVMs)
• Tree-based models:
- Decision Trees
- Random Forest
With more to come…
Source: Kaggle Data Science Survey 2017
Which data science methods are used at work?
On average 6.5x faster than sklearn (CPU-only)
On average 3.8x faster than sklearn (CPU-only)
Project www: https://www.zurich.ibm.com/snapml/
Core publication: https://arxiv.org/abs/1803.06333
RAPIDS is a set of open source libraries for GPU accelerating data preparation and machine
OSS website: rapids.ai
Nvidia RAPIDS cuDF - GPU DataFrames
is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data
provides a pandas-like API that will be familiar to data engineers & data scientists
Current version is 0.6
The cuDF tech preview included in PowerAI 1.6.0 is backlevel (0.2)
Work is in progress to get the latest version into Conda; alternatively, build it yourself (open source)
Examples of data manipulation in cuDF like object creation, viewing, selection, merge, concat, etc can be
Simple cuDF example
downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
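The slide's code listing did not survive in this text, so the following is only a minimal sketch of the kind of example described. It assumes an inline CSV with made-up rows instead of a download, and it falls back to pandas when cuDF is not installed, relying on the pandas-like API mentioned above. The column names and the helper `average_tip_percentage` are hypothetical.

```python
import io

try:
    import cudf as xdf  # GPU DataFrame library (requires an NVIDIA GPU)
except ImportError:
    import pandas as xdf  # CPU stand-in exposing the same calls used below

# Small inline CSV so the sketch needs no network access (made-up sample rows).
CSV_TEXT = """total_bill,tip,size
10.0,1.0,2
20.0,4.0,2
30.0,3.0,4
"""


def average_tip_percentage(csv_text):
    """Parse CSV text into a DataFrame and return the mean tip % per party size."""
    df = xdf.read_csv(io.StringIO(csv_text))
    # Column arithmetic runs on the GPU when cuDF is the backend.
    df["tip_percentage"] = df["tip"] / df["total_bill"] * 100
    return df.groupby("size")["tip_percentage"].mean()


result = average_tip_percentage(CSV_TEXT)
print(result)
```

With cuDF installed, the parsing, the column arithmetic, and the group-by run on the GPU; with the pandas fallback the same lines run on the CPU.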
Nvidia RAPIDS cuML - GPU Machine Learning
is a suite of libraries that implement machine learning algorithms and mathematical primitive functions
enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs
Current version is 0.6
The cuML tech preview included in PowerAI 1.6.0 is backlevel (0.2)
Work is in progress to get the latest version into Conda; alternatively, build it yourself (open source)
Documentation on supported algorithms like Kmeans, tSVD, PCA, DBSCAN can be found here:
Simple cuML example
loads input and computes DBSCAN clusters, all on GPU:
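The slide's code listing is missing from this text, so here is a minimal sketch of a DBSCAN run under stated assumptions. It assumes a recent cuML release that accepts NumPy input, falls back to scikit-learn's `DBSCAN` when cuML is not installed, and uses three made-up points. The `eps` and `min_samples` values are illustrative only.

```python
import numpy as np

try:
    from cuml.cluster import DBSCAN  # GPU implementation (requires an NVIDIA GPU)
except ImportError:
    from sklearn.cluster import DBSCAN  # CPU stand-in with the same constructor arguments

# Made-up input: two points close together and one far away.
points = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]], dtype=np.float32)

# eps is the neighbourhood radius; min_samples=1 makes every point a core point.
model = DBSCAN(eps=1.0, min_samples=1)
model.fit(points)

labels = np.asarray(model.labels_)
print(labels)
```

The first two points lie within `eps` of each other and share a cluster label, while the distant point gets its own label.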
COLLECT - Make data simple and accessible
ORGANIZE - Create a trusted analytics foundation
ANALYZE - Scale AI everywhere with trust & transparency
Data of every type, regardless of
where it lives
your data estate for an
AI and multicloud world
INFUSE – Operationalize AI across business processes
The AI Ladder
A prescriptive approach to accelerating the journey to AI
Introduction to Nvidia TensorRT
NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep
learning inference optimizer and runtime that delivers low latency and high-throughput for deep
learning inference applications.
Nvidia website: https://developer.nvidia.com/tensorrt
Tensorflow and TensorRT inference
with TensorRT™ (TF-TRT)
optimizes and executes
allowing TensorFlow to execute
the remaining graph. While you
can still use TensorFlow's wide
and flexible feature
set, TensorRT will parse the
model and apply optimizations
to the portions of the graph
Note: TensorRT engines are optimized for the
currently available GPUs, so conversions should
take place on the machine that will be running
Calibrating for lower precision with a minimal loss of accuracy
reduces the requirements on bandwidth and allows for faster
computation speed. It also allows for the use of Tensor Cores,
which perform matrix multiplication on 4×4 FP16 matrices and add
a 4×4 FP16 or FP32 matrix.
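The Tensor Core operation described above has the form D = A × B + C. The following NumPy sketch only emulates that arithmetic on the CPU to show the effect of accumulating in FP32 versus FP16. It does not use Tensor Cores, and the random inputs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A and B are FP16 inputs; C is an FP32 matrix added to the product.
a = rng.standard_normal((4, 4)).astype(np.float16)
b = rng.standard_normal((4, 4)).astype(np.float16)
c = rng.standard_normal((4, 4)).astype(np.float32)

# D = A x B + C with the products accumulated in FP32.
d_fp32 = a.astype(np.float32) @ b.astype(np.float32) + c

# The same operation carried out entirely in FP16, for comparison.
d_fp16 = a @ b + c.astype(np.float16)

# Largest element-wise difference between the two results.
max_gap = float(np.max(np.abs(d_fp32 - d_fp16.astype(np.float32))))
print(d_fp32.dtype, d_fp16.dtype, max_gap)
```

The gap between the two results is small for a single 4×4 block, which is the intuition behind using lower precision with a minimal loss of accuracy.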
Nvidia TensorRT Current Version
Version 6 Announced on September 16th (current)
Version 126.96.36.199 added as a tech preview to WML CE 1.6.1
Nvidia TensorRT: https://developer.nvidia.com/tensorrt
WML CE 1.6.1: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
TF-TRT Documentation: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/
IBM TensorRT introduction blog: https://developer.ibm.com/linuxonpower/2019/07/29/introducing-tensorflow-with-tensorrt-tf-trt/
IBM Tensorflow Serving blog (includes TensorRT example): https://developer.ibm.com/linuxonpower/2019/08/05/using-tensorrt-models-
Image classification and object detection: github.com/tensorflow/tensorrt
Mixed precision and accuracy: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9143-mixed-precision-
IBM Systems WW Client Experience Centers
IBM Internal Use Only
Search Center Offerings in ISCEP:
Contact Center via
IBM Systems Worldwide Client Experience
Centers maximize IBM Systems competitive
advantage in the Cloud and Cognitive era by
providing access to world class technical
experts and infrastructure services to assist
Clients with the transformation of their IT
implementations. Center offerings enable IBM
Sellers and Business Partners to progress and
expedite System Sales opportunities.
9 Worldwide Locations (* also Infrastructure
Austin TX , *Poughkeepsie NY, Rochester MN,
Tucson AZ, *Beijing CHINA, Boeblingen
GERMANY, Guadalajara MEXICO,*Montpellier
FRANCE, Tokyo JAPAN
(Inbound & Outbound)
Benchmarks, MVP & Proof
Certify ISV solutions
(Inbound to Centers)
Advise clients, Enable
Sellers, “Art of the
Discovery & Design
Creation of assets
(Inbound & Outbound)
NEW: Co-Creation Lab; CEC Cloud; IBM Systems Center of Competency for Red
IBM’s statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise,
or legal obligation to deliver any material, code or functionality. Information about potential
future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in
a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the
amount of multiprogramming in the user’s job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve results similar to those stated here.
Notices and disclaimers
Information concerning non-IBM products was obtained from the
suppliers of those products, their published announcements or other
publicly available sources. IBM has not tested those products about this
publication and cannot confirm the accuracy of performance, compatibility
or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of
those products. IBM does not warrant the quality of any third-party
products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM expressly disclaims all
warranties, expressed or implied, including but not limited to, the
implied warranties of merchantability and fitness for a purpose.
The provision of the information contained herein is not intended to, and
does not, grant any right or license under any IBM patents, copyrights,
trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM
products and services used in the presentation] are trademarks of
International Business Machines Corporation, registered in many
jurisdictions worldwide. Other product and service names might
be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.
Editor's notes
So what is triggering the rapid advancements in AI? It comes from major innovation in three critical categories: 1) Digitization of society is creating an abundance of interesting datasets. Inside and outside the enterprise. And that continues to grow about 40% per year. 2) Algorithm innovation in supervised & unsupervised learning techniques. Especially Deep Learning. Most of which is advancing in open source. 3) Ability to run those algorithms on distributed compute and especially on GPUs.
So together, the developments here have allowed us to employ AI on any problem where a human can get a task done in less than 1 second of thought. It’s in this scope of problems where AI is being applied and it’s being wielded to create a flywheel: Data -> Products -> Users. Which is why competing on algorithms alone is not a defensible model.
REFERENCE NOTES: Top trends: 99% of commercial value associated with A->B: 0s or 1s. This is called supervised learning. Speech Recognition: Audio -> Text Image Recognition
Types of Deep Learning: Supervised Learning: Learn from labeled datasets. Most economic value is here and drops off quickly through below. Transfer Learning: Learn about one topic. Apply to another domain. Unsupervised Learning. Learning without labeled data Reinforcement Learning.
The rise of the internet via analogy: Shopping mall + internet doesn’t make an internet/ecommerce company What defines whether you are truly an internet company? A) architect the organizational design to take advantage of the internet. For instance, A/B tests, short cycle times, push decision making down to PM/dev,
The rise of the AI era: Traditional tech company + deep learning doesn’t make it an AI company. Although only some patterns exist, Google & Baidu are good examples. Other patterns: a) strategic data acquisition, b) unified data ‘warehouse’, c) persuasive automation, d) new job descriptions.
Building an AI company, centrally build an AI group and matrix them into your AI.
When working with clients, these are the top AI scenarios to look for as you explore their potential AI use cases.
The genesis of IBM PowerAI (now known as Watson Machine Learning Community Edition - WML CE) was to make it simple for data scientists to be more productive, more quickly, by greatly simplifying the tasks necessary to get up and running. WML CE is an enterprise software distribution that combines popular open source deep learning frameworks, efficient AI development tools, and accelerated IBM Power Systems servers to take your deep learning projects to the next level.
For a fee, IBM offers formal support for WML CE components as long as their versions are consistent with the release configuration (NOTE that WML CE is a no charge offering but we do offer support for a fee). If you choose to use a different version of any of the components, no formal support will be available. However, in keeping with industry norms, specific questions can be posted on the WML CE space on DeveloperWorks Answers: https://developer.ibm.com/answers/topics/powerai/. This forum is monitored by the IBM technical team and technical support is provided on a best effort basis.
There are several ways for you to get WML CE. Order it: WML CE is available as a no-charge orderable part number from IBM (called PowerAI until 2H2019). Download it from here: http://ibm.biz/download-powerai Get the Docker container from here: https://hub.docker.com/r/ibmcom/powerai/
As of WML CE (PowerAI) 1.5.4, the following frameworks are included in WML CE (make sure to check the Knowledge Center for the latest versions, as they change rapidly: https://www.ibm.com/support/knowledgecenter/SS5SF7_1.5.4/navigation/pai_software_pkgs.html):
• DDL 1.2.0 - Distributed Deep Learning (with support for up to 4 nodes in WML CE)
• TensorFlow 1.12.0
• TensorFlow Probability 0.5.0 - a library for probabilistic reasoning and statistical analysis
• TensorBoard 1.12.0 - a suite of visualization tools for TensorFlow
• TensorFlow Keras - NOTE that Keras is supported as part of the TensorFlow core library and as such we can support Keras through TensorFlow
• IBM enhanced Caffe 1.0.0
• BVLC Caffe 1.0.0 - The Berkeley Vision and Learning Center (BVLC)
• Caffe2 1.0rc1 - in technology preview
• PyTorch 1.0rc1
• Snap ML 1.0.0
• Spectrum MPI 10.2
• Bazel 0.15.0
• OpenBLAS 0.3.3
• HDF5 1.10.1
• Protobuf 3.6.1
• ONNX 1.3.0 - in technology preview
There are three additional capabilities on top of the open source frameworks (and in addition to the performance advantage that Power brings to the table): Large Model Support (LMS), Distributed Deep Learning (DDL), and support by IBM.
Large Model Support: WML CE addresses a fundamental limitation for deep learning: the size of memory available within GPUs. When training complex models or training with high definition images, the memory available on a GPU can be prohibitively restrictive. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models with Large Model Support.
With Large Model Support, enabled by IBM’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. Customers can now address bigger challenges and get much more work done within a cluster of WML CE servers, increasing organizational efficiency. We will cover more details on LMS later in this deck.
Distributed Deep Learning: To accelerate the time dedicated to training a model, the WML CE stack includes a function for distributing a single training job across a cluster of servers. IBM’s Distributed Deep Learning brings intelligence about the structure and layout of the underlying hardware cluster (topology). The impact of this is significant! WML CE and WML-A with Distributed Deep Learning can scale jobs across large numbers of cluster resources with very little loss due to communications overhead. There will be more details later in the presentation. WML CE allows for the use of DDL with up to a 4 node cluster. If a client wants to scale beyond 4 nodes, they must purchase WML-A.
Supported by IBM: Although WML CE is available free to download and use, IBM also provides a “for fee” support offering for those clients that want enterprise level support for the features and capabilities within the base offering.
We normally would focus on the HW optimization starting with the processor, the IO interfaces enabled by this processor, and then which accelerators we would align to those interfaces for optimal performance. And we are doing that today; however, it is not just about the HW. As I mentioned on the previous slide, we co-optimized the SW. We took the open source deep learning frameworks and optimized them around this advanced design, and added enhancements such as spark conductor for DDL and large model support, while supporting everything from the HW to the SW in the solution. Not only do we have differentiated HW in the AC922 with many industry-only innovations, but we have a full SW offering on top of it that is equally rich in differentiated innovation, including innovations only found with Power Systems.
It’s estimated that 1.2 trillion photos will be taken in 2017. Even if each photo only took someone 1 second to organize, tag and annotate, it would still take over 38,000 years to classify them all!
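The "over 38,000 years" figure can be checked with a few lines of arithmetic. This sketch assumes a 365-day year and one person working through the photos one at a time.

```python
PHOTOS = 1.2e12            # photos estimated for 2017 (figure from the note above)
SECONDS_PER_PHOTO = 1      # time to organize, tag and annotate one photo
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Total labeling time in years for one person working without a break.
years = PHOTOS * SECONDS_PER_PHOTO / SECONDS_PER_YEAR
print(f"{years:,.0f} years")
```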
There is a competition every year, known as ImageNet. Roughly 500,000 images (low resolution) and 200 categories for which to classify them.
We talked about this earlier: it’s all about maximizing accuracy (or minimizing error/loss). One way to get more accurate models is to simply add more layers. The more layers, the more complex the model and the more computationally difficult it becomes to train.
Distributed deep learning (DDL) is IBM’s high performance approach to training single models across an entire cluster of compute nodes. Unlike native model parallelism (such as Google’s gRPC method for TensorFlow), or Spark based approaches, the DDL library distributes the model, training data set, and parameter serving across the defined cluster, and it uses a novel algorithm to improve communication over very low latency fabric.
The result is extremely efficient performance scaling, losing less than 5% of ideal efficiency when moving from 4 GPUs to 64 GPUs.
This was available as a technology preview within PowerAI, but is now supported in PowerAI Enterprise.
The outcome of this capability is that data science teams can run larger, more complex models while still reducing training time… allowing more iterations faster… and faster time to accurate results.
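One way to read the "less than 5% of ideal efficiency" figure quoted above is as the ratio of measured throughput to perfectly linear scaling. The sketch below shows that calculation. The throughput numbers are hypothetical and are not IBM measurements, and `scaling_efficiency` is an illustrative helper, not part of DDL.

```python
def scaling_efficiency(base_gpus, base_throughput, scaled_gpus, scaled_throughput):
    """Return measured throughput as a fraction of ideal linear scaling."""
    ideal_throughput = base_throughput * scaled_gpus / base_gpus
    return scaled_throughput / ideal_throughput


# Hypothetical numbers: 1,000 images/s on 4 GPUs and 15,300 images/s on 64 GPUs.
efficiency = scaling_efficiency(4, 1000.0, 64, 15300.0)
print(f"scaling efficiency: {efficiency:.1%}")
```

With these assumed numbers, ideal scaling would give 16,000 images/s, so the loss relative to ideal is a little over 4%.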
Watson Machine Learning Accelerator addresses memory constraints within Deep Learning
Large Model Support: Watson Machine Learning Accelerator (WML-A) addresses a very big deep learning scaling challenge: the size of memory available within GPUs. When data scientists develop a deep learning workload, the structure of matrices in the neural model, and the data elements which train the model (in a batch), must sit within the memory on the GPUs. As models grow in complexity and data sets increase in size, data scientists are forced to make tradeoffs to stay within the constrained 32GB (or even 16GB on older GPU cards) memory limits. Instead of training on web-scale images, WML-A users can train on high definition video. Instead of being forced into less complex, shallower deep learning models, customers can develop more accurate models for better inference capability.
With Large Model Support, enabled by WML-A’s unique NVLink connection between CPU (memory) and GPU, the entire model and dataset can be loaded into system memory and cached down to the GPU for action. IBM’s capabilities, with the co-optimized WML-A software on the Power Systems servers, have enabled increased model size (more layers, larger matrices), increased data element sizes (higher definition images), and larger batch sizes (for faster time to convergence). With Large Model Support, data scientists can load models which span nearly an entire terabyte of system memory across the GPUs. The final impact? Customers can now address bigger challenges and get much more work done within a cluster of WML-A servers, increasing organizational efficiency.
Not only do large models allow data scientists to work with more complex data; it turns out that certain models, because they pull a significantly larger number of data elements into each training cycle, actually allow training jobs to complete faster. By using the entire system memory resource that is available, data scientists are able to operate much more efficiently within each single server. Being able to use larger data and train faster is a significant advantage for PowerAI Enterprise, and operating at this scale is only possible because of the architectural choices IBM and Nvidia have made in developing this accelerated architecture.
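A rough calculation illustrates why image resolution runs into the GPU memory limit discussed above. The sketch estimates the size of a single float32 activation tensor. The batch size, channel count, and the 224x224 baseline are assumptions chosen for illustration; only the 2240x2240 resolution and the 32GB limit come from this deck. A real model holds many such tensors plus weights and gradients.

```python
BYTES_PER_FLOAT32 = 4
GPU_MEMORY_BYTES = 32 * 10**9  # the 32GB GPU memory limit mentioned in the notes


def activation_bytes(batch, channels, height, width):
    """Size in bytes of one float32 activation tensor of the given shape."""
    return batch * channels * height * width * BYTES_PER_FLOAT32


# Hypothetical layer: batch of 32, 64 channels, kept at full input resolution.
small = activation_bytes(32, 64, 224, 224)
large = activation_bytes(32, 64, 2240, 2240)

print(f"224x224:   {small / 1e9:.2f} GB")
print(f"2240x2240: {large / 1e9:.2f} GB ({large // small}x larger)")
print("fits in GPU memory:", large <= GPU_MEMORY_BYTES)
```

Under these assumptions, a tenfold increase in image side length multiplies the tensor size by 100, which pushes even this single tensor past the GPU memory limit and motivates keeping data in system memory.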
When you need to retrain models frequently – multiple times per day: Cybersecurity threats on your critical infra (e.g. energy grid), credit card fraud detection models Online retraining: e.g. anomaly detection on your compute or storage infrastructure, where you want to constantly learn from new events, to improve model
These are all Power-9 results, CPU-only. Datasets: Epsilon: 300K x 2000 Higgs: 8M x 28 Creditcard: 200K x 28 Susy: 3.75M x 18
This is our prescriptive approach to helping clients accelerate their journey to AI which connects their data and AI capabilities within a unified data and AI lifecycle (or platform). This is also a way to help our clients identify where they are and where to focus based upon their maturity on the journey to AI. Furthermore, it is an organizing construct to the Data and AI products and services offered by IBM and our business partners, and it is the technology foundation to unify how those products and services work together.
What we have learned from AI pioneers is that every step of the ladder is critical. AI is not magic and requires a thoughtful and well-architected approach. For example, the vast majority of AI failures are due to data preparation and organization, not the AI models themselves. Success with AI models is dependent on achieving success first with how you COLLECT and ORGANIZE data.
Therefore, we believe clients must:
COLLECT - Establish a strong foundation of data, making it simple and accessible, regardless where that data resides. Since data used in AI is often very dynamic and fluid with ever-expanding sources, virtualizing how data is collected is critical for clients.
ORGANIZE - Create a trusted, business-ready analytics foundation that ensures your data is ready for AI. Just because you can access your data doesn’t mean that it’s prepared for AI use cases. Bad data is paralyzing to AI. So clients must integrate, cleanse, catalog, and govern the full lifecycle of their AI data.
ANALYZE - Once your data is accessible and AI-ready, then you are better prepared to apply advanced analytics and AI models. This rung provides the business and planning analytics capabilities that are key for success with AI. It also provides the capabilities needed to build, deploy, and manage AI models within an integrated portfolio of technology.
INFUSE - Many businesses create highly useful AI models but then encounter challenges in operationalizing them to attain broader business value. This rung of the ladder infuses AI to achieve trust and transparency in model-recommended decisions, decision explainability, bias detection, decision audits, etc. For clients with common use cases, the INFUSE rung operationalizes those AI use cases with pre-built application services, speeding time to value.
MODERNIZE - Given the dynamic nature of AI, your data estate needs a highly elastic and extensible multi-cloud infrastructure to unify the aforementioned capabilities within a fully governed team-platform. Clients are also looking to automate their AI lifecycles across an array of contributors through collaborative workflows. Essentially, MODERNIZE means building an information architecture for AI that provides choice and flexibility across your enterprise.
As clients modernize their data estates for an AI and multicloud world, they will find that there is less "assembly required" in expanding the impact of AI across the organization.
This is the IBM Cloud Architecture Center high level reference architecture. A data-centric and AI reference architecture needs to support capabilities that address the Collect, Organize, Analyze and Infuse activities. This architecture diagram illustrates the need for strong data management capabilities inside a 'multi cloud data platform' (dark blue area), into which AI capabilities are plugged to support the analysis done by data scientists (machine learning workbench and business analytics). The data platform addresses data collection and transformation to move data to a local, highly scalable store. Sometimes it is preferable to avoid moving data, when there is no need for transformations or when adding readers has no performance impact on the original data sources, so a virtualization capability is necessary to open a view on remote data sources without moving data. On the AI side, data scientists need to perform data analysis, which includes making sense of the data using data visualization. To build a model they need to define features, and the AI environment supports feature engineering. Then, to build the model, the development environment helps to select and combine the different algorithms and to tune the hyperparameters. The execution can be done on a local cluster or, at big data scale, on a machine learning cluster. Once the model provides an acceptable accuracy level, it can be published as a service. The model management capability supports the metadata definition and the life cycle management of the model. Once the model is deployed, a monitoring capability ensures the model is still accurate and not biased. The intelligent application, represented as a combination of capabilities at the top of the diagram (business process, core application, CRM...), can run on cloud, fog, or mist.
It accesses the deployed model, accesses data using APIs, and even consumes pre-built models and cognitive services, such as speech to text and text to speech services, image recognition, a tone analyzer service, Natural Language Understanding (NLU), and chatbots.