Understanding and Integrating Intel Deep Learning Boost
OVERVIEW | DEEP DIVE | STEPS TO
Strategy
Portfolio
Compute
Hardware
Software
Performance
Use cases
Develop
Partner
© 2019 Intel Corporation
#datacentric
Multi-cloud | Intelligent Edge | Device
© 2019 Intel Corporation
© 2019 Intel Corporation
Deliver the best AI platforms
Shape & win industry open software stacks
Seed & drive the ecosystem
Extend CPU
Most complete portfolio
Best integrated platforms
Optimize customer software
Build "OneAPI" unified platform
Evangelize to developers
Seed emerging use cases
Attract & develop top talent
Pioneer leading-edge AI
© 2019 Intel Corporation
Faster | More | Everything
SILICON PHOTONICS
OMNI-PATH FABRIC
ETHERNET
INTEL ATOM® C PROCESSORS
INTEL® NERVANA™ NNP
INTEL® MOVIDIUS™ TECHNOLOGY
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
DC SERIES SSD
QLC 3D NAND DRIVE
OPTANE DC SOLID STATE DRIVE
OPTANE DC PERSISTENT MEMORY
© 2019 Intel Corporation
50+ YEARS OF INTEL INNOVATION – DEVICE TO INTELLIGENT EDGE & MULTI-CLOUD
MULTI-PURPOSE, FORMS FOUNDATION FOR AI
ACCELERATION OF GRAPHICS, MEDIA, ANALYTICS AND AI+
REAL-TIME AI+ FOR OPTIMIZED SYSTEM PERFORMANCE
HIGHLY EFFICIENT MULTI-MODAL INFERENCE & TRAINING
[Chart axes: Performance efficiency × Workload breadth]
Multi models – high performance across a broad set of existing and emerging DL models for applications like natural language processing, speech & text, recommendation engines, and video & image search & filtering
AI+ – combined workflows where use cases span AI plus additional workloads, such as immersive media (media + graphics)
© 2019 Intel Corporation
DEPLOY AI ANYWHERE FROM DEVICE TO INTELLIGENT EDGE & MULTI-CLOUD
Device | Intelligent edge | Multi-cloud
NNP-I | NNP-L + acceleration
© 2019 Intel Corporation
[Chart] COMPUTE DEMAND, 2013–2023, across workloads: Network, Database, Analytics, Multi-cloud & Orchestration, AI, Security, Virtualization, HPC
© 2019 Intel Corporation
PERFORMANCE TO PROPEL INSIGHTS
BUSINESS RESILIENCE THROUGH HARDWARE-ENHANCED SECURITY
AGILE SERVICE DELIVERY
FASTER TIME TO VALUE
GLOBAL ECOSYSTEM
WORKLOAD OPTIMIZED
CUSTOMER OBSESSED
BUILT-IN VALUE
© 2019 Intel Corporation
OVERVIEW | DEEP DIVE | STEPS TO
Strategy
Portfolio
Compute
Hardware
Software
Performance
Use cases
Develop
Partner
© 2019 Intel Corporation
OPTIMIZING AI INFERENCE
[Chart] Intel Optimization for Caffe ResNet-50 inference throughput (images/sec), Jul '17 / Dec '18 / Apr '19, on Intel® Xeon® Platinum (8100-series) processors: from a 1.0 baseline with Intel® AVX-512 to 5.7x with Intel® DL Boost.
© 2019 Intel Corporation
OPTIMIZING AI INFERENCE
Microarchitecture view, in a given clock cycle: two FMA execution units (Port 0 and Port 5).

FP32 – 1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost:
  vfmadd231ps: FP32 input × FP32 input + FP32 output → FP32 output

INT8 – 1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost (three instructions):
  vpmaddubsw: INT8 input × INT8 input → INT16 output
  vpmaddwd:   INT16 output × INT16 constant → INT32 output
  vpaddd:     INT32 output + INT32 accumulate → INT32 output

INT8 – 2nd gen Intel® Xeon® Scalable processor, with Intel® DL Boost (VNNI, one new instruction):
  vpdpbusd: INT8 input × INT8 input + INT32 accumulate → INT32 output
Vector Neural Network Instruction (VNNI) – NEW
2nd gen Intel® Xeon® Scalable processor with Intel® DL Boost
  vpdpbusd: INT8 input × INT8 input + INT32 accumulator → INT32 accumulator
Reference: https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
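The per-lane arithmetic behind these instructions can be illustrated without any SIMD code. The sketch below is a plain-Python emulation under stated assumptions (four INT8 pairs per 32-bit lane; the values and helper names are illustrative, not Intel code), contrasting the legacy three-instruction path with the single fused vpdpbusd:

def sat16(x):
    # vpmaddubsw saturates its pairwise 16-bit sums; model that here
    return max(-32768, min(32767, x))

def legacy_int8_dot(acc, a_u8, w_s8):
    # vpmaddubsw: unsigned8 x signed8 products, adjacent pairs summed with 16-bit saturation
    p16 = [sat16(a_u8[0] * w_s8[0] + a_u8[1] * w_s8[1]),
           sat16(a_u8[2] * w_s8[2] + a_u8[3] * w_s8[3])]
    # vpmaddwd against a constant vector of 1s: widen the pair sums to 32 bits and add them
    p32 = p16[0] * 1 + p16[1] * 1
    # vpaddd: add into the 32-bit accumulator
    return acc + p32

def vnni_int8_dot(acc, a_u8, w_s8):
    # vpdpbusd: multiply four u8*s8 pairs, sum, and accumulate into INT32 in one instruction
    return acc + sum(a * w for a, w in zip(a_u8, w_s8))

acts, wts = [10, 200, 3, 255], [-50, 7, 100, -128]   # one 32-bit lane's worth of INT8 data
print(legacy_int8_dot(0, acts, wts), vnni_int8_dot(0, acts, wts))   # -31440 -31440

The two paths agree whenever the intermediate 16-bit sums do not saturate; because vpdpbusd accumulates straight into 32 bits, it also removes that saturation hazard while collapsing three instructions into one.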
Precision is a measure of the detail used for numerical values, primarily measured in bits.

Data type   Signed range
FP32        ±1.18×10^-38 to ±3.4×10^38
BF16        ±1.18×10^-38 to ±3.4×10^38
FP16        ±6.10×10^-5 to ±65,504
INT16       −32,768 to +32,767
INT8        −128 to +127

DL compute in high precision: high processing time & high accuracy in results.
DL compute in lower precision: faster results, but potentially less accurate.
Speed/accuracy tradeoff.
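The quoted ranges can be reproduced directly (a quick NumPy check; BF16 is not a built-in NumPy dtype, but it shares FP32's 8-bit exponent and therefore roughly its range):

import numpy as np

for name, info in [("FP32", np.finfo(np.float32)),
                   ("FP16", np.finfo(np.float16)),
                   ("INT16", np.iinfo(np.int16)),
                   ("INT8", np.iinfo(np.int8))]:
    print(name, info.min, info.max)
# FP32  -3.4028235e+38 ... 3.4028235e+38  (smallest positive normal ~1.18e-38)
# FP16  -65504.0 ... 65504.0              (smallest positive normal ~6.10e-05)
# INT16 -32768 ... 32767
# INT8  -128 ... 127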
© 2019 Intel Corporation
FP32 training → FP32 inference
BF16 training → 8-bit inference: better cache usage, avoids bandwidth bottlenecks, maximizes compute resources, less silicon area & power

Need: FP32 weights (training) → INT8 weights for inference
Benefit: reduced model size and energy consumption
Challenge: possible degradation in predictive performance
Solution: quantization with minimal loss of information
Deployment: OpenVINO™ toolkit or direct framework (TensorFlow)
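To make the "quantization with minimal loss of information" point concrete, here is a minimal sketch, assuming symmetric per-tensor quantization and NumPy only (the tensor and names are illustrative, not the OpenVINO or TensorFlow implementation):

import numpy as np

w_fp32 = np.random.randn(1000).astype(np.float32)   # stand-in for a trained FP32 weight tensor

scale = np.abs(w_fp32).max() / 127.0                 # one quantization factor for the tensor
w_int8 = np.clip(np.round(w_fp32 / scale), -127, 127).astype(np.int8)

w_back = w_int8.astype(np.float32) * scale           # dequantize to inspect the rounding error
print("max abs error:", float(np.abs(w_fp32 - w_back).max()))   # bounded by scale / 2
print("bytes: FP32", w_fp32.nbytes, "-> INT8", w_int8.nbytes)   # 4x smaller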
© 2019 Intel Corporation
Reduce precision & keep model accuracy

Quantization-aware training (FP32 weights in training → 8-bit inference):
Simulate the effect of quantization in the forward and backward passes using fake quantization; the captured FP32 weights are quantized to INT8 at each iteration, after the weight updates.

Post-training quantization (FP32 weights in training → 8-bit inference):
Train normally and capture FP32 weights; convert to low precision before running inference and calibrate to improve accuracy. Collect statistics for the activations to find the quantization factor; calibrate using a subset of the data to improve accuracy; convert the FP32 weights to INT8.
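As a rough illustration of the fake-quantization idea described above, the following NumPy sketch (framework-agnostic; the names are hypothetical) rounds activations and weights to the INT8 grid in the forward pass while the FP32 master weights remain available for the weight updates:

import numpy as np

def fake_quantize(x, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1                   # 127 for INT8
    scale = np.abs(x).max() / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale   # quantize, then dequantize

w_master = np.random.randn(4, 4).astype(np.float32)  # FP32 master weights kept for updates
x = np.random.randn(4).astype(np.float32)

y = fake_quantize(x) @ fake_quantize(w_master)        # forward pass sees INT8 rounding
# The backward pass typically treats round() as identity (straight-through estimator),
# so gradients keep flowing into the FP32 master weights; after the updates the weights
# are re-quantized for the next iteration, as described above.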
© 2019 Intel Corporation
Direct framework (TensorFlow): Training with standard TF → Freeze graph → Calibrate / Convert FP32 weights to INT8 → Quantized graph → Inference with standard TF
https://github.com/IntelAI/models/tree/master/benchmarks

OpenVINO™ toolkit: start from a trained model in a DL framework
https://docs.openvinotoolkit.org/
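The "Calibrate" step above is performed by the linked IntelAI/models scripts or the OpenVINO™ toolkit; as a generic illustration only (all names below are hypothetical), calibration amounts to collecting activation statistics on a small subset of data and deriving the quantization factor from them:

import numpy as np

def collect_abs_max(layer_fn, calibration_batches):
    # Gather activation statistics over a small calibration subset
    abs_max = 0.0
    for batch in calibration_batches:
        acts = layer_fn(batch)                        # FP32 activations of one layer
        abs_max = max(abs_max, float(np.abs(acts).max()))
    return abs_max

def quantize_activations(acts, abs_max, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    scale = abs_max / qmax                            # the "quantization factor"
    return np.clip(np.round(acts / scale), -qmax, qmax).astype(np.int8), scale

layer = lambda x: np.maximum(x, 0.0)                  # hypothetical layer: ReLU, for illustration
calib = [np.random.randn(32, 64).astype(np.float32) for _ in range(10)]

abs_max = collect_abs_max(layer, calib)
q_acts, scale = quantize_activations(layer(calib[0]), abs_max)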
© 2019 Intel Corporation
DL BOOST (VNNI) ON 2ND GEN XEON VS FP32 ON 1ST GEN XEON
Relative inference throughput performance [higher is better]. Baseline: FP32 performance on Intel® Xeon® Platinum 8180 processor. See Config #46 & #47.

Recommendation systems:
  Wide and Deep    2.1x @ 0.239 % accuracy loss
  Wide and Deep    2.1x @ 0.007 % accuracy loss
Object detection:
  SSD-MobileNet    2.2x @ 0.009 mAP accuracy loss
  SSD-VGG16        2.5x @ 0.00019 mAP accuracy loss
  RetinaNet        2.6x @ 0.003 mAP accuracy loss
Image recognition:
  ResNet-50        3.0x @ 0.120 % accuracy loss
  ResNet-50        3.7x @ 0.609 % accuracy loss
  ResNet-101       3.8x @ 0.560 % accuracy loss
  ResNet-50        3.9x @ 0.209 % accuracy loss
  ResNet-50        3.9x @ 0.450 % accuracy loss
  ResNet-101       4.0x @ 0.580 % accuracy loss

Performance results are based on testing as of the date shown in the configurations and may not reflect all publicly available security updates. No product can be absolutely secure. See configuration disclosure for details. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance. Other names and brands may be claimed as the property of others.
© 2019 Intel Corporation
#datacentric
© 2019 Intel Corporation
white paper: https://www.intel.ai/papers/siemens-healthineers-ai-cardiac-imaging/
#datacentric
Client: Siemens Healthineers is the
global market leader in diagnostic
imaging. Artificial intelligence (AI) is
transforming care delivery and
expanding precision medicine.
Siemens Healthineers utilizes AI to
help automate and standardize
complex diagnostics to improve
patient outcomes.
Challenge: Segmentation of anatomical regions for medical
imaging requires hardware that delivers increased throughput
for Deep Learning inferencing. This is used to automate and
standardize clinical measurements, such as measurement of
heart chambers, ejection fraction, and strain. Due to increased
data generation and workflow demands, high performance
Deep Learning inference is required to help reduce the burden
on radiologists and cardiologists, enabling improved clinical
decisions in near real-time.
Solution: The next generation Intel® Xeon® Scalable processors
with Intel® Deep Learning Boost, in combination with the Intel®
Distribution of OpenVINO™ toolkit, automate anatomical
measurements in near real-time and improve workflow
efficiency, while maintaining accuracy of the model without the
need for discrete GPU investment.
*Other names and brands may be claimed as the property of others.
Configuration: Based on Siemens Healthineers and Intel analysis on 2nd Gen Intel® Xeon® Platinum 8280 Processor (28 cores) with 192 GB DDR4-2933, using Intel® OpenVINO™ 2019 R1. HT ON, Turbo ON. CentOS Linux release 7.6.1810, kernel 4.19.5-1.el7.elrepo.x86_64. Custom topology and dataset (image resolution 288x288). Comparing FP32 vs. INT8 with Intel® DL Boost performance on the system.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and
functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit http://www.intel.com/performance.
Result: 5.5x faster (now 201 FPS) using 2nd Generation Intel® Xeon® Scalable with Intel® DL Boost (INT8 precision) compared to FP32 precision.
This Siemens Healthineers feature is currently under development and is not currently for sale.
Segmented chambers: Left Ventricle, Right Ventricle (short-axis); Left Ventricle, Right Atrium and Left Atrium (long-axis).
OVERVIEW | DEEP DIVE | STEPS TO
Strategy
Portfolio
Compute
Hardware
Software
Performance
Use cases
Develop
Partner
© 2019 Intel Corporation
Image recognition: ResNet-50, Inception V3, MobileNet, SqueezeNet
Object detection: R-FCN, Faster-RCNN, YOLO V2, SSD-VGG16, SSD-MobileNet
Image segmentation: Mask R-CNN
Language translation: GNMT
Recommendation: Wide & Deep, NCF
Text to speech: WaveNet
Reinforcement learning: MiniGo
© 2019 Intel Corporation. Other names and brands may be claimed as the property of others.
Config 1: Nvidia data source: https://developer.nvidia.com/deep-learning-performance-training-inference (Updated March 5th 2019). See Config 24 for detailed Xeon Configuration
Config 2: See Config.slide 22 for Xeon and Nvidia configuration
Config 3: See Config slide 23 for Xeon and Nvidia configuration
Config 4: See Config slide 12 for Xeon and MLPerf configuration ; MLPerf v0.5 Training Closed - Reinforcement benchmark (MiniGo); system employed TensorFlow 1.10.1 with MKLDNN v0.14. Retrieved from www.mlperf.org. MLPerf v0.5 Results December 12th,
2018. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Estimated score of 11.81 on the MLPerf v0.5 Training Closed Division – Reinforcement benchmark (MiniGo). Result not verified by MLPerf. MLPerf name and logo are trademarks. See www.mlperf.org for more
information. See Config 12 for detailed Xeon configuration.; 1.87x performance improvement in MLPerf MiniGo performance on 2S Intel® Xeon® Platinum 9282 processor compared to published MLPerf v0.5 MiniGo results on 2S Intel® Xeon® Platinum 8180 processor
Performance results are based on testing as of 1/31/2019 and may not reflect all publicly available security updates. No product can be absolutely secure. See configuration disclosure for details. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel
microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured
by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
For more complete information visit: http://www.intel.com/performance Other names and brands may be claimed as the property of others.
Config 1 Config 2 Config 3 Config 4
© 2019 Intel Corporation
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance
tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Other names and brands may be claimed as the property of others.
Taboola – billions of real-time content recommendations
https://www.intel.com/content/www/us/en/processors/xeon/scalable/software-solutions/taboola-cloud.html

iFLYTEK – leading advancements in speech recognition
https://www.intel.ai/papers/iflytek-xeonscalable/

RL Coach on AWS SageMaker – advanced RL models on C5 instances
https://www.intel.ai/rl-coach-aws-sagemaker-accelerated-deep-rl/

GE Healthcare – CT scan radiology medical image classification
https://newsroom.intel.com/news/new-intel-based-artificial-intelligence-imaging-solution-accelerate-critical-patient-diagnoses/
© 2019 Intel Corporation
OPTIMIZED SOFTWARE FOR THE BEST AI PERFORMANCE
github.com/IntelAI/models/tree/master/benchmarks
OpenVINO™ toolkit: software.intel.com/OpenVINO/toolkit
nGraph: github.com/NervanaSystems/ngraph
Intel® MKL-DNN: github.com/intel/mkl-dnn
Intel® DAAL: software.intel.com/en-us/intel-daal
Intel® Distribution for Python: software.intel.com/en-us/distribution-for-python
© 2019 Intel Corporation. Other names and brands may be claimed as the property of others.
INTEL® DL BOOST ECOSYSTEM SUPPORT
OPTIMIZED SW &
FRAMEWORKS
SOFTWARE
VENDORS
CLOUD SERVICE
PROVIDERS
ENTERPRISES
VIDEO ANALYSIS
TEXT DETECTION
MEDICAL IMAGE
CLASSIFICATION
FACIAL RECOGNITION
ML INFERENCING
PARTNERS
DELL* EMC* READY SOLUTIONS FOR AI
INSPUR FPGA SOLUTION FOR AI
LENOVO AI OFFERINGS
AND MORE
DEPLOY AI SOLUTIONS FASTER THROUGH INTEL’S STELLAR ECOSYSTEM
>80 LEADING AI SYSTEMS AND SOFTWARE SOLUTIONS AVAILABLE FROM 200+ PARTNERS
builders.intel.com/ai
ACCELERATING AI INNOVATION
ACROSS CRITICAL
BUSINESS WORKLOADS
software.intel.com/ai
© 2019 Intel Corporation
POWERING THE AI TRANSFORMATION
AI infused into hardware, software & ecosystem
Spanning device to intelligent edge & multi-cloud
Unmatched portfolio to move, store, process data
CPU, GPU, FPGA & accelerators OPTIMIZED for processing your AI
AI compute portfolio for the data-centric era
2nd Generation Intel® Xeon® Scalable Processors with Intel® Deep Learning Boost – the only CPU with integrated AI acceleration
© 2019 Intel Corporation
© 2019 Intel Corporation
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system
configuration.
No product or component can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete
information about performance and benchmark results, visit http://www.intel.com/benchmarks .
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured
using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and
performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information
visit http://www.intel.com/benchmarks .
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2,
SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by
Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost
savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are
accurate.
Intel, the Intel logo, Intel Optane, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as property of others.
Intel and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
© 2019 Intel Corporation.