SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
Presented by
Date
Event
Benchmarking Best
Practices 102
Maxim Kuvyrkov
BKK16-300 March 9, 2016
Linaro Connect BKK16
Overview
● Revision (Benchmarking Best Practices 101)
● Reproducibility
● Reporting
Revision
Previously, in Benchmarking-101...
● Approach benchmarking as an experiment.
Be scientific.
● Design the experiment in light of your goal.
● Repeatability:
○ Understand and control noise.
○ Use statistical methods to find truth in noise.
And we briefly mentioned
● Reproducibility
● Reporting
So let’s talk some more about those.
Reproducibility
Reproducibility
An experiment is reproducible if external
teams can run the same experiment over large
periods of time and get commensurate
(comparable) results.
Achieved if others can repeat what we did
and get the same results as us, within the
given confidence interval.
From Repeatability to Reproducibility
We must log enough information that anyone
else can use that information to repeat our
experiments.
We have achieved reproducibility if they can
get the same results, within the given
confidence interval.
Logging: Target
● CPU/SoC/Board
○ Revision, patch level, firmware version…
● Instance of the board
○ Is board 1 really identical to board 2?
● Kernel version and configuration
● Distribution
Example: Target
Board: Juno r0
CPU: 2 * Cortex-A57r0p0, 4 * Cortex-A53r0p0
Firmware version: 0.11.3
Hostname: juno-01
Kernel: 3.16.0-4-generic #1 SMP
Distribution: Debian Jessie
Logging: Build
● Exact toolchain version
● Exact libraries used
● Exact benchmark source
● Build system (scripts, makefiles etc)
● Full build log
Others should be able to acquire and rebuild all
of these components.
Example: Build
Toolchain: Linaro GCC 2015.04
CLI: -O2 -fno-tree-vectorize -DFOO
Libraries: libBar.so.1.3.2, git.linaro.org/foo/bar
#8d30a2c508468bb534bb937bd488b18b8636d3b1
Benchmark: MyBenchmark, git.linaro.org/foo/mb
#d00fb95a1b5dbe3a84fa158df872e1d2c4c49d06
Build System: abe, git.linaro.org/toolchain/abe
#d758ec431131655032bc7de12c0e6f266d9723c2
Logging: Run-time Environment
● Environment variables
● Command-line options passed to benchmark
● Mitigation measures taken
Logging: Other
All of the above may need modification
depending on what is being measured.
● Network-sensitive benchmarks may need
details of network configuration
● IO-sensitive benchmarks may need details
of storage devices
● And so on...
Long Term Storage
All results should be stored with information
required for reproducibility
Results should be kept for the long term
● Someone may ask you for some information
● You may want to do some new analysis in
the future
Reporting
Reporting
● Clear, concise reporting allows others to
utilise benchmark results.
● Does not have to include all data required for
reproducibility.
● But that data should be available.
● Do not assume too much reader knowledge.
○ Err on the side of over-explanation
Reporting: Goal
Explain the goal of the experiment
● What decision will it help you to make?
● What improvement will it allow you to
deliver?
Explain the question that the experiment asks
Explain how the answer to that question helps
you to achieve the goal
Reporting
● Method: Sufficient high-level detail
○ Target, toolchain, build options, source, mitigation
● Limitations: Acknowledge and justify
○ What are the consequences for this experiment?
● Results: Discuss in context of goal
○ Co-locate data, graphs, discussion
○ Include units - numbers without units are useless
○ Include statistical data
○ Use the benchmark’s metrics
Presentation of Results
Graphs are always useful
Tables of raw data also useful
Statistical context essential:
● Number of runs
● (Which) mean
● Standard deviation
Experimental Conditions
Precisely what to report depends on what is
relevant to the results
The following are guidelines
Of course, all the environmental data should be
logged and therefore available on request
Include
Highlight key information, even if it could be
derived. Including:
● All toolchain options
● Noise mitigation measures
● Testing domain
● For e.g. memory sensitive benchmark, report
bus speed, cache hierarchy
Leave Out
Everything not essential to the main point
● Environment variables
● Build logs
● Firmware
● ...
All of this information should be available to be
provided on request.
Graphs:
Strong Suggestions
Speedup Over Baseline (1/3)
Misleading scale
● A is about 3.5%
faster than it was
before, not 103.5%
Obfuscated regression
● B is a regression
Speedup Over Baseline (2/3)
Baseline becomes 0
Title now correct
Regression clear
But, no confidence
interval.
Speedup Over Baseline (3/3)
Error bars tell us more
● Effect on D can be
disregarded
● Effect on A is real,
but noisy.
Labelling (1/2)
What is the unit?
What are we comparing?
Labelling (2/2)
Graphs:
Weak Suggestions
Show the mean
Direction of ‘Good’ (1/2)
“Speedup” changes to
“time to execute”
Direction of “good” flips
If possible, maintain a
constant direction of
good.
Direction of ‘Good’ (2/2)
If you have to change
the direction of ‘good’,
flag the direction
(everywhere)
Can be helpful to flag
it anyway
Consistent Order
Presents
improvements neatly
But, hard to compare
different graphs in the
same report
Scale (1/2)
A few high scores make
other results hard to see
A couple of alternatives
may be more clear...
Scale (2/2)
Summary
Summary
● Log everything, in detail
● Be clear about:
○ What the goal of your experiment is
○ What your method is, and how it achieves your
purpose
● Present results
○ Unambiguously
○ With statistical context
● Relate results to your goal

Más contenido relacionado

La actualidad más candente

BKK16-210 Migrating to the new dispatcher
BKK16-210 Migrating to the new dispatcherBKK16-210 Migrating to the new dispatcher
BKK16-210 Migrating to the new dispatcherLinaro
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64Linaro
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...ScyllaDB
 
BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction Linaro
 
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)Linaro
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmlinuxlab_conf
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldScyllaDB
 
Stress driven development
Stress driven developmentStress driven development
Stress driven developmentmitesh_sharma
 
RR and priority scheduling
RR and priority schedulingRR and priority scheduling
RR and priority schedulingA. S. M. Shafi
 
Round robin scheduling
Round robin schedulingRound robin scheduling
Round robin schedulingRaghav S
 
101 3.6 modify process execution priorities
101 3.6 modify process execution priorities101 3.6 modify process execution priorities
101 3.6 modify process execution prioritiesAcácio Oliveira
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorderJean-Philippe BEMPEL
 
BUD17-214: Bus scaling QoS update
BUD17-214: Bus scaling QoS update BUD17-214: Bus scaling QoS update
BUD17-214: Bus scaling QoS update Linaro
 
Introducing TDD
Introducing TDDIntroducing TDD
Introducing TDDVlad Balan
 
BUD17-TR02: Upstreaming 101
BUD17-TR02: Upstreaming 101 BUD17-TR02: Upstreaming 101
BUD17-TR02: Upstreaming 101 Linaro
 
BUD17-405: Building a reference IoT product with Zephyr
BUD17-405: Building a reference IoT product with Zephyr BUD17-405: Building a reference IoT product with Zephyr
BUD17-405: Building a reference IoT product with Zephyr Linaro
 

La actualidad más candente (20)

BKK16-210 Migrating to the new dispatcher
BKK16-210 Migrating to the new dispatcherBKK16-210 Migrating to the new dispatcher
BKK16-210 Migrating to the new dispatcher
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
BUD17-309: IRQ prediction
BUD17-309: IRQ prediction BUD17-309: IRQ prediction
BUD17-309: IRQ prediction
 
Homework solutionsch9
Homework solutionsch9Homework solutionsch9
Homework solutionsch9
 
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
LCE13: Test and Validation Summit: Evolution of Testing in Linaro (I)
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High Yield
 
Stress driven development
Stress driven developmentStress driven development
Stress driven development
 
RTX Kernal
RTX KernalRTX Kernal
RTX Kernal
 
RR and priority scheduling
RR and priority schedulingRR and priority scheduling
RR and priority scheduling
 
Round robin scheduling
Round robin schedulingRound robin scheduling
Round robin scheduling
 
101 3.6 modify process execution priorities
101 3.6 modify process execution priorities101 3.6 modify process execution priorities
101 3.6 modify process execution priorities
 
Homework solution1
Homework solution1Homework solution1
Homework solution1
 
Tools in action jdk mission control and flight recorder
Tools in action  jdk mission control and flight recorderTools in action  jdk mission control and flight recorder
Tools in action jdk mission control and flight recorder
 
BUD17-214: Bus scaling QoS update
BUD17-214: Bus scaling QoS update BUD17-214: Bus scaling QoS update
BUD17-214: Bus scaling QoS update
 
Introducing TDD
Introducing TDDIntroducing TDD
Introducing TDD
 
Automation for developers
Automation for developersAutomation for developers
Automation for developers
 
BUD17-TR02: Upstreaming 101
BUD17-TR02: Upstreaming 101 BUD17-TR02: Upstreaming 101
BUD17-TR02: Upstreaming 101
 
BUD17-405: Building a reference IoT product with Zephyr
BUD17-405: Building a reference IoT product with Zephyr BUD17-405: Building a reference IoT product with Zephyr
BUD17-405: Building a reference IoT product with Zephyr
 

Similar a BKK16-300 Benchmarking 102

SFO15-301: Benchmarking Best Practices 101
SFO15-301: Benchmarking Best Practices 101SFO15-301: Benchmarking Best Practices 101
SFO15-301: Benchmarking Best Practices 101Linaro
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEYelp Engineering
 
Scott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SFScott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SFMLconf
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
 
Artificial Intelligence Chapter 9 Negnevitsky
Artificial Intelligence Chapter 9 NegnevitskyArtificial Intelligence Chapter 9 Negnevitsky
Artificial Intelligence Chapter 9 Negnevitskylopanath
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning ResearchBrodmann17
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudSigOpt
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Omid Vahdaty
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
Model selection and tuning at scale
Model selection and tuning at scaleModel selection and tuning at scale
Model selection and tuning at scaleOwen Zhang
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsScott Clark
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsSigOpt
 
[SRD UGM] Sharing Session - Software Testing
[SRD UGM] Sharing Session - Software Testing[SRD UGM] Sharing Session - Software Testing
[SRD UGM] Sharing Session - Software TestingAhmad Widardi
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
Final Presentation.pptx
Final Presentation.pptxFinal Presentation.pptx
Final Presentation.pptxMarkBauer47
 

Similar a BKK16-300 Benchmarking 102 (20)

SFO15-301: Benchmarking Best Practices 101
SFO15-301: Benchmarking Best Practices 101SFO15-301: Benchmarking Best Practices 101
SFO15-301: Benchmarking Best Practices 101
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
 
Scott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SFScott Clark, Software Engineer, Yelp at MLconf SF
Scott Clark, Software Engineer, Yelp at MLconf SF
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Artificial Intelligence Chapter 9 Negnevitsky
Artificial Intelligence Chapter 9 NegnevitskyArtificial Intelligence Chapter 9 Negnevitsky
Artificial Intelligence Chapter 9 Negnevitsky
 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
 
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana CloudUsing SigOpt to Tune Deep Learning Models with Nervana Cloud
Using SigOpt to Tune Deep Learning Models with Nervana Cloud
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
HW03 (1).pdf
HW03 (1).pdfHW03 (1).pdf
HW03 (1).pdf
 
Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...Lessons learned from designing QA automation event streaming platform(IoT big...
Lessons learned from designing QA automation event streaming platform(IoT big...
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Model selection and tuning at scale
Model selection and tuning at scaleModel selection and tuning at scale
Model selection and tuning at scale
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
Using Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning ModelsUsing Bayesian Optimization to Tune Machine Learning Models
Using Bayesian Optimization to Tune Machine Learning Models
 
C3 w3
C3 w3C3 w3
C3 w3
 
[SRD UGM] Sharing Session - Software Testing
[SRD UGM] Sharing Session - Software Testing[SRD UGM] Sharing Session - Software Testing
[SRD UGM] Sharing Session - Software Testing
 
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Final Presentation.pptx
Final Presentation.pptxFinal Presentation.pptx
Final Presentation.pptx
 

Más de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 

Más de Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Último

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

BKK16-300 Benchmarking 102

  • 1. Presented by Date Event Benchmarking Best Practices 102 Maxim Kuvyrkov BKK16-300 March 9, 2016 Linaro Connect BKK16
  • 2. Overview ● Revision (Benchmarking Best Practices 101) ● Reproducibility ● Reporting
  • 4. Previously, in Benchmarking-101... ● Approach benchmarking as an experiment. Be scientific. ● Design the experiment in light of your goal. ● Repeatability: ○ Understand and control noise. ○ Use statistical methods to find truth in noise.
  • 5. And we briefly mentioned ● Reproducibility ● Reporting So let’s talk some more about those.
  • 7. Reproducibility An experiment is reproducible if external teams can run the same experiment over large periods of time and get commensurate (comparable) results. Achieved if others can repeat what we did and get the same results as us, within the given confidence interval.
  • 8. From Repeatability to Reproducibility We must log enough information that anyone else can use that information to repeat our experiments. We have achieved reproducibility if they can get the same results, within the given confidence interval.
  • 9. Logging: Target ● CPU/SoC/Board ○ Revision, patch level, firmware version… ● Instance of the board ○ Is board 1 really identical to board 2? ● Kernel version and configuration ● Distribution
  • 10. Example: Target Board: Juno r0 CPU: 2 * Cortex-A57r0p0, 4 * Cortex-A53r0p0 Firmware version: 0.11.3 Hostname: juno-01 Kernel: 3.16.0-4-generic #1 SMP Distribution: Debian Jessie
  • 11. Logging: Build ● Exact toolchain version ● Exact libraries used ● Exact benchmark source ● Build system (scripts, makefiles etc) ● Full build log Others should be able to acquire and rebuild all of these components.
  • 12. Example: Build Toolchain: Linaro GCC 2015.04 CLI: -O2 -fno-tree-vectorize -DFOO Libraries: libBar.so.1.3.2, git.linaro.org/foo/bar #8d30a2c508468bb534bb937bd488b18b8636d3b1 Benchmark: MyBenchmark, git.linaro.org/foo/mb #d00fb95a1b5dbe3a84fa158df872e1d2c4c49d06 Build System: abe, git.linaro.org/toolchain/abe #d758ec431131655032bc7de12c0e6f266d9723c2
  • 13. Logging: Run-time Environment ● Environment variables ● Command-line options passed to benchmark ● Mitigation measures taken
  • 14. Logging: Other All of the above may need modification depending on what is being measured. ● Network-sensitive benchmarks may need details of network configuration ● IO-sensitive benchmarks may need details of storage devices ● And so on...
  • 15. Long Term Storage All results should be stored with information required for reproducibility Results should be kept for the long term ● Someone may ask you for some information ● You may want to do some new analysis in the future
  • 17. Reporting ● Clear, concise reporting allows others to utilise benchmark results. ● Does not have to include all data required for reproducibility. ● But that data should be available. ● Do not assume too much reader knowledge. ○ Err on the side of over-explanation
  • 18. Reporting: Goal Explain the goal of the experiment ● What decision will it help you to make? ● What improvement will it allow you to deliver? Explain the question that the experiment asks Explain how the answer to that question helps you to achieve the goal
  • 19. Reporting ● Method: Sufficient high-level detail ○ Target, toolchain, build options, source, mitigation ● Limitations: Acknowledge and justify ○ What are the consequences for this experiment? ● Results: Discuss in context of goal ○ Co-locate data, graphs, discussion ○ Include units - numbers without units are useless ○ Include statistical data ○ Use the benchmark’s metrics
  • 20. Presentation of Results Graphs are always useful Tables of raw data also useful Statistical context essential: ● Number of runs ● (Which) mean ● Standard deviation
  • 21. Experimental Conditions Precisely what to report depends on what is relevant to the results The following are guidelines Of course, all the environmental data should be logged and therefore available on request
  • 22. Include Highlight key information, even if it could be derived. Including: ● All toolchain options ● Noise mitigation measures ● Testing domain ● For e.g. memory sensitive benchmark, report bus speed, cache hierarchy
  • 23. Leave Out Everything not essential to the main point ● Environment variables ● Build logs ● Firmware ● ... All of this information should be available to be provided on request.
  • 25. Speedup Over Baseline (1/3) Misleading scale ● A is about 3.5% faster than it was before, not 103.5% Obfuscated regression ● B is a regression
  • 26. Speedup Over Baseline (2/3) Baseline becomes 0 Title now correct Regression clear But, no confidence interval.
  • 27. Speedup Over Baseline (3/3) Error bars tell us more ● Effect on D can be disregarded ● Effect on A is real, but noisy.
  • 28. Labelling (1/2) What is the unit? What are we comparing?
  • 32. Direction of ‘Good’ (1/2) “Speedup” changes to “time to execute” Direction of “good” flips If possible, maintain a constant direction of good.
  • 33. Direction of ‘Good’ (2/2) If you have to change the direction of ‘good’, flag the direction (everywhere) Can be helpful to flag it anyway
  • 34. Consistent Order Presents improvements neatly But, hard to compare different graphs in the same report
  • 35. Scale (1/2) A few high scores make other results hard to see A couple of alternatives may be more clear...
  • 38. Summary ● Log everything, in detail ● Be clear about: ○ What the goal of your experiment is ○ What your method is, and how it achieves your purpose ● Present results ○ Unambiguously ○ With statistical context ● Relate results to your goal