The presentation will continue from where Benchmarking Best Practices 101 left off, covering the topics of benchmark reproducibility and reporting in more depth.
An experiment is reproducible if external teams can run the same experiment over long periods of time and get comparable results. Clear, concise reporting allows others to utilise benchmark results.
4. Previously, in Benchmarking-101...
● Approach benchmarking as an experiment.
Be scientific.
● Design the experiment in light of your goal.
● Repeatability:
○ Understand and control noise.
○ Use statistical methods to find truth in noise.
5. And we briefly mentioned
● Reproducibility
● Reporting
So let’s talk some more about those.
7. Reproducibility
An experiment is reproducible if external
teams can run the same experiment over long
periods of time and get commensurate
(comparable) results.
Achieved if others can repeat what we did
and get the same results as us, within the
given confidence interval.
8. From Repeatability to Reproducibility
We must log enough information that anyone
else can use that information to repeat our
experiments.
We have achieved reproducibility if they can
get the same results, within the given
confidence interval.
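The "same results, within the given confidence interval" check can be sketched numerically. A minimal sketch, assuming roughly normal noise and a 95% interval of the mean (1.96 × standard error); the sample data is invented for illustration:

```python
# Sketch: decide whether a reproduction run agrees with the original
# "within the given confidence interval". Assumes roughly normal noise
# and a 95% interval of the mean (1.96 * standard error). All numbers
# below are invented for illustration.
from math import sqrt
from statistics import mean, stdev

def ci95(samples):
    """95% confidence interval (low, high) for the mean of samples."""
    half = 1.96 * stdev(samples) / sqrt(len(samples))
    return (mean(samples) - half, mean(samples) + half)

def intervals_overlap(a, b):
    """True if two (low, high) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

original = [101.2, 99.8, 100.5, 100.9, 99.5]   # seconds per run, team A
repro    = [100.7, 101.1, 100.2, 99.9, 100.6]  # seconds per run, team B

print("reproduced:", intervals_overlap(ci95(original), ci95(repro)))
```

Overlapping intervals are a crude but readable criterion; a real harness might prefer a proper significance test.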
9. Logging: Target
● CPU/SoC/Board
○ Revision, patch level, firmware version…
● Instance of the board
○ Is board 1 really identical to board 2?
● Kernel version and configuration
● Distribution
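Much of this target information can be captured automatically. A minimal sketch using Python's platform module; board revision, firmware and kernel configuration would come from target-specific sources (vendor tools, /proc/config.gz) and are not covered here:

```python
# Sketch: capture basic target details alongside every result set.
# The platform module covers the generic fields; board instance,
# revision and firmware need target-specific collection.
import platform

def describe_target():
    return {
        "machine": platform.machine(),        # e.g. "aarch64"
        "kernel": platform.release(),         # running kernel version
        "platform": platform.platform(),      # full OS/distribution string
        "python": platform.python_version(),  # if the harness matters too
    }

print(describe_target())
```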
11. Logging: Build
● Exact toolchain version
● Exact libraries used
● Exact benchmark source
● Build system (scripts, makefiles etc)
● Full build log
Others should be able to acquire and rebuild all
of these components.
13. Logging: Run-time Environment
● Environment variables
● Command-line options passed to benchmark
● Mitigation measures taken
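These three items can be snapshotted per invocation. A minimal sketch; the mitigations entry is a free-form list the harness fills in by hand (e.g. "governor pinned to performance"), an assumption about how such measures might be recorded:

```python
# Sketch: snapshot the run-time environment next to each benchmark
# invocation. Mitigation notes are free-form strings supplied by
# whoever runs the experiment.
import os

def describe_invocation(argv, mitigations=()):
    return {
        "env": dict(os.environ),        # every environment variable
        "argv": list(argv),             # exact command line used
        "mitigations": list(mitigations),
    }

record = describe_invocation(
    ["./bench", "--size=1024", "--iters=100"],
    mitigations=["governor=performance", "irqbalance stopped"],
)
print(record["argv"])
```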
14. Logging: Other
All of the above may need modification
depending on what is being measured.
● Network-sensitive benchmarks may need
details of network configuration
● IO-sensitive benchmarks may need details
of storage devices
● And so on...
15. Long Term Storage
All results should be stored with information
required for reproducibility
Results should be kept for the long term
● Someone may ask you for some information
● You may want to do some new analysis in
the future
17. Reporting
● Clear, concise reporting allows others to
utilise benchmark results.
● Does not have to include all data required for
reproducibility.
● But that data should be available.
● Do not assume too much reader knowledge.
○ Err on the side of over-explanation
18. Reporting: Goal
Explain the goal of the experiment
● What decision will it help you to make?
● What improvement will it allow you to
deliver?
Explain the question that the experiment asks
Explain how the answer to that question helps
you to achieve the goal
19. Reporting
● Method: Sufficient high-level detail
○ Target, toolchain, build options, source, mitigation
● Limitations: Acknowledge and justify
○ What are the consequences for this experiment?
● Results: Discuss in context of goal
○ Co-locate data, graphs, discussion
○ Include units - numbers without units are useless
○ Include statistical data
○ Use the benchmark’s metrics
20. Presentation of Results
Graphs are always useful
Tables of raw data are also useful
Statistical context essential:
● Number of runs
● Which mean was used (arithmetic, geometric, …)
● Standard deviation
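That statistical context is cheap to compute. A minimal sketch of the summary a report should carry; the timings and units (seconds) are invented for the example:

```python
# Sketch: the minimum statistical context for a reported result --
# number of runs, which mean, and the standard deviation, with units.
from statistics import mean, stdev

runs_s = [12.1, 11.9, 12.4, 12.0, 12.2]  # wall-clock seconds per run

summary = {
    "runs": len(runs_s),
    "mean_s": round(mean(runs_s), 3),    # arithmetic mean, seconds
    "stdev_s": round(stdev(runs_s), 3),  # sample standard deviation, seconds
}
print(summary)
```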
21. Experimental Conditions
Precisely what to report depends on what is
relevant to the results
The following are guidelines
Of course, all the environmental data should be
logged and therefore available on request
22. Include
Highlight key information, even if it could be
derived, including:
● All toolchain options
● Noise mitigation measures
● Testing domain
● E.g. for a memory-sensitive benchmark, report
bus speed and cache hierarchy
23. Leave Out
Everything not essential to the main point
● Environment variables
● Build logs
● Firmware
● ...
All of this information should be available to be
provided on request.
38. Summary
● Log everything, in detail
● Be clear about:
○ What the goal of your experiment is
○ What your method is, and how it achieves your
purpose
● Present results
○ Unambiguously
○ With statistical context
● Relate results to your goal