TRACK F: Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation/ Arkadiy Morgenshtein
1. May 1, 2013 1
Improving Utilization of Acceleration Platforms by Using Off-Platform Test Generation
May 1, 2013
Wisam Kadry, Dmitry Krestyashyn, Arkadiy Morgenshtein,
Amir Nahir, Vitali Sokhin, Elena Tsanko
IBM Research - Haifa
2. May 1, 2013 2
Outline
Introduction
• Functional verification
• Exercisers for Post-Si validation
• Exercisers on Accelerators (EoA)
Threadmill Overview
• Architecture
• Main features
Offline Generation Mode
• Motivation
• Methodology
Results
• Utilization improvement
• Coverage improvement
Conclusions and Future Work
3. May 1, 2013 3
Typical Functional Verification Flow
[Figure: flow diagram with the blocks Test Template, Random Stimuli Generator, Test, Simulator, DUV, Checking/Assertions, Pass/Fail, Coverage Information, Coverage Analysis Tool and Coverage Reports]
4. May 1, 2013 4
Pre and Post Silicon Tradeoffs
[Figure: Software Simulation, Acceleration, Prototyping and Silicon plotted by Speed (axis ticks 10, 1K, 100K, 10M, 1G) against Controllability and Observability]
5. May 1, 2013 5
Post Silicon Validation Alternatives
• Run operating systems and applications
– Very limited coverage
– Very little variability
– Hard to debug
• Run test-cases generated by pre-silicon test-generators
– Long generation time means many servers are needed to feed one silicon platform
– Low utilization due to loading time
– Poor solutions for built-in online checking at test level: Pre-Si checking uses checkers of the simulation platforms, which are unavailable at Post-Si
• Exercisers
6. May 1, 2013 6
Exercisers: Post Silicon Validation Tools
An exerciser is a program that runs on a testing environment (accelerator and/or silicon) and “exercises” the design by testing interesting scenarios on it
7. May 1, 2013 7
Exerciser requirements
• Include a random stimuli generation component (as in pre-silicon)
– Valid stimuli
– Adhere to user requests
– High quality stimuli
– Generate many test-cases from the same test-template
• Simple and fast
– Can run on early bring-up silicon
– Eases debugging
– Increases platform utilization
• Self-contained
– Minimal interaction with the environment
– Loaded once on the DUV, runs “forever”
• Bare-Metal
– Contains OS services required by the test-cases
– Enables complete machine control
8. May 1, 2013 8
Threadmill: IBM Post-Silicon Exerciser
[Figure: Threadmill architecture. Inputs: Test Template, System Configuration, Architectural Model & Testing Knowledge. The Builder combines the Generator & Kernel (Generation, Checking, Execution, OS services) with the Test Template, Topology and Architectural Model into an Exerciser Image. Execution targets shown: Silicon, Accelerator, Reference Model.]
9. May 1, 2013 9
Threadmill - Main Features
Def language for test-templates:
• Rich language to describe the test-plan scenarios
• Multi-threaded support (each thread with its own scenario)
Checking:
• Multi-pass checking: comparing values of architectural resources (GPRs, SPRs, memory) between different executions of the same test-case
• Variability originates from changes to the state of the design
– Timing variations in multithreaded processing
– Randomization of uArch modes of the processors (thread priority, internal control modes)
– Variations in pipeline and cache states
• User ability to specify self-checking as part of the test-case
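The multi-pass checking idea can be illustrated with a minimal Python sketch. It is not Threadmill code: the function names, the toy "designs" and the resource names are all hypothetical, and the only assumption taken from the slide is that the same test-case is executed several times and the final values of architectural resources are compared between executions.

```python
import random
from typing import Callable, Dict, List

# Final architectural state: resource name -> value (names are illustrative).
State = Dict[str, int]


def multi_pass_check(run_test: Callable[[int], State], passes: int = 3) -> List[str]:
    """Run the same test-case `passes` times and return the resources
    whose final values differ from the first pass."""
    reference = run_test(0)
    mismatches = set()
    for pass_index in range(1, passes):
        result = run_test(pass_index)
        for name in reference.keys() | result.keys():
            if reference.get(name) != result.get(name):
                mismatches.add(name)
    return sorted(mismatches)


def good_design(pass_index: int) -> State:
    # Stand-in for timing variation between passes; it must not
    # influence the architectural results of a correct design.
    _ = random.Random(pass_index).random()
    return {"GPR1": 7, "GPR2": 12, "MEM[0x100]": 19}


def buggy_design(pass_index: int) -> State:
    # Hypothetical timing-dependent bug: one pass ends with a wrong GPR2.
    state = good_design(pass_index)
    if pass_index == 2:
        state["GPR2"] = 13
    return state


print(multi_pass_check(good_design))   # no mismatching resources
print(multi_pass_check(buggy_design))  # GPR2 differs between passes
```

Note that this kind of check needs no reference model: it only detects results that are inconsistent between passes, not results that are consistently wrong.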
10. May 1, 2013 10
Threadmill - Main Features
Generation:
• Concurrent multi-threaded generation
• Light-weight, on-platform
• Static: no reference model and no state tracking
• Very fast: 100s of tests per second on silicon
• Utilization: 90% generation, 10% execution and checking
11. May 1, 2013 11
Accelerators
• A large number of processors, each of which simulates a small portion of the design and passes its results to the others
• The processors run in parallel, allowing high execution performance
• Orders of magnitude faster than simulation
• Allow good observability and coverage analysis
• Allow test execution over billions of cycles at the pre-Si stage
• The platform is used extensively and simultaneously by multiple projects and locations
• High cost and limited resources call for efficient utilization
12. May 1, 2013 12
Exercisers on Accelerator
Motivation:
• Verification of early design models: more cycles and longer tests than in simulation
• Debug at bring-up stage (better observability than Si, higher speed than simulation)
• Utilization of failure event checkers, available only on the Accelerator
• SW validation
• Test quality analysis: coverage (count, specific functions hit)
Challenges:
• High system cost and limited resource availability dictate a need for improved utilization efficiency
• Tests run by the exercisers should target coverage maximization within the constraints of limited resources
Proposed approach: Off-Platform Generation
13. May 1, 2013 13
Threadmill Offline Generation Mode
[Figure: timeline starting at t0 showing Generation, Execution and Checking phases and their results for test-cases TC1 to TC10 on the Accelerator, alongside the image build flow (Test Template, Topology, Architectural Model, Reference Model, Configuration, Generator & Kernel, Builder) that produces a new Exerciser Image]
14. May 1, 2013 14
Threadmill Offline Generation Mode
• Create image with generator component enabled
– Include empty data structures for the test-cases, memory initializations,
translation tables and expected results
• Run the post-silicon application on a software reference model
• Extract the necessary data of test-cases, memory and results from the run
on a software reference model
– Fill data structures with all the data
• Produce an image that includes all harvested data.
– Disable the generator component
• Load the image to the acceleration platform
• Run the image without the overhead associated with the generation of
test-cases and initializations.
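The steps above can be sketched as a toy pipeline in Python. This is only an illustration under stated assumptions: `TestCase`, `ExerciserImage`, `generate_on_reference_model` and `build_offline_image` are hypothetical names, the "reference model" is a trivial adder, and the pseudo-assembly strings are placeholders rather than real generated instructions.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TestCase:
    instructions: List[str]
    expected: Dict[str, int] = field(default_factory=dict)


@dataclass
class ExerciserImage:
    generator_enabled: bool
    # Empty until filled with data harvested from the reference-model run.
    test_cases: List[TestCase] = field(default_factory=list)


def generate_on_reference_model(image: ExerciserImage, count: int, seed: int = 0) -> List[TestCase]:
    """Run the generator-enabled image on a (toy) software reference model
    and harvest the generated test-cases together with expected results."""
    if not image.generator_enabled:
        raise ValueError("the generator must be enabled for the reference-model run")
    rng = random.Random(seed)
    harvested = []
    for _ in range(count):
        a, b = rng.randrange(100), rng.randrange(100)
        harvested.append(
            TestCase(
                instructions=[f"li r1,{a}", f"li r2,{b}", "add r3,r1,r2"],
                expected={"r3": a + b},  # computed by the toy reference model
            )
        )
    return harvested


def build_offline_image(count: int, seed: int = 0) -> ExerciserImage:
    """Create a generator-enabled image, harvest data off-platform, then
    produce the final image with the generator disabled."""
    initial = ExerciserImage(generator_enabled=True)
    harvested = generate_on_reference_model(initial, count, seed)
    return ExerciserImage(generator_enabled=False, test_cases=harvested)


image = build_offline_image(count=50)
print(image.generator_enabled, len(image.test_cases))
```

The resulting image carries pre-generated test-cases and expected results, so the platform only needs to execute and check them.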
15. May 1, 2013 15
Offline vs. Regular Generation
Pros
• No cycles “wasted” on on-platform generation
• More test-cases can be run in the same number of cycles
• Higher test coverage can be expected
• Comparison with the SW reference model may reveal 2+2=5 bugs
Cons
• Depends on a reference model
• Large image size and loading time limit the number of test-cases
16. May 1, 2013 16
Experimental Setup
• Two example test templates used as benchmarks:
– Random: 100 random instructions
– Directed: some threads perform loads/stores; other threads run a functional scenario
• For each test template 3 images were prepared:
– Regular mode
– Offline mode with 50 test-cases
– Offline mode with 100 test-cases
17. May 1, 2013 17
Results – Random Test
Metric | Regular mode | Offline mode 50 TC | Offline mode 100 TC
Cycles per test-case | 4.8 M | 1.3 M | 1.35 M
Num of test-cases | 124 | 50 | 100
Total Accelerator cycles | 595 M | 65 M | 135 M
Image size | 3.5 MB | 23.7 MB | 44.3 MB
Time to prepare image | 0.6 min | 8 min | 15.8 min
Accelerator utilization improvement: x3.7
18. May 1, 2013 18
Results – Directed Test
Metric | Regular mode | Offline mode 50 TC | Offline mode 100 TC
Cycles per test-case | 7 M | 1.4 M | 1.45 M
Num of test-cases | 42 | 50 | 100
Total Accelerator cycles | 295 M | 70 M | 145 M
Image size | 3.7 MB | 24.6 MB | 45.9 MB
Time to prepare image | 0.7 min | 10.2 min | 17.9 min
Accelerator utilization improvement: x5
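The slides do not state how the utilization improvement figures were computed. A short Python check, using only the numbers reported in the two results slides, suggests that they match the ratio of cycles per test-case in regular mode to cycles per test-case in the 50-test-case offline image; treat that interpretation as an assumption. The dictionary layout and function names are illustrative.

```python
# (number of test-cases, cycles per test-case), as reported in the results slides.
results = {
    "random": {
        "regular": (124, 4.8e6),
        "offline_50": (50, 1.3e6),
        "offline_100": (100, 1.35e6),
    },
    "directed": {
        "regular": (42, 7.0e6),
        "offline_50": (50, 1.4e6),
        "offline_100": (100, 1.45e6),
    },
}


def total_cycles(template: str, mode: str) -> float:
    """Total accelerator cycles = number of test-cases x cycles per test-case."""
    count, cycles_per_tc = results[template][mode]
    return count * cycles_per_tc


def improvement(template: str, offline_mode: str = "offline_50") -> float:
    """Ratio of regular-mode to offline-mode cycles per test-case."""
    regular = results[template]["regular"][1]
    offline = results[template][offline_mode][1]
    return regular / offline


for template in results:
    print(template, round(improvement(template), 1))
```

Under this reading, the random template gives about 3.7 and the directed template about 5, in line with the reported figures. The 100-test-case images give slightly lower ratios because their cycles per test-case are slightly higher.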
19. May 1, 2013 19
Coverage Comparison
• About 50,000 coverage events are analyzed in the Accelerator model
• A test of a new special feature of the next Power design was selected for coverage comparison
– Only events related to the specific functionality were analyzed
– Exerciser code does not use the analyzed feature, so there is less coverage “noise”
• Number of covered events (out of 310 analyzed events):
– Offline: 237
– Regular: 209
• Total count of hits of all events:
– Offline: 117,020
– Regular: 56,708
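To put the reported counts in relative terms, the following Python snippet derives coverage percentages and the hit ratio from the numbers above. The variable and function names are illustrative; no data beyond the slide is used.

```python
ANALYZED_EVENTS = 310
covered = {"offline": 237, "regular": 209}
total_hits = {"offline": 117_020, "regular": 56_708}


def coverage_pct(mode: str) -> float:
    """Share of the analyzed events that were hit at least once, in percent."""
    return 100.0 * covered[mode] / ANALYZED_EVENTS


extra_events = covered["offline"] - covered["regular"]
hit_ratio = total_hits["offline"] / total_hits["regular"]

print(f"offline coverage: {coverage_pct('offline'):.1f}%")
print(f"regular coverage: {coverage_pct('regular'):.1f}%")
print(f"additional events covered offline: {extra_events}")
print(f"offline/regular hit ratio: {hit_ratio:.2f}")
```

In other words, offline mode covers roughly 76.5% of the analyzed events versus roughly 67.4% in regular mode, with about twice the total number of hits.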
20. May 1, 2013 20
Coverage Comparison
[Figure: number of hits per coverage event on a logarithmic scale (1 to 100,000), with the series hitCounter_offline and hitCounter_regular. Annotations: some events are hit only by Offline; Offline achieves more hits for most events.]
21. May 1, 2013 21
Conclusions and Future Work
• More test-cases give a higher chance of triggering various scenarios
• Improved coverage
• Quality assessment of test content that is later used at bring-up
• The Offline generation concept may be used in the future as the basis for a dedicated tool for Accelerator-based verification