SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
Four Years of
           Quantum Simulation

                       Insights and Lessons Learned
                         Verifying the QoS Engine of
                    an Innovative Network Processor


2/3/2009    Copyright © 2009, Achilles Test Systems, Inc.
About the Author
     • Co-Founder: EDA Software (2006-present)
            – DV Notebook Series, Achilles Test Systems
     • Technical Leader: Arch, Micro-Arch, & Verification
            – Packet Scheduler of Quantum Flow Processor, Cisco
     • Principal HW Engineer: RTL
            – Packet Scheduler on forwarding ASIC, Hammerhead
     • Principal HW Engineer: Architectural modeling & RTL
            – Bandwidth Aware Memory Controller, C-port
     • Senior SW Engineer: EDA Software
            – RTL Compilation, ASIC Emulator, Meta Systems
     • Senior HW Engineer: RTL & Embedded Software
            – Ethernet and ATM Switching, UB Networks
 2/3/2009              Copyright © 2009, Achilles Test Systems, Inc.   2
Lessons Apply to Many SoCs

 • Features, complexity, ship date
     –   Innovative enough to represent some risk
     –   Leap-frog current (next) generation
     –   Tested at many layers of abstraction: unit, chip, architectural
     –   Diverse IP blocks

 • Typical SoC team demographics
     – Multi-site, Multi-discipline, Multi-group

 • Challenges
     – How to leverage corporate resources
     – How to track, prioritize, and close issues
     – How to maintain momentum and focus
  2/3/2009              Copyright © 2009, Achilles Test Systems, Inc.      3
Take-home Message

• Use more CPUs, not more bodies
   –   100’s of CPU years of simulation
   –   1000’s of semi-directed test cases
   –   1M’s of test executions, stored results
   –   Methodical closure of issues and exceptions

• Requires investment in environment
   – Increase productivity by 2 orders of magnitude
   – Find ways to keep the work load manageable
   – This talk presents a few ideas…




  2/3/2009            Copyright © 2009, Achilles Test Systems, Inc.   4
Related DV Club Presentations

 • Kersten Eder: University of Bristol
      – Genetic Programming in Automated Test Code Generation
        for a Multi-Treaded Microprocessor
      – DV Club: Bristol – Q4 2008
 • Wilson Snyder: SiCortex
      – Test E.R. Our hope to triage a million tests a day
      – DV Club: Boston – Q4 2007
 • Don Steiss: Cisco
      – Layered Self-checking Test Generation
      – DV Club: Dallas - Q2 2007
 • Shahram Salamian: Intel
      – CPU Verification Metrics
      – DV Club: Austin – Q2 2006

 2/3/2009            Copyright © 2009, Achilles Test Systems, Inc.   5
Hardware Testing Schedules
                       Arch.
                      Testing

                                         RTL Testing

                                                                   QA Lab
                                                                   Testing
Theory:
Reality:                        $$$ SoC           $ FPGA
             Arch.                                              Arch.
            Testing                                            Testing

            RTL Testing                                                   RTL Testing


                  Emulation     QA Lab                                     QA Lab Testing
                                Testing


 2/3/2009                 Copyright © 2009, Achilles Test Systems, Inc.                     6
Planning for First-Pass Silicon

 • Assume a typical SoC flow
      – New architecture, next-gen features
      – Mitigate risk, build confidence
                                                                        Arch.             SoC
                                                                       Testing
 • Leverage corporate resources
                                                                       RTL Testing
      – How many CPUs? (5, 50, 500, more)
      – Assume 1-2 hours per test                                                    QA Lab
                                                                                     Testing
      – How many test runs prior to tape-out?
            • 10 CPUs, 120 runs/day, 44K runs per year
            • 200 CPUs, 2400 runs/day, 876K runs per year

 • Constraints / Obstacles
      – Small team, often new to problem domain
      – No pre-existing tests for next-gen features
      – Millions of cycles needed per test

 2/3/2009              Copyright © 2009, Achilles Test Systems, Inc.                      7
Getting Help on Test Creation

• End-user test descriptions
   – e.g. Simplified Command Line Interface (CLI)
   – Architects, marketing, FAEs, SW, and QA contribute to testing
   – Jump starts testing with 50-100 important configurations

• Have S/W team review testing configurations
   – Architectural simulation pre-dates register definition
   – Create a Config API, higher level than registers
                                                                        ASIC Testing OR
• Lessons learned on past projects…                                     Platform Software
   – Simple CLI is not good for all tasks
         • Hard to create certain low-level cases                           Config API
         • Missing some chassis-level details                           reg-R/W      C
   – Rigid implementation of test language
         • Lex and Yacc require expert S/W design                       RTL OR Ref Model
                                                                         ASIC
  2/3/2009              Copyright © 2009, Achilles Test Systems, Inc.                  8
Leveraging a High Speed Model

 • Emulation: Ideal
      – Fast and cycle accurate, but expensive

 • Otherwise: Fast simulation                                   Performance
      –     C model for requirements testing                    > 1MHz
      –     C & RTL cosim for comparison                        < 1KHz
      –     Discrete-event when possible                        T=f(pps)
      –     Cycle-accurate when necessary                       T=f(Hz)

 • Look for transitive correctness
      – IF           (C satisfies requirements)                            C
      – AND          (C==RTL)
      – THEN         RTL satisfies requirements
                                                                  RTL

 2/3/2009               Copyright © 2009, Achilles Test Systems, Inc.          9
Create a Balanced Testing Pipeline


            Automatic           CPU Farm                  Results
            Test                Intelligent               Scoring &
            Generation          Launch                    Browsing

                1                      2                         3
        ~100 CPUs : ~1000 Tests a day (~ 85% existing,~ 15% new)


                            Detect, Debug, Diagn
                                                 ose,
                            Report, Repair, Rerun
   Assume 6-8 tests                                                   Keep team busy
  debugged per day                                                    without drowning
                                                                      them


 2/3/2009             Copyright © 2009, Achilles Test Systems, Inc.                10
Need a Game-Changing Technology


 • Consider using an SQL database
      – Store test results in DB
            • Track failures, history, fixes, complement bug DB
      – Store test lists in DB
            • Tune run lists to maximize value
      – Store actual tests in DB
            • Treat a test as a data structure


 • But…
      – Hand-coded C↔SQL interface can be rigid
            • Impedes the creation of diverse test types and result formats
      – Need a light-weight way to upload any test description
 2/3/2009               Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   11
Automated Test Generation

 • Plan the test population
      – A certain number of constrained-random tests
      – e.g. 1,000 - 10,000 directed configs
             10 - 100 input scenarios per config
             • Yields 10K-100K unique tests

 • Upload directed tests to DB
      – Small programs perform parameter sweeps
      – Heuristic coverage mutations of bin-neighbors

 • Lesson learned by analyzing generated tests…
      –     Found some unintended common factors
      –     Snow-ball effects: too much in-breeding
      –     Try to limit the influence of any one test
      –     Only possible when functional coverage is persistent

 2/3/2009                Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   12
Test Generation w/ Feedback

 • Testing for numeric error vs. pass/fail
      – Performance (QoS) divergence from a theoretical model
      – Also useful in Analog/Mixed-Signal electrical compliance

 • Optimization: Particle Swarms or Genetic Algorithms
      –     Apply stimulus
      –     Measure error in test result
      –     Create modified tests to amplify result
      –     Only possible if both test and result are data

 • But…
      – Optimizations only amplify problems
      – They do not provide coverage closure
      – Most effective when used with coverage sweeps

 2/3/2009                Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   13
Regression Launch Functions

• Linux farm management
   – Simple user-managed queuing
   – Lighten the load on corporate job queuing                        DB           CPU
                                                                                   Farm
   – Custom queuing policies: e.g. time-of-day

• Rerunning <test, seed> pairs
   – Every failing <test, seed> pair is a mini bug report
   – Aggressive re-run of failing <test, seed> pairs
   – Automatically isolate the check-in that causes a new failure
        • n-ary search using the same test and seed
                                             If only I had
• Interesting challenges                   run a regression!
   – Fairness and perception in a linux-farm
   – C-only simulations vs. licensed apps
 2/3/2009             Copyright © 2009, Achilles Test Systems, Inc.        1 2 3   14
Coverage Tables and Test Lists

• Functional coverage tables
   – Capture coverage assertions from all model types
   – Assertions in C & RTL, special regs in FPGAs

• When results, coverage, & tests list are data…
   –   compute expected time per test list
   –   compute expected coverage per test list
   –   optimize out tests that don’t add coverage
   –   swap out one test for a newer tests w/ same coverage
   –   select groups tests to run for any reason
   –   rotate among test lists w/ similar coverage

• But…
   – Creating test lists should not require SQL knowledge
   – Coverage tables grow large, performance can suffer

 2/3/2009            Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   15
Checking and Result Management

• To monitor output from 1000 test a day...
   – Create automated results checkers
   – Flag feature and performance issues      (end of simulation)
   – Flag C/RTL mismatch or assertion failure (within 10 cycles)

• But…
   – Limitations of a checker can impose limits on test stimulus
   – Test harness can become too narrowly focused

• Lessons learned
   – Need to create diverse drivers and checkers
   – Not every stimulus should pass every checker
   – Emphasis on flexibility, agility

  2/3/2009           Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   16
Leverage (the Right) Reusable Pieces

 • Open source web packages
     –   Sort results by error type and severity
     –   Drill down: result, history, coverage, code version
     –   Small status pages for each regression
     –   Display useful pre-made queries/graphs via web site

 • Create web/DB mirrors for each model type
     – Models: C, Cosim, and RTL-only
     – Results and coverage per model type
     – Higher level management views for group status

 • Lessons learned…
     – Hand coded CGI can be laborious
     – Other teams could not directly re-use our work
     – Should not require users to know SQL
  2/3/2009             Copyright © 2009, Achilles Test Systems, Inc.   1 2 3   17
Lessons Learned

• Drive to SoC tape-out with high confidence
   –   Methodical closure of feature issues and exceptions
   –   Exceed 10K tests & 100K captured results per engineer
   –   Over 1M CPU hours of simulation (125 CPU years)
   –   Keep team fed with work and information
   –   Small team, Very high productivity

• Would definitely do it again, but…
   – Make it easy enough to be useful on smaller projects
        • 1-5 engineers, 5-10 sims per day
   –   Leverage recent Web 2.0 advances
   –   Canned database-backed platforms
   –   Wikis
   –   Social collaboration and tagging

 2/3/2009             Copyright © 2009, Achilles Test Systems, Inc.   18
Thank You

                                   Questions?
                chris.kappler@achillestest.com


2/3/2009   Copyright © 2009, Achilles Test Systems, Inc.

Más contenido relacionado

La actualidad más candente

ANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesAnsys
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016Tim Ellison
 
Being Agile with Scrum - koders.co
Being Agile with Scrum - koders.coBeing Agile with Scrum - koders.co
Being Agile with Scrum - koders.coEnder Aydin Orak
 
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...scalaconfjp
 
Rethinking Testing
Rethinking TestingRethinking Testing
Rethinking Testingpdejuan
 
Implementing Electrical and Simulation Rule Checks to ensure Signal Quality
Implementing Electrical and Simulation Rule Checks to ensure Signal QualityImplementing Electrical and Simulation Rule Checks to ensure Signal Quality
Implementing Electrical and Simulation Rule Checks to ensure Signal QualityEMA Design Automation
 
Respond flow chart (rfc)
Respond flow chart (rfc)Respond flow chart (rfc)
Respond flow chart (rfc)Agus Triyanto
 
Comprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionComprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionTechWell
 
Triad Semiconductor Analog and Mixed Signal ASIC Company Overview
Triad Semiconductor Analog and Mixed Signal ASIC Company OverviewTriad Semiconductor Analog and Mixed Signal ASIC Company Overview
Triad Semiconductor Analog and Mixed Signal ASIC Company OverviewTriad Semiconductor
 
In Sync Running Apps On Oracle
In Sync  Running Apps On OracleIn Sync  Running Apps On Oracle
In Sync Running Apps On OracleInSync Conference
 
WALA Tutorial at PLDI 2010
WALA Tutorial at PLDI 2010WALA Tutorial at PLDI 2010
WALA Tutorial at PLDI 2010Julian Dolby
 
Reliability Testing in OPNFV
Reliability Testing in OPNFVReliability Testing in OPNFV
Reliability Testing in OPNFVOPNFV
 
Bringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators
Bringing Engineering Analysis Codes Into Real-Time Full-Scope SimulatorsBringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators
Bringing Engineering Analysis Codes Into Real-Time Full-Scope SimulatorsGSE Systems, Inc.
 
Combining requirements engineering and testing in agile.
Combining requirements engineering and testing in agile. Combining requirements engineering and testing in agile.
Combining requirements engineering and testing in agile. SYSQA BV
 

La actualidad más candente (20)

ANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
Being Agile with Scrum - koders.co
Being Agile with Scrum - koders.coBeing Agile with Scrum - koders.co
Being Agile with Scrum - koders.co
 
Strickland dvclub
Strickland dvclubStrickland dvclub
Strickland dvclub
 
Jai kumar fpga_prototyping
Jai kumar fpga_prototypingJai kumar fpga_prototyping
Jai kumar fpga_prototyping
 
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
 
PowerDRC/LVS 2.0 Overview
PowerDRC/LVS 2.0 OverviewPowerDRC/LVS 2.0 Overview
PowerDRC/LVS 2.0 Overview
 
JedaOverview
JedaOverviewJedaOverview
JedaOverview
 
Rethinking Testing
Rethinking TestingRethinking Testing
Rethinking Testing
 
Implementing Electrical and Simulation Rule Checks to ensure Signal Quality
Implementing Electrical and Simulation Rule Checks to ensure Signal QualityImplementing Electrical and Simulation Rule Checks to ensure Signal Quality
Implementing Electrical and Simulation Rule Checks to ensure Signal Quality
 
Respond flow chart (rfc)
Respond flow chart (rfc)Respond flow chart (rfc)
Respond flow chart (rfc)
 
Comprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live ProductionComprehensive Performance Testing: From Early Dev to Live Production
Comprehensive Performance Testing: From Early Dev to Live Production
 
Triad Semiconductor Analog and Mixed Signal ASIC Company Overview
Triad Semiconductor Analog and Mixed Signal ASIC Company OverviewTriad Semiconductor Analog and Mixed Signal ASIC Company Overview
Triad Semiconductor Analog and Mixed Signal ASIC Company Overview
 
In Sync Running Apps On Oracle
In Sync  Running Apps On OracleIn Sync  Running Apps On Oracle
In Sync Running Apps On Oracle
 
WALA Tutorial at PLDI 2010
WALA Tutorial at PLDI 2010WALA Tutorial at PLDI 2010
WALA Tutorial at PLDI 2010
 
DevOps Tooling event Amazic
DevOps Tooling event AmazicDevOps Tooling event Amazic
DevOps Tooling event Amazic
 
Kishore ems resume
Kishore ems resumeKishore ems resume
Kishore ems resume
 
Reliability Testing in OPNFV
Reliability Testing in OPNFVReliability Testing in OPNFV
Reliability Testing in OPNFV
 
Bringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators
Bringing Engineering Analysis Codes Into Real-Time Full-Scope SimulatorsBringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators
Bringing Engineering Analysis Codes Into Real-Time Full-Scope Simulators
 
Combining requirements engineering and testing in agile.
Combining requirements engineering and testing in agile. Combining requirements engineering and testing in agile.
Combining requirements engineering and testing in agile.
 

Destacado (7)

Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
The validation attitude
The validation attitudeThe validation attitude
The validation attitude
 
Arthur q207
Arthur q207Arthur q207
Arthur q207
 
Herrington dv club_sept19-1
Herrington dv club_sept19-1Herrington dv club_sept19-1
Herrington dv club_sept19-1
 
Mintz q207
Mintz q207Mintz q207
Mintz q207
 
Lafauci dv club oct 2006
Lafauci dv club oct 2006Lafauci dv club oct 2006
Lafauci dv club oct 2006
 
Roy omap validation_dvc_lub_092106
Roy omap validation_dvc_lub_092106Roy omap validation_dvc_lub_092106
Roy omap validation_dvc_lub_092106
 

Similar a Boston 2009 q1_kappler_chris

Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceDVClub
 
Public vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by FlexPublic vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by FlexStackIQ
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Finaljucaab
 
Learning on Deep Learning
Learning on Deep LearningLearning on Deep Learning
Learning on Deep LearningShelley Lambert
 
Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Juan Herrera Utande
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale NetApp
 
Dealing with the Three Horrible Problems in Verification
Dealing with the Three Horrible Problems in VerificationDealing with the Three Horrible Problems in Verification
Dealing with the Three Horrible Problems in VerificationDVClub
 
Technical Lessons Learned Turning the Agile Dials to Eleven!
Technical Lessons Learned Turning the Agile Dials to Eleven!Technical Lessons Learned Turning the Agile Dials to Eleven!
Technical Lessons Learned Turning the Agile Dials to Eleven!Craig Smith
 
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)Robb Boyd
 
Cray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge SimulationsCray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge Simulationsinside-BigData.com
 
Introduction to tempest
Introduction to tempest Introduction to tempest
Introduction to tempest openstackindia
 
Designing for Testability - Rohit Nayak
Designing for Testability - Rohit NayakDesigning for Testability - Rohit Nayak
Designing for Testability - Rohit NayakIndicThreads
 
Architecture for Massively Parallel HDL Simulations
Architecture for Massively Parallel HDL Simulations Architecture for Massively Parallel HDL Simulations
Architecture for Massively Parallel HDL Simulations DVClub
 
The_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldThe_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldShelley Lambert
 
Web Application Release
Web Application ReleaseWeb Application Release
Web Application ReleasePiyush Mattoo
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
 
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)Amazon Web Services
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 

Similar a Boston 2009 q1_kappler_chris (20)

Intel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification ExperienceIntel Atom Processor Pre-Silicon Verification Experience
Intel Atom Processor Pre-Silicon Verification Experience
 
Public vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by FlexPublic vs. Private Cloud Performance by Flex
Public vs. Private Cloud Performance by Flex
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Final
 
Planning open stack-poc
Planning open stack-pocPlanning open stack-poc
Planning open stack-poc
 
Learning on Deep Learning
Learning on Deep LearningLearning on Deep Learning
Learning on Deep Learning
 
SQL Server in the AWS Cloud
SQL Server in the AWS CloudSQL Server in the AWS Cloud
SQL Server in the AWS Cloud
 
Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
 
Dealing with the Three Horrible Problems in Verification
Dealing with the Three Horrible Problems in VerificationDealing with the Three Horrible Problems in Verification
Dealing with the Three Horrible Problems in Verification
 
Technical Lessons Learned Turning the Agile Dials to Eleven!
Technical Lessons Learned Turning the Agile Dials to Eleven!Technical Lessons Learned Turning the Agile Dials to Eleven!
Technical Lessons Learned Turning the Agile Dials to Eleven!
 
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)
Simulating Networks Using Cisco Modeling Labs (TechWiseTV Workshop)
 
Cray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge SimulationsCray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge Simulations
 
Introduction to tempest
Introduction to tempest Introduction to tempest
Introduction to tempest
 
Designing for Testability - Rohit Nayak
Designing for Testability - Rohit NayakDesigning for Testability - Rohit Nayak
Designing for Testability - Rohit Nayak
 
Architecture for Massively Parallel HDL Simulations
Architecture for Massively Parallel HDL Simulations Architecture for Massively Parallel HDL Simulations
Architecture for Massively Parallel HDL Simulations
 
The_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_CouldThe_Little_Jenkinsfile_That_Could
The_Little_Jenkinsfile_That_Could
 
Web Application Release
Web Application ReleaseWeb Application Release
Web Application Release
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
 
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)
AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 

Más de Obsidian Software (20)

Yang greenstein part_2
Yang greenstein part_2Yang greenstein part_2
Yang greenstein part_2
 
Yang greenstein part_1
Yang greenstein part_1Yang greenstein part_1
Yang greenstein part_1
 
Williamson arm validation metrics
Williamson arm validation metricsWilliamson arm validation metrics
Williamson arm validation metrics
 
Whipp q3 2008_sv
Whipp q3 2008_svWhipp q3 2008_sv
Whipp q3 2008_sv
 
Vishakantaiah validating
Vishakantaiah validatingVishakantaiah validating
Vishakantaiah validating
 
Tobin verification isglobal
Tobin verification isglobalTobin verification isglobal
Tobin verification isglobal
 
Tierney bq207
Tierney bq207Tierney bq207
Tierney bq207
 
Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
Shultz dallas q108
Shultz dallas q108Shultz dallas q108
Shultz dallas q108
 
Shreeve dv club_ams
Shreeve dv club_amsShreeve dv club_ams
Shreeve dv club_ams
 
Schulz sv q2_2009
Schulz sv q2_2009Schulz sv q2_2009
Schulz sv q2_2009
 
Schulz dallas q1_2008
Schulz dallas q1_2008Schulz dallas q1_2008
Schulz dallas q1_2008
 
Salamian dv club_foils_intel_austin
Salamian dv club_foils_intel_austinSalamian dv club_foils_intel_austin
Salamian dv club_foils_intel_austin
 
Sakar jain
Sakar jainSakar jain
Sakar jain
 
Runner sv q307
Runner sv q307Runner sv q307
Runner sv q307
 
Roy aerofone power_verif
Roy aerofone power_verifRoy aerofone power_verif
Roy aerofone power_verif
 
Robert page-abstract
Robert page-abstractRobert page-abstract
Robert page-abstract
 
Rash
RashRash
Rash
 
Pedneau dv employment_081705
Pedneau dv employment_081705Pedneau dv employment_081705
Pedneau dv employment_081705
 
Oop dv club_rev2b-4
Oop dv club_rev2b-4Oop dv club_rev2b-4
Oop dv club_rev2b-4
 

Boston 2009 q1_kappler_chris

  • 1. Four Years of Quantum Simulation Insights and Lessons Learned Verifying the QoS Engine of an Innovative Network Processor 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc.
  • 2. About the Author • Co-Founder: EDA Software (2006-present) – DV Notebook Series, Achilles Test Systems • Technical Leader: Arch, Micro-Arch, & Verification – Packet Scheduler of Quantum Flow Processor, Cisco • Principal HW Engineer: RTL – Packet Scheduler on forwarding ASIC, Hammerhead • Principal HW Engineer: Architectural modeling & RTL – Bandwidth Aware Memory Controller, C-port • Senior SW Engineer: EDA Software – RTL Compilation, ASIC Emulator, Meta Systems • Senior HW Engineer: RTL & Embedded Software – Ethernet and ATM Switching, UB Networks 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 2
  • 3. Lessons Apply to Many SoCs • Features, complexity, ship date – Innovative enough to represent some risk – Leap-frog current (next) generation – Tested at many layers of abstraction: unit, chip, architectural – Diverse IP blocks • Typical SoC team demographics – Multi-site, Multi-discipline, Multi-group • Challenges – How to leverage corporate resources – How to track, prioritize, and close issues – How to maintain momentum and focus 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 3
  • 4. Take-home Message • Use more CPUs, not more bodies – 100’s of CPU years of simulation – 1000’s of semi-directed test cases – 1M’s of test executions, stored results – Methodical closure of issues and exceptions • Requires investment in environment – Increase productivity by 2 orders of magnitude – Find ways to keep the work load manageable – This talk presents a few ideas… 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 4
  • 5. Related DV Club Presentations • Kersten Eder: University of Bristol – Genetic Programming in Automated Test Code Generation for a Multi-Treaded Microprocessor – DV Club: Bristol – Q4 2008 • Wilson Snyder: SiCortex – Test E.R. Our hope to triage a million tests a day – DV Club: Boston – Q4 2007 • Don Steiss: Cisco – Layered Self-checking Test Generation – DV Club: Dallas - Q2 2007 • Shahram Salamian: Intel – CPU Verification Metrics – DV Club: Austin – Q2 2006 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 5
  • 6. Hardware Testing Schedules Arch. Testing RTL Testing QA Lab Testing Theory: Reality: $$$ SoC $ FPGA Arch. Arch. Testing Testing RTL Testing RTL Testing Emulation QA Lab QA Lab Testing Testing 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 6
  • 7. Planning for First-Pass Silicon • Assume a typical SoC flow – New architecture, next-gen features – Mitigate risk, build confidence Arch. SoC Testing • Leverage corporate resources RTL Testing – How many CPUs? (5, 50, 500, more) – Assume 1-2 hours per test QA Lab Testing – How many test runs prior to tape-out? • 10 CPUs, 120 runs/day, 44K runs per year • 200 CPUs, 2400 runs/day, 876K runs per year • Constraints / Obstacles – Small team, often new to problem domain – No pre-existing tests for next-gen features – Millions of cycles needed per test 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 7
  • 8. Getting Help on Test Creation • End-user test descriptions – e.g. Simplified Command Line Interface (CLI) – Architects, marketing, FAEs, SW, and QA contribute to testing – Jump starts testing with 50-100 important configurations • Have S/W team review testing configurations – Architectural simulation pre-dates register definition – Create a Config API, higher level than registers ASIC Testing OR • Lessons learned on past projects… Platform Software – Simple CLI is not good for all tasks • Hard to create certain low-level cases Config API • Missing some chassis-level details reg-R/W C – Rigid implementation of test language • Lex and Yacc require expert S/W design RTL OR Ref Model ASIC 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 8
  • 9. Leveraging a High Speed Model • Emulation: Ideal – Fast and cycle accurate, but expensive • Otherwise: Fast simulation Performance – C model for requirements testing > 1MHz – C & RTL cosim for comparison < 1KHz – Discrete-event when possible T=f(pps) – Cycle-accurate when necessary T=f(Hz) • Look for transitive correctness – IF (C satisfies requirements) C – AND (C==RTL) – THEN RTL satisfies requirements RTL 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 9
  • 10. Create a Balanced Testing Pipeline Automatic CPU Farm Results Test Intelligent Scoring & Generation Launch Browsing 1 2 3 ~100 CPUs : ~1000 Tests a day (~ 85% existing,~ 15% new) Detect, Debug, Diagn ose, Report, Repair, Rerun Assume 6-8 tests Keep team busy debugged per day without drowning them 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 10
  • 11. Need a Game-Changing Technology • Consider using an SQL database – Store test results in DB • Track failures, history, fixes, complement bug DB – Store test lists in DB • Tune run lists to maximize value – Store actual tests in DB • Treat a test as a data structure • But… – Hand-coded C↔SQL interface can be rigid • Impedes the creation of diverse test types and result formats – Need a light-weight way to upload any test description 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 11
  • 12. Automated Test Generation • Plan the test population – A certain number of constrained-random tests – e.g. 1,000 - 10,000 directed configs 10 - 100 input scenarios per config • Yields 10K-100K unique tests • Upload directed tests to DB – Small programs perform parameter sweeps – Heuristic coverage mutations of bin-neighbors • Lesson learned by analyzing generated tests… – Found some unintended common factors – Snow-ball effects: too much in-breeding – Try to limit the influence of any one test – Only possible when functional coverage is persistent 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 12
  • 13. Test Generation w/ Feedback • Testing for numeric error vs. pass/fail – Performance (QoS) divergence from a theoretical model – Also useful in Analog/Mixed-Signal electrical compliance • Optimization: Particle Swarms or Genetic Algorithms – Apply stimulus – Measure error in test result – Create modified tests to amplify result – Only possible if both test and result are data • But… – Optimizations only amplify problems – They do not provide coverage closure – Most effective when used with coverage sweeps 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 13
  • 14. Regression Launch Functions • Linux farm management – Simple user-managed queuing – Lighten the load on corporate job queuing DB CPU Farm – Custom queuing policies: e.g. time-of-day • Rerunning <test, seed> pairs – Every failing <test, seed> pair is a mini bug report – Aggressive re-run of failing <test, seed> pairs – Automatically isolate the check-in that causes a new failure • n-ary search using the same test and seed If only I had • Interesting challenges run a regression! – Fairness and perception in a linux-farm – C-only simulations vs. licensed apps 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 14
  • 15. Coverage Tables and Test Lists • Functional coverage tables – Capture coverage assertions from all model types – Assertions in C & RTL, special regs in FPGAs • When results, coverage, & tests list are data… – compute expected time per test list – compute expected coverage per test list – optimize out tests that don’t add coverage – swap out one test for a newer tests w/ same coverage – select groups tests to run for any reason – rotate among test lists w/ similar coverage • But… – Creating test lists should not require SQL knowledge – Coverage tables grow large, performance can suffer 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 15
  • 16. Checking and Result Management • To monitor output from 1000 test a day... – Create automated results checkers – Flag feature and performance issues (end of simulation) – Flag C/RTL mismatch or assertion failure (within 10 cycles) • But… – Limitations of a checker can impose limits on test stimulus – Test harness can become too narrowly focused • Lessons learned – Need to create diverse drivers and checkers – Not every stimulus should pass every checker – Emphasis on flexibility, agility 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 16
  • 17. Leverage (the Right) Reusable Pieces • Open source web packages – Sort results by error type and severity – Drill down: result, history, coverage, code version – Small status pages for each regression – Display useful pre-made queries/graphs via web site • Create web/DB mirrors for each model type – Models: C, Cosim, and RTL-only – Results and coverage per model type – Higher level management views for group status • Lessons learned… – Hand coded CGI can be laborious – Other teams could not directly re-use our work – Should not require users to know SQL 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 1 2 3 17
  • 18. Lessons Learned • Drive to SoC tape-out with high confidence – Methodical closure of feature issues and exceptions – Exceed 10K tests & 100K captured results per engineer – Over 1M CPU hours of simulation (125 CPU years) – Keep team fed with work and information – Small team, Very high productivity • Would definitely do it again, but… – Make it easy enough to be useful on smaller projects • 1-5 engineers, 5-10 sims per day – Leverage recent Web 2.0 advances – Canned database-backed platforms – Wikis – Social collaboration and tagging 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc. 18
  • 19. Thank You Questions? chris.kappler@achillestest.com 2/3/2009 Copyright © 2009, Achilles Test Systems, Inc.