SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
STAC Summit 2012 – New York

Low Effort, High Optimization:

STAC Benchmarks & Free Improvements,
Courtesy of Moore’s Law


                         Daryan Dehghanpisheh
                         June 18 - 2012




1   Intel Confidential
Join The Discussion




                                                                                   Join in the discussion!

All trademarks and brands are registered property of their respective companies.                             2
How Intel Uses the STAC Benchmarks
 • Effective in demonstrating relevant platform and
   performance advantages of new Intel products.


 • A trusted rallying point for ecosystem collaboration


 • A credible feedback mechanism for development


 • 1H/12 Reports :
   − STAC-M3 (tick databases)
   − STAC-T (tick-to-trade)
   − STAC-A2 (options risk) – Earlier today



                     INTEL CONFIDENTIAL – NDA REQUIRED
STAC-M3 (tick databases)
• STAC-M3 projects are fostering
  greater industry collaboration


• Major gains YTD, with more
  expected before EOY.


• STAC Report just released:
  − kdb+ with Intel SSD 320 Series
    in ION storage appliance



• Next Phase Tests:
  − Sandy Bridge 4-socket
  − AVX enabling




                                     INTEL CONFIDENTIAL – NDA REQUIRED
ION SR-71 with Intel SSD 320 Series:
High performance per price, power, and space.




                    2.2x to 2.7x better than best
                    spinning disk benchmarks --
                  and in 8x to 24x LESS rack space




              INTEL CONFIDENTIAL – NDA REQUIRED
STAC-T: A New Engagement
• STAC-T is a test methodology and software
• Workload is tick-to-trade
   − Inbound: Market data and execution reports
   − Outbound: Trade messages

• Can benchmark any kind of trading system
• Flexible toolset for designing useful benchmarks
• No standard benchmark specifications yet
• Harness can integrate with hardware-based
  monitoring and playback for highest accuracy


                     INTEL CONFIDENTIAL – NDA REQUIRED
Intel® Xeon® Processor E5-2600 Product Family

Xeon Platforms tested with STAC-T
Software from Redline Trading Systems
InRush Ticker Plant for CME, version 2.12.7
Execution Gateway for CME, version 1.3.0
“Nearly no-op” algorithm


Server Configuration for Redline Solution
HP DL380p Gen8 (Sandy Bridge) and DL360 (Westmere)
Mellanox ConnectX-3 HCA (10GbE)
RHEL 6.1


Server Platform Comparison: Then & Now
Sandybridge: 16 Cores at 2.9 Ghz
Westmere Platform: 12 cores at 2.93 GHz
Block Level View of STAC-T Harness & Testing
                                            STAC Orchestration Scripts

                                                     STAC-T
                                                    Controller



                                                       SUT
                                             Redline Trading InRush +                   FIX
Exchange      Raw exchange
                                          Execution Gateway + “dumb” algo              Orders,
                messages
 market       (market data)                                                   Switch   cancels    Exchange
                                                                                                     Wire
 tcpreplay                    Switch                                                                  STAC-T
                                                                                                      Wire
  data                                        Trading application
                                        HP DL380p + “Sandy Bridge” 2.9 GHz    (same)             transaction
                                                                                                   Responder
               CME FAST
                                                                                                    Responders
                                                                                                    Responder
   feed                                               - OR -                           ACKs,       gateway
                                                                                                        (x4)
                                                                                        fills,
                                         HP DL360 + “Westmere” 2.9 GHz                 rejects

                                       RHEL 6.1, VMA, Mellanox CX3 10GbE
 Recorded
   CME                                                                                              Response
market data                                                                                           rules
                                       Datacom Systems FTP-9504 Optical Tap


                                                        t
                                          tM                             tT
                                          Corvil CNE 5400 + CorvilNet 7.1
                                             (Capture and correlation)



                                         STAC Analysis and Reporting Tools
The Result: GHz for GHz, 23% Faster




    Median latency at 8x market rate:
         Xeon E5-2600: 6.2 μsec
         Xeon 5600:    8.1 μsec




                  INTEL CONFIDENTIAL – NDA REQUIRED
The Heart of a Next-Generation Data Center


     Leading Performance
     Up to 80% performance boost
     over Intel® Xeon® processor
     5600 series-based servers1

                                                                                                                                                        Best combination
                                                                                                                                                        of performance,
                                                                                                                                                        power efficiency,
                                                                                                                                                        and cost
     Flexible & Efficient
     Advanced features automate
     power consumption across the
     platform




     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
     measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
     information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

     For more information go to intel.com/performance”
       1    Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Configuration details in backup


10                                                                                INTEL CONFIDENTIAL – NDA REQUIRED
Reduce Bottlenecks With Intel® Integrated I/O


           Would you put a                                                                 Intel® Integrated I/O
           racecar engine in this…
                                                                                                              Xeon E5 2600

                                                                                                             CORE 1           CORE 2


                                                                                                             CORE 3           CORE 4


                                                                                                             CORE 5           CORE 6


                                                                                                             CORE 7           CORE 8

           …or this?                                                                                                  CACHE


                                                                                              Integrated
                                                                                              PCI Express*
                                                                                              3.0




     * Other names and brands may be claimed as the property of others

11                                                                       INTEL CONFIDENTIAL – NDA REQUIRED
New Intel® Integrated I/O

                                                                                                                              Intel® Integrated I/O

             1st server processor
             with Intel® Integrated I/O

             Reduces I/O latency
             by as much as 30%1

             Improves IO bandwidth
             by as much as 2x with
             PCI Express* 3.0 support2




       Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
       measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
       information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

•    1 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel®
     Xeon® processor 5500 series (340 ns). See notes in backup for configuration details

12
 •    2 Source: 8 GT/s and 128b/130b encoding in PCIe* 3.0 specification enables double INTEL CONFIDENTIAL – NDA REQUIRED
                                                                                        the interconnect bandwidth over the PCIe* 2.0 specification
      (www.pcisig.com/news_room/November_18_2010_Press_Release/ ).
New Intel® Data Direct I/O Technology
                                        (Intel® DDIO)

                                                                                                                                                      Can more than Double
                                                                                                                                                        I/O Performance1
        Send I/O directly to and from
        processor cache for all I/O
        traffic types                                                                                                                                                                  Xeon
                                                                                                                                                                                       2600
                                                                                                                                                                                      Family
        Can allow system memory to
        remain in low power state
                                                                                                                                                             Xeon
        Reduce latency by eliminating                                                                                                                        5600
                                                                                                                                                             Series
        unneeded trips to memory
                                                                                                                                                      [ Transactions per second ]




•    1 Up to 2.3x I/O performance is 1S with a Xeon processor 5600 series vs. 1S Xeon Processor E5-2600 data for L2 forwarding test using 8x10GbE ports .See notes in backup for configuration details


     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
     measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
     information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.




13                                                                                              INTEL CONFIDENTIAL – NDA REQUIRED
The Heart of a Next-Generation
                                                Data Center
                                                                                                                                          Up to 80% performance
                                                                                                                                          boost vs. prior gen1

                                                                                                                                          Dramatically reduce
                                                                                                                                          compute time with Intel®
                                                                                                                                          Advanced Vector Extensions
                                                                                          Up to 4 channels
                                                                                          DDR3 1600 Mhz
                                                                                          memory                                          Performance when you
                                                                                                                                          need it with Intel® Turbo
                                                                                               Up to 8 cores
                                                                                                                                          Boost Technology 2.0
                                                                                               Up to 20 MB cache
              Integrated
              PCI Express*
              3.0
                                                                                                                                          Intel® Integrated I/O with
              Up to 40                                                                                                                    Intel® Data Direct I/O
              lanes
              per socket                                                                                                                  cuts latency2 while
                                                                                                                                          adding capacity & bandwidth




     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are
     measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
     information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

     For more information go to intel.com/performance
     1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012.
     2 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs.
     Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details
14                                                                                  INTEL CONFIDENTIAL – NDA REQUIRED
     * Other names and brands may be claimed as the property of others
How to achieve the best performance
Maximizing Performance involves “System Level” optimizations


• OEM BIOS Settings: SMI, HyperThreading, C-States- All Off
    − Experiment with EIST & Turbo On/Off


• On the application: Maximize your resources by…
      1.   Pin Threads, Interrupts, and Processes to individual cores using CPU_ID
      2.   Place “communication” functions threads on adjacent cores
      3.   Use PCM to determine L3 Cache Misses & Keep data in L3 Cache
               http://software.intel.com/file/41604
      4.   Compile w/Performance Settings, Use PGO, Evaluate IPP / SSE 4.2 Strings
               http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/



•    Determine how many cores your trading strategy requires
      1.   Can it run on 8 cores? If so, match up CPU+NIC per strategy
               https://access.redhat.com/knowledge/solutions/53031




                               INTEL CONFIDENTIAL – NDA REQUIRED
Announcing on June 18th 2012




                          Intel® Xeon® Phi™ Product Family
16   INTEL CONFIDENTIAL
Intel® Xeon® Phi™ Product Overview
     • Product family based on Intel® Many Integrated Core Architecture (Intel® MIC Architecture)
       •    Smaller, low power, IA cores with 512-bit wide vector engine

     • First product is a coprocessor codenamed Knights Corner
       •    PCIe card requiring an IA host processor;
            Next gen Knights Landing is processor and coprocessor (card)
       •    Availability: In production in 2012 (Public), product availability in Q4’12 (NDA)
       •    Sold as card via OEMs, Expect to price relative to similarly capable competitive alternatives

     • >1 Teraflop sustained DGEMM (double precision) perf on early silicon at Nov’11

     • Standards-based C/C++/FORTRAN with full support by Intel® Cluster Studio XE

     • Use in offload or cluster model as autonomous compute engine
       •    “More than an accelerator”

     • Supports Data Parallel, Thread Parallel, Process Parallel

     Performance and Programmability for Highly Parallel Applications
17   INTEL CONFIDENTIAL
Next steps
                     • STAC-T1
                      − Extract latency
                      − Test higher-throughput scenarios (e.g., TVITCH)



                     • STAC-A2
                      − Increase use of vectorized code
                      − Test on Xeon® E5 series in 4-socket configuration
                      − Test on Xeon® Phi™ (~50 Cores)



                     • STAC-M3
                      − Test with Xeon® E5 series in 4-socket configuration
                      − More Storage Alternatives


INTEL CONFIDENTIAL
Thank You!
                            STAC
                          Redline
                             HP
                            Corvil
                      Intel MKL Team
                         Intel Labs




INTEL CONFIDENTIAL

Más contenido relacionado

Destacado

Destacado (6)

서버 성능에 대한 정의와 이해
서버 성능에 대한 정의와 이해서버 성능에 대한 정의와 이해
서버 성능에 대한 정의와 이해
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Spy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platformSpy hard, challenges of 100G deep packet inspection on x86 platform
Spy hard, challenges of 100G deep packet inspection on x86 platform
 
FPGA CEP Appliance
FPGA CEP ApplianceFPGA CEP Appliance
FPGA CEP Appliance
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 

Último

MASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
MASTERING FOREX: STRATEGIES FOR SUCCESS.pdfMASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
MASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
Cocity Enterprises
 
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadhabortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
samsungultra782445
 
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammamAbortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
samsungultra782445
 

Último (20)

Seeman_Fiintouch_LLP_Newsletter_May-2024.pdf
Seeman_Fiintouch_LLP_Newsletter_May-2024.pdfSeeman_Fiintouch_LLP_Newsletter_May-2024.pdf
Seeman_Fiintouch_LLP_Newsletter_May-2024.pdf
 
Webinar on E-Invoicing for Fintech Belgium
Webinar on E-Invoicing for Fintech BelgiumWebinar on E-Invoicing for Fintech Belgium
Webinar on E-Invoicing for Fintech Belgium
 
Shrambal_Distributors_Newsletter_May-2024.pdf
Shrambal_Distributors_Newsletter_May-2024.pdfShrambal_Distributors_Newsletter_May-2024.pdf
Shrambal_Distributors_Newsletter_May-2024.pdf
 
Q1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdfQ1 2024 Conference Call Presentation vF.pdf
Q1 2024 Conference Call Presentation vF.pdf
 
cost-volume-profit analysis.ppt(managerial accounting).pptx
cost-volume-profit analysis.ppt(managerial accounting).pptxcost-volume-profit analysis.ppt(managerial accounting).pptx
cost-volume-profit analysis.ppt(managerial accounting).pptx
 
7 tips trading Deriv Accumulator Options
7 tips trading Deriv Accumulator Options7 tips trading Deriv Accumulator Options
7 tips trading Deriv Accumulator Options
 
MASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
MASTERING FOREX: STRATEGIES FOR SUCCESS.pdfMASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
MASTERING FOREX: STRATEGIES FOR SUCCESS.pdf
 
Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...Collecting banker, Capacity of collecting Banker, conditions under section 13...
Collecting banker, Capacity of collecting Banker, conditions under section 13...
 
Famous Kala Jadu, Black magic expert in Faisalabad and Kala ilam specialist i...
Famous Kala Jadu, Black magic expert in Faisalabad and Kala ilam specialist i...Famous Kala Jadu, Black magic expert in Faisalabad and Kala ilam specialist i...
Famous Kala Jadu, Black magic expert in Faisalabad and Kala ilam specialist i...
 
falcon-invoice-discounting-unlocking-prime-investment-opportunities
falcon-invoice-discounting-unlocking-prime-investment-opportunitiesfalcon-invoice-discounting-unlocking-prime-investment-opportunities
falcon-invoice-discounting-unlocking-prime-investment-opportunities
 
20240419-SMC-submission-Annual-Superannuation-Performance-Test-–-design-optio...
20240419-SMC-submission-Annual-Superannuation-Performance-Test-–-design-optio...20240419-SMC-submission-Annual-Superannuation-Performance-Test-–-design-optio...
20240419-SMC-submission-Annual-Superannuation-Performance-Test-–-design-optio...
 
Famous No1 Amil Baba Love marriage Astrologer Specialist Expert In Pakistan a...
Famous No1 Amil Baba Love marriage Astrologer Specialist Expert In Pakistan a...Famous No1 Amil Baba Love marriage Astrologer Specialist Expert In Pakistan a...
Famous No1 Amil Baba Love marriage Astrologer Specialist Expert In Pakistan a...
 
Explore Dual Citizenship in Africa | Citizenship Benefits & Requirements
Explore Dual Citizenship in Africa | Citizenship Benefits & RequirementsExplore Dual Citizenship in Africa | Citizenship Benefits & Requirements
Explore Dual Citizenship in Africa | Citizenship Benefits & Requirements
 
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadhabortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
abortion pills in Jeddah Saudi Arabia (+919707899604)cytotec pills in Riyadh
 
Famous Kala Jadu, Kala ilam specialist in USA and Bangali Amil baba in Saudi ...
Famous Kala Jadu, Kala ilam specialist in USA and Bangali Amil baba in Saudi ...Famous Kala Jadu, Kala ilam specialist in USA and Bangali Amil baba in Saudi ...
Famous Kala Jadu, Kala ilam specialist in USA and Bangali Amil baba in Saudi ...
 
FE Credit and SMBC Acquisition Case Studies
FE Credit and SMBC Acquisition Case StudiesFE Credit and SMBC Acquisition Case Studies
FE Credit and SMBC Acquisition Case Studies
 
Avoidable Errors in Payroll Compliance for Payroll Services Providers - Globu...
Avoidable Errors in Payroll Compliance for Payroll Services Providers - Globu...Avoidable Errors in Payroll Compliance for Payroll Services Providers - Globu...
Avoidable Errors in Payroll Compliance for Payroll Services Providers - Globu...
 
Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...Test bank for advanced assessment interpreting findings and formulating diffe...
Test bank for advanced assessment interpreting findings and formulating diffe...
 
Female Escorts Service in Hyderabad Starting with 5000/- for Savita Escorts S...
Female Escorts Service in Hyderabad Starting with 5000/- for Savita Escorts S...Female Escorts Service in Hyderabad Starting with 5000/- for Savita Escorts S...
Female Escorts Service in Hyderabad Starting with 5000/- for Savita Escorts S...
 
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammamAbortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
Abortion pills in Saudi Arabia (+919707899604)cytotec pills in dammam
 

Low Effort, High Optimization: STAC Benchmarks & Free Improvements, Courtesy of Moore’s Law

  • 1. STAC Summit 2012 – New York Low Effort, High Optimization: STAC Benchmarks & Free Improvements, Courtesy of Moore’s Law Daryan Dehghanpisheh June 18 - 2012 1 Intel Confidential
  • 2. Join The Discussion Join in the discussion! All trademarks and brands are registered property of their respective companies. 2
  • 3. How Intel Uses the STAC Benchmarks • Effective in demonstrating relevant platform and performance advantages of new Intel products. • A trusted rallying point for ecosystem collaboration • A credible feedback mechanism for development • 1H/12 Reports : − STAC-M3 (tick databases) − STAC-T (tick-to-trade) − STAC-A2 (options risk) – Earlier today INTEL CONFIDENTIAL – NDA REQUIRED
  • 4. STAC-M3 (tick databases) • STAC-M3 projects are fostering greater industry collaboration • Major gains YTD, with more expected before EOY. • STAC Report just released: − kdb+ with Intel SSD 320 Series in ION storage appliance • Next Phase Tests: − Sandy Bridge 4-socket − AVX enabling INTEL CONFIDENTIAL – NDA REQUIRED
  • 5. ION SR-71 with Intel SSD 320 Series: High performance per price, power, and space. 2.2x to 2.7x better than best spinning disk benchmarks -- and in 8x to 24x LESS rack space INTEL CONFIDENTIAL – NDA REQUIRED
  • 6. STAC-T: A New Engagement • STAC-T is a test methodology and software • Workload is tick-to-trade − Inbound: Market data and execution reports − Outbound: Trade messages • Can benchmark any kind of trading system • Flexible toolset for designing useful benchmarks • No standard benchmark specifications yet • Harness can integrate with hardware-based monitoring and playback for highest accuracy INTEL CONFIDENTIAL – NDA REQUIRED
  • 7. Intel® Xeon® Processor E5-2600 Product Family Xeon Platforms tested with STAC-T Software from Redline Trading Systems InRush Ticker Plant for CME, version 2.12.7 Execution Gateway for CME, version 1.3.0 “Nearly no-op” algorithm Server Configuration for Redline Solution HP DL380p Gen8 (Sandy Bridge) and DL360 (Westmere) Mellanox ConnectX-3 HCA (10GbE) RHEL 6.1 Server Platform Comparison: Then & Now Sandybridge: 16 Cores at 2.9 Ghz Westmere Platform: 12 cores at 2.93 GHz
  • 8. Block Level View of STAC-T Harness & Testing STAC Orchestration Scripts STAC-T Controller SUT Redline Trading InRush + FIX Exchange Raw exchange Execution Gateway + “dumb” algo Orders, messages market (market data) Switch cancels Exchange Wire tcpreplay Switch STAC-T Wire data Trading application HP DL380p + “Sandy Bridge” 2.9 GHz (same) transaction Responder CME FAST Responders Responder feed - OR - ACKs, gateway (x4) fills, HP DL360 + “Westmere” 2.9 GHz rejects RHEL 6.1, VMA, Mellanox CX3 10GbE Recorded CME Response market data rules Datacom Systems FTP-9504 Optical Tap t tM tT Corvil CNE 5400 + CorvilNet 7.1 (Capture and correlation) STAC Analysis and Reporting Tools
  • 9. The Result: GHz for GHz, 23% Faster Median latency at 8x market rate: Xeon E5-2600: 6.2 μsec Xeon 5600: 8.1 μsec INTEL CONFIDENTIAL – NDA REQUIRED
  • 10. The Heart of a Next-Generation Data Center Leading Performance Up to 80% performance boost over Intel® Xeon® processor 5600 series-based servers1 Best combination of performance, power efficiency, and cost Flexible & Efficient Advanced features automate power consumption across the platform Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to intel.com/performance” 1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Configuration details in backup 10 INTEL CONFIDENTIAL – NDA REQUIRED
  • 11. Reduce Bottlenecks With Intel® Integrated I/O Would you put a Intel® Integrated I/O racecar engine in this… Xeon E5 2600 CORE 1 CORE 2 CORE 3 CORE 4 CORE 5 CORE 6 CORE 7 CORE 8 …or this? CACHE Integrated PCI Express* 3.0 * Other names and brands may be claimed as the property of others 11 INTEL CONFIDENTIAL – NDA REQUIRED
  • 12. New Intel® Integrated I/O Intel® Integrated I/O 1st server processor with Intel® Integrated I/O Reduces I/O latency by as much as 30%1 Improves IO bandwidth by as much as 2x with PCI Express* 3.0 support2 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. • 1 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details 12 • 2 Source: 8 GT/s and 128b/130b encoding in PCIe* 3.0 specification enables double INTEL CONFIDENTIAL – NDA REQUIRED the interconnect bandwidth over the PCIe* 2.0 specification (www.pcisig.com/news_room/November_18_2010_Press_Release/ ).
  • 13. New Intel® Data Direct I/O Technology (Intel® DDIO) Can more than Double I/O Performance1 Send I/O directly to and from processor cache for all I/O traffic types Xeon 2600 Family Can allow system memory to remain in low power state Xeon Reduce latency by eliminating 5600 Series unneeded trips to memory [ Transactions per second ] • 1 Up to 2.3x I/O performance is 1S with a Xeon processor 5600 series vs. 1S Xeon Processor E5-2600 data for L2 forwarding test using 8x10GbE ports .See notes in backup for configuration details Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. 13 INTEL CONFIDENTIAL – NDA REQUIRED
  • 14. The Heart of a Next-Generation Data Center Up to 80% performance boost vs. prior gen1 Dramatically reduce compute time with Intel® Advanced Vector Extensions Up to 4 channels DDR3 1600 Mhz memory Performance when you need it with Intel® Turbo Up to 8 cores Boost Technology 2.0 Up to 20 MB cache Integrated PCI Express* 3.0 Intel® Integrated I/O with Up to 40 Intel® Data Direct I/O lanes per socket cuts latency2 while adding capacity & bandwidth Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to intel.com/performance 1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. 2 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details 14 INTEL CONFIDENTIAL – NDA REQUIRED * Other names and brands may be claimed as the property of others
  • 15. How to achieve the best performance Maximizing Performance involves “System Level” optimizations • OEM BIOS Settings: SMI, HyperThreading, C-States- All Off − Experiment with EIST & Turbo On/Off • On the application: Maximize your resources by… 1. Pin Threads, Interrupts, and Processes to individual cores using CPU_ID 2. Place “communication” functions threads on adjacent cores 3. Use PCM to determine L3 Cache Misses & Keep data in L3 Cache  http://software.intel.com/file/41604 4. Compile w/Performance Settings, Use PGO, Evaluate IPP / SSE 4.2 Strings  http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/ • Determine how many cores your trading strategy requires 1. Can it run on 8 cores? If so, match up CPU+NIC per strategy  https://access.redhat.com/knowledge/solutions/53031 INTEL CONFIDENTIAL – NDA REQUIRED
  • 16. Announcing on June 18th 2012 Intel® Xeon® Phi™ Product Family 16 INTEL CONFIDENTIAL
  • 17. Intel® Xeon® Phi™ Product Overview • Product family based on Intel® Many Integrated Core Architecture (Intel® MIC Architecture) • Smaller, low power, IA cores with 512-bit wide vector engine • First product is a coprocessor codenamed Knights Corner • PCIe card requiring an IA host processor; Next gen Knights Landing is processor and coprocessor (card) • Availability: In production in 2012 (Public), product availability in Q4’12 (NDA) • Sold as card via OEMs, Expect to price relative to similarly capable competitive alternatives • >1 Teraflop sustained DGEMM (double precision) perf on early silicon at Nov’11 • Standards-based C/C++/FORTRAN with full support by Intel® Cluster Studio XE • Use in offload or cluster model as autonomous compute engine • “More than an accelerator” • Supports Data Parallel, Thread Parallel, Process Parallel Performance and Programmability for Highly Parallel Applications 17 INTEL CONFIDENTIAL
  • 18. Next steps • STAC-T1 − Extract latency − Test higher-throughput scenarios (e.g., TVITCH) • STAC-A2 − Increase use of vectorized code − Test on Xeon® E5 series in 4-socket configuration − Test on Xeon® Phi™ (~50 Cores) • STAC-M3 − Test with Xeon® E5 series in 4-socket configuration − More Storage Alternatives INTEL CONFIDENTIAL
  • 19. Thank You! STAC Redline HP Corvil Intel MKL Team Intel Labs INTEL CONFIDENTIAL