Hardware-assisted x86 emulation on Loongson-3
       Takuya ASADA (syuu@openbsd.org)
What is Loongson?
•   “A Chinese Challenge to Intel”
•   Microprocessor development project at ICT (Institute of Computing Technology, Chinese Academy of Sciences)
•   Manufactured and sold by STMicroelectronics
•   MIPS compatible, but independently developed
History of Loongson
2002 Loongson 1: 200MHz, 180nm, MIPS32
2003 Loongson 2B: 250MHz, 180nm, MIPS64
2004 Loongson 2C: 450MHz, 180nm, MIPS64
2006 Loongson 2E: 1GHz, 90nm, MIPS64, 512KB L2, DDR, 5~7W
     Loongson 2C based small computer released (Lemote Longmeng)
2007 Loongson 2F: 1GHz, 90nm, MIPS64, 512KB L2, DDR2, PCI/PCI-X, 3~5W
     Loongson 2F based HPC revealed (KD-50-I, 330 cores, 1TFLOPS)
2008 Loongson 2E/2F based netbooks released (Jisus, Gdium, Lemote Yeeloong)
2009 ICT licensed the MIPS32/64 architecture from MIPS Technologies
2010 Loongson 3A: 4 cores, 1GHz, 65nm, 4MB L2, DDR2/3, HyperTransport 1.0, PCI/PCI-X, 10W
     Loongson 3A based HPC announced (KD-60-I, 4x80 cores, 1TFLOPS)
SPEC CPU2000 Rate
[Figure: Godson development; SPEC CPU2000 rate of successive Godson processors compared with Intel/AMD/HP/IBM/SGI/Sparc processors]
Yes, it runs OpenBSD!
Also other OSes

• Linux: Debian, RedFlag, Mandriva...
• NetBSD
• Windows CE
GS464(Loongson 3A)

• Scalable Architecture
• Reconfigurable CPU core and L2
• Hardware-assisted x86 emulation
• Low power consumption
Scalable Architecture
• Scalable interconnection network
  • Crossbar + mesh
  • 8x8 crossbar
  • A single crossbar connects the cores, the L2 banks, and four directions (east, south, west, north)
• Directory-based cache coherence protocol
  • Distributed L2 caches are globally addressed
  • Each cache block has a directory entry
  • Both the data cache and the instruction cache are recorded in the directory
• … core on 65nm (3B), 4 core on 32nm (3C)
[Figure: scalable architecture design; four cores (P0 to P3) and four L2 banks connected through an 8x8 crossbar with east/south/west/north (E/S/W/N) ports]
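Recording instruction-cache copies in the directory, not just data-cache copies, is what lets a store invalidate stale instructions held by other cores. The C sketch below models one directory entry for a four-core node. The field layout, the function names (`dir_read`, `dir_write`) and the simplified handling of a modified copy are assumptions for illustration, not the GS464 hardware format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCORES 4   /* P0..P3 in the figure */

/* Hypothetical directory entry, one per cache block. */
typedef struct {
    uint8_t dcache_sharers;  /* bit i set: core i's data cache has a copy */
    uint8_t icache_sharers;  /* bit i set: core i's instruction cache has a copy */
    int8_t  owner;           /* core holding a modified copy, or -1 */
} dir_entry;

/* Core c reads the block into its instruction cache (ifetch) or data cache. */
void dir_read(dir_entry *e, int c, bool ifetch)
{
    assert(c >= 0 && c < NCORES);
    if (e->owner >= 0 && e->owner != c)
        e->owner = -1;   /* modified copy written back and kept as shared (write-back not modeled) */
    if (ifetch)
        e->icache_sharers |= (uint8_t)(1u << c);
    else
        e->dcache_sharers |= (uint8_t)(1u << c);
}

/* Core c writes the block. Other cores' data-cache copies and every
   instruction-cache copy (including c's own) become stale. Returns the
   mask of cores that must receive an invalidation. */
uint8_t dir_write(dir_entry *e, int c)
{
    assert(c >= 0 && c < NCORES);
    uint8_t self = (uint8_t)(1u << c);
    uint8_t targets = (uint8_t)((e->dcache_sharers & ~self) | e->icache_sharers);
    e->dcache_sharers = self;
    e->icache_sharers = 0;
    e->owner = (int8_t)c;
    return targets;
}
```

In this model, a write by core 0 to a block that core 1 has fetched as instructions and core 2 holds as data returns the mask 0x6: both copies must be invalidated.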
Reconfigurable CPU
             core and L2
   Reconfigurable architecture



                             Special purpose
 General purpose             Core GStera
 Core GS464




DMA engine can be       8 configurable address
configured to achieve   windows of each master port
high performance        allow pages migration across
                        L2 and memory
Hardware-assisted x86 emulation
• With software-based binary translation, some x86
  instructions require tens of MIPS instructions
  because of the differences between the ISAs
• 200+ new instructions were added to reduce the
  number of instructions produced by binary translation
Figure 1. GS464 microarchitecture. GS464 adopts a nine-stage dynamic pipeline.
(BHT: branch history table; BRQ: bandwidth request; DTLB: data translation look-aside buffer;
ITLB: instruction translation look-aside buffer; RAS: return address stack; TAP: test access port)
Virtual machine architecture
Figure 2. The GS464 virtual machine’s software architecture. The x86 operating systems
and applications are built on a MIPS Linux system through a virtual machine monitor.
(Layers: Microsoft Windows on a system-level x86 virtual machine; Linux applications on x86
on a process-level x86 virtual machine; Linux applications on MIPS; all running on Linux on
MIPS on the enhanced MIPS core)
       • It’s just QEMU on Linux
       • QEMU of x86 modified to improve performance, using the new instructions
x86 EFlag support
• Most x86 fixed-point arithmetic
  instructions generate EFlags
• The directions of branch instructions are
  determined by the EFlags
• MIPS doesn’t have a flag register!
  The translator must check each result and set/
  clear bits of a virtual EFlag register at run time
• That’s very costly
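To see why this is costly, here is a C sketch of what translated code has to compute for a single 32-bit `SUB` when there is no flag hardware: each flag is derived separately from the operands and the result. The helper names (`x86_sub32`, `x86_je_taken`) and the `EFL_*` macros are illustrative; the bit positions are the architectural EFLAGS ones.

```c
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define EFL_CF (1u << 0)   /* carry (borrow for SUB) */
#define EFL_PF (1u << 2)   /* parity of the low result byte */
#define EFL_AF (1u << 4)   /* auxiliary carry/borrow at bit 3 */
#define EFL_ZF (1u << 6)   /* zero */
#define EFL_SF (1u << 7)   /* sign */
#define EFL_OF (1u << 11)  /* signed overflow */
#define EFL_ARITH (EFL_CF | EFL_PF | EFL_AF | EFL_ZF | EFL_SF | EFL_OF)

/* 1 if the low byte of x has an even number of set bits. */
static int parity_even(uint32_t x)
{
    x &= 0xFF;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return !(x & 1);
}

/* 32-bit x86 SUB: returns a - b and updates the six arithmetic flags
   in a virtual EFLAGS word, one test per flag. */
uint32_t x86_sub32(uint32_t a, uint32_t b, uint32_t *eflags)
{
    uint32_t r = a - b;
    uint32_t f = 0;

    if (a < b)                     f |= EFL_CF;  /* unsigned borrow */
    if (parity_even(r))            f |= EFL_PF;
    if ((a ^ b ^ r) & 0x10)        f |= EFL_AF;  /* borrow into bit 4 */
    if (r == 0)                    f |= EFL_ZF;
    if (r >> 31)                   f |= EFL_SF;
    if (((a ^ b) & (a ^ r)) >> 31) f |= EFL_OF;  /* signs differ and result sign flipped */

    *eflags = (*eflags & ~EFL_ARITH) | f;
    return r;
}

/* JE/JZ is taken when ZF is set. */
int x86_je_taken(uint32_t eflags)
{
    return (eflags & EFL_ZF) != 0;
}
```

A translated `SUB ECX, EDX; JE target` becomes this flag computation followed by a ZF test, which corresponds to the long sequence in (b) below; the X86SUB/X86JE pair in (d) folds it back into two instructions.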
x86 EFlag support:
        Solution

• Add new instructions to handle EFlag
 • Generate EFlag
 • Branch on EFlag
Number of
instructions                                Instruction                         Comment
0                    SUB       ECX                  EDX
1                    JE        X86_target

(a)

0.00                 SUBU      Result               Recx          Redx
0.01                 SRL       Rsf                  Result        31            /*SF=Result[31]*/
0.02                 BEQ       Result               R0            L1
0.03                 ADD       Rzf                  R0            R0            /*ZF=0*/
0.04                 B         L2
0.05                 NOP
0.06           L1:   ADDI      Rzf                  R0            1             /*ZF=1*/
.              .     .         .                    .             .             .
.              .     .         .                    .             .             .
.              .     .         .                    .             .             .
0.35                 B         L8
0.36                 NOP
0.37           L7:   ADDI      Rcf                  R0            1             /*CF=1*/
0.38           L8:   ADD       Recx                 Result        R0
1.00                 BNE       Rzf                  R0            MIPS_target
1.01                 NOP

(b)

0.0                  SUBU      Result               Recx          Redx          /*Generating Sub result*/
0.1                  SETFLAG
0.2                  SUBU      Reflag               Recx          Redx          /*Generating EFLAGS*/
1.0                  X86JE     Reflag               MIPS_target                 /*Branch on EFLAGS*/

(c)

0.0                  SUB       Result               Recx          Redx          /*Generating Sub result*/
0.1                  X86SUB    Reflag               Recx          Redx          /*Generating EFLAGS*/
1.0                  X86JE     Reflag               MIPS_target                 /*Branch on EFLAGS*/

(d)
x87 support
•   Register stack:
    •   Maintaining TOP pointer is costly
    •   Calculating absolute register number from
        relative register number is costly
    •   Emulating x87 tag to detect stack overflow/
        underflow is costly
•   80-bit floating point:
    MIPS only has 64-bit floating point!
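Without hardware help, the translator keeps TOP, the tag bits and the relative-to-absolute register mapping in software. The following C sketch shows that bookkeeping. The 64-bit `double` registers and the one-bit-per-register tag (the real x87 tag word uses two bits per register) are simplifications, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    double   reg[8];  /* physical registers R0..R7 (64-bit here, not 80-bit) */
    unsigned top;     /* TOP field of the status word, 0..7 */
    uint8_t  used;    /* simplified tag: bit p set means R(p) is not empty */
} x87_stack;

/* ST(i) is the physical register (TOP + i) mod 8. */
static unsigned st_phys(const x87_stack *s, unsigned i)
{
    assert(i < 8);
    return (s->top + i) & 7u;
}

/* FLD-style push: TOP is decremented first; if that register is still
   in use, the push is a stack overflow. */
bool x87_push(x87_stack *s, double v)
{
    unsigned t = (s->top - 1u) & 7u;
    if (s->used & (1u << t))
        return false;                       /* stack overflow */
    s->top = t;
    s->reg[t] = v;
    s->used |= (uint8_t)(1u << t);
    return true;
}

/* Read ST(i); reading an empty register is a stack underflow. */
bool x87_read(const x87_stack *s, unsigned i, double *out)
{
    unsigned p = st_phys(s, i);
    if (!(s->used & (1u << p)))
        return false;                       /* stack underflow */
    *out = s->reg[p];
    return true;
}

/* FSTP-style pop: read ST(0), mark it empty, increment TOP. */
bool x87_pop(x87_stack *s, double *out)
{
    if (!x87_read(s, 0, out))
        return false;
    s->used &= (uint8_t)~(1u << s->top);
    s->top = (s->top + 1u) & 7u;
    return true;
}
```

Every x87 operand access goes through `st_phys` and a tag check like these, which is the per-instruction overhead the slide calls costly.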
x87 support:
                Solution
•   TOP is calculated in the decode stage, using register
    renaming
    A new flag in the FP control register points to TOP
    => Saves 10+ instructions per x87 instruction
•   A new instruction simulates the x87 tag, and a new exception
    detects stack overflow/underflow
•   New instructions for 80-bit floating point:
    •   80-bit fp number in two 64-bit regs => 64-bit fp number
        in one 64-bit reg
    •   64-bit fp number in one 64-bit reg =>
        80-bit fp number in two 64-bit regs
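The 80-bit to 64-bit conversion that CVT.d.ld performs in one instruction (and that the software sequence in (b) below has to spell out) can be sketched in C. The value is passed as its 16-bit sign/exponent word and its 64-bit significand with the explicit integer bit. Truncating instead of honouring the current rounding mode, and flushing denormals and unnormals to zero, are simplifications of this sketch; `f80_to_double` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

double f80_to_double(uint16_t sign_exp, uint64_t significand)
{
    uint64_t sign = (uint64_t)(sign_exp >> 15) << 63;
    int exp80 = sign_exp & 0x7FFF;               /* biased by 16383 */
    uint64_t bits;

    if (exp80 == 0x7FFF) {                       /* infinity or NaN */
        uint64_t frac = significand & 0x7FFFFFFFFFFFFFFFULL;
        bits = sign | 0x7FF0000000000000ULL | (frac >> 11);
        if (frac != 0 && (frac >> 11) == 0)
            bits |= 1;                           /* keep a NaN from turning into infinity */
    } else if ((significand >> 63) == 0) {       /* zero, denormal or unnormal */
        bits = sign;                             /* flushed to signed zero */
    } else {
        int exp64 = exp80 - 16383 + 1023;        /* rebias for double */
        if (exp64 >= 0x7FF)
            bits = sign | 0x7FF0000000000000ULL; /* too large: infinity */
        else if (exp64 <= 0)
            bits = sign;                         /* too small: flushed to zero */
        else                                     /* drop the integer bit, keep the top 52 fraction bits */
            bits = sign | ((uint64_t)exp64 << 52) | ((significand << 1) >> 12);
    }

    double d;
    memcpy(&d, &bits, sizeof d);                 /* reinterpret the bit pattern */
    return d;
}
```

For example, sign/exponent 0x3FFF with significand 0x8000000000000000 is 1.0, and 0xC000 with 0xA000000000000000 is -2.5.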
Number of
instructions                   Instruction                 Comment
0              FLD        *%R10
1              FMUL       *16(%R10)
2              FSTP       *%R10

(a)

0.00           LD         Rtmp1              12(R8)        /*convert 1st operand*/
0.01           LD         Rtmp2              4(R8)
0.02           ANDI       Rsign              Rtmp1         /*get sign bit and sign bit of exp*/
0.03           DSLL32     Rsign              Rsign    16   /*get biased exponent*/
.              .          .                  .        .    .
.              .          .                  .        .    .
.              .          .                  .        .    .
0.23           DMTC1      F8                 Rfp2
1.00           MUL.d      F9                 F7       F8   /*64-bit multiply*/
2.00           DMFC1      Rres               F9
2.01           DSRL32     Rsign              Rres     31   /*get sign bit*/
.              .          .                  .        .    .
.              .          .                  .        .    .
.              .          .                  .        .    .
2.12           SD         Rres1              12(R8)        /*write back result*/
2.13           SD         Rres2              4(R8)

(b)

0.0            GSLQC1     F4                 4(R8)         /*128-bit load to F4 and F5*/
0.1            CVT.d.ld   F7                 F4       F5   /*80-bit to 64-bit convert*/
0.2            GSLQC1     F2                 20(R8)        /*128-bit load to F2 and F3*/
0.3            CVT.d.ld   F8                 F2       F3   /*80-bit to 64-bit convert*/
1.0            MUL.d      F9                 F7       F8   /*64-bit multiplication*/
2.0            CVT.ud.d   F7                 F9            /*64-bit to high part of 80-bit*/
2.1            CVT.ld.d   F8                 F9            /*64-bit to low part of 80-bit*/
2.2            GSSQC1     F7                 4(R8)         /*128-bit store*/
Multimedia instructions

• x86 has MMX, SSE, SSE2...
• MIPS has an extension instruction set called
  MDMX, but it is very different from x86
  multimedia instructions
• Added an original SIMD instruction set
  similar to SSE2
New addressing mode
• MIPS only supports
  “(base) + disp” for fixed/float,
  “(base) + (index)” for float
• x86 has more flexible addressing modes
  e.g. “(base) + (index) x scale + disp”
• A “(base) + (index) + disp8” addressing mode
  was added to translate it
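As a sketch of how a translator might use the new mode: the scale still needs a shift, but base, scaled index and a small displacement combine in the memory access itself. The C model below checks that this split yields the same address as x86. The function names and the instruction counts it reports are a hypothetical cost model, not figures from the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 effective address "(base) + (index) x scale + disp", 32-bit wraparound. */
uint32_t x86_ea(uint32_t base, uint32_t index, unsigned scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

static bool fits_disp8(int32_t disp) { return disp >= -128 && disp <= 127; }

/* Hypothetical translation: a shift applies the scale, then the access
   uses "(base) + (index) + disp8". A displacement that does not fit in
   8 bits is first added to the base. *extra counts the address-forming
   instructions issued before the memory access. */
uint32_t translated_ea(uint32_t base, uint32_t index, unsigned scale,
                       int32_t disp, int *extra)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    unsigned shift = (scale == 8) ? 3 : (scale == 4) ? 2 : (scale == 2) ? 1 : 0;
    int n = 0;
    if (shift != 0) {                    /* shift by log2(scale) */
        index <<= shift;
        n++;
    }
    if (!fits_disp8(disp)) {             /* extra add for a large displacement */
        base += (uint32_t)disp;
        disp = 0;
        n++;
    }
    *extra = n;
    return base + index + (uint32_t)disp; /* what the (base)+(index)+disp8 access computes */
}
```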
Bounded load and
         store
• x86 has segmented addressing
• Bounded load/store instructions were added to
  handle this
  They read a bound register as the
  memory-access boundary
• They raise an address exception if the memory
  access exceeds the boundary
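A minimal C model of the check, assuming an upper-bound comparison similar to an x86 segment limit. The slide does not say exactly which comparison the hardware performs, and the hypothetical `bounded_load32` returns `false` where the real instruction would raise an address exception.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Succeeds only if all four bytes addr..addr+3 lie at or below `bound`.
   `mem` must hold at least bound + 1 bytes. */
bool bounded_load32(const uint8_t *mem, uint32_t addr, uint32_t bound,
                    uint32_t *out)
{
    if (addr > bound || bound - addr < 3)   /* overflow-safe form of addr + 3 <= bound */
        return false;                       /* hardware: address exception */
    memcpy(out, mem + addr, sizeof *out);   /* host byte order */
    return true;
}
```

Doing this comparison in hardware saves the explicit compare-and-branch a translator would otherwise emit before every segment-relative access.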
Fixed-point multiplication
       and division
• MIPS fixed-point multiplication/division
  instructions use the special Hi/Lo registers as
  destination
  Additional instructions are needed to move data
  from the Hi/Lo registers to general-purpose
  registers
• Added fixed-point multiplication/division
  instructions that use general-purpose
  registers as destination
Byte insertion and
        extraction
• x86 supports 8, 16, 32, and 64-bit operations
• MIPS only supports 32 and 64-bit operations
• Added flexible byte insertion instructions
  that can insert 8, 16, or 32 bits from any
  location of one register into any location of
  another register
  Also added flexible byte extraction
  instructions
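The semantics can be modelled with two C helpers. The names and argument order are hypothetical, not the Loongson instruction encodings. Writing `BL` into `AH` (bits 8 to 15 of `RAX`) is the kind of partial-register update these instructions make cheap.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 64). */
static uint64_t low_mask(unsigned width)
{
    return width >= 64 ? ~0ULL : (1ULL << width) - 1;
}

/* Extract `width` bits of `src` starting at bit `pos`. */
uint64_t extract_bits(uint64_t src, unsigned pos, unsigned width)
{
    assert(width >= 1 && pos + width <= 64);
    return (src >> pos) & low_mask(width);
}

/* Insert `width` bits, taken from `src` at bit `src_pos`, into `dst`
   at bit `dst_pos`; the other bits of `dst` are unchanged. */
uint64_t insert_bits(uint64_t dst, uint64_t src, unsigned src_pos,
                     unsigned dst_pos, unsigned width)
{
    assert(width >= 1 && src_pos + width <= 64 && dst_pos + width <= 64);
    uint64_t m = low_mask(width);
    uint64_t field = (src >> src_pos) & m;
    return (dst & ~(m << dst_pos)) | (field << dst_pos);
}
```

`insert_bits(rax, rbx, 0, 8, 8)` models `mov ah, bl`, and `insert_bits(rax, rbx, 8, 0, 8)` models `mov al, bh`; with only 32/64-bit operations each would otherwise need a mask, a shift and an OR.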
CAM
• Translating indirect branches is costly,
  because the translator must look up the branch
  target dynamically
• It requires a
  <x86 branch target : MIPS branch target>
  hash table to keep the mapping information
• A 64-entry CAM was added to speed this up
• CAM entry format: PID, Address, Data
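A software model of the CAM lookup, assuming the Address field holds the x86 branch target and Data the address of the translated MIPS code. Hardware compares all 64 entries in parallel, whereas this C loop is sequential; the round-robin replacement and the function names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAM_ENTRIES 64

typedef struct {
    bool     valid;
    uint32_t pid;    /* process ID, so different guests' entries do not alias */
    uint64_t addr;   /* x86 branch target (the search key) */
    uint64_t data;   /* translated MIPS code address */
} cam_entry;

typedef struct {
    cam_entry e[CAM_ENTRIES];
    unsigned  next;  /* round-robin victim pointer (assumed policy) */
} cam;

bool cam_lookup(const cam *c, uint32_t pid, uint64_t x86_target,
                uint64_t *mips_target)
{
    for (unsigned i = 0; i < CAM_ENTRIES; i++) {
        const cam_entry *x = &c->e[i];
        if (x->valid && x->pid == pid && x->addr == x86_target) {
            *mips_target = x->data;
            return true;
        }
    }
    return false;    /* miss: fall back to the translator's hash table */
}

void cam_insert(cam *c, uint32_t pid, uint64_t x86_target, uint64_t mips_target)
{
    for (unsigned i = 0; i < CAM_ENTRIES; i++) {
        cam_entry *x = &c->e[i];
        if (x->valid && x->pid == pid && x->addr == x86_target) {
            x->data = mips_target;   /* update an existing mapping */
            return;
        }
    }
    cam_entry *v = &c->e[c->next];   /* evict the next victim */
    c->next = (c->next + 1) % CAM_ENTRIES;
    v->valid = true;
    v->pid = pid;
    v->addr = x86_target;
    v->data = mips_target;
}
```

On a miss the translator would consult its hash table and could then insert the mapping, so the next execution of the same indirect branch hits in the CAM.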
Number of
instructions   Instruction                Comment
0              MOV        %RAX   %R11
1              JMPQ       *%R11

(a)

0              MOVE       Rr11   Rrax
1.0            CAMPV      Rtmp   Rr11     /* Look up the first level indirect jump address */
1.1            CAMPV      Rtgt   Rtmp     /* Look up the final jump address */
1.2            JR         Rtgt

(b)

Figure 5. Example of indirect branch target translation: The original x86 program (a), and the
program translated with Godson-3 content-associated memory (CAM) instructions (b). The
boldface text indicates new instructions for x86 emulation.
Context Switch
           Optimization
•   The binary translator writes translated code through the data cache,
    so executing it requires flushing it from the data cache and
    loading it into the instruction cache
    •   Hardware keeps the data and instruction caches, as well as L2,
        coherent
•   The binary translator switches context between the translator
    and the translated code, which requires saving/restoring the target
    machine's registers, emulated in general-purpose registers
•   To reduce the cost, 128-bit load and store instructions were
    added
•   These save/restore up to four x86 registers at a time
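The saving is easy to count. In this C sketch (hypothetical names; the eight 32-bit x86 general-purpose registers are assumed to be kept contiguously in a save area), spilling the guest registers takes eight 32-bit stores but only two 128-bit stores.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t gpr[8]; } x86_regs;   /* EAX..EDI */

/* One 32-bit store per register; returns the number of stores. */
int save_regs_32(uint32_t *area, const x86_regs *r)
{
    int stores = 0;
    for (int i = 0; i < 8; i++, stores++)
        area[i] = r->gpr[i];
    return stores;
}

/* One 128-bit store moves four 32-bit registers at once. */
int save_regs_128(uint32_t *area, const x86_regs *r)
{
    int stores = 0;
    for (int i = 0; i < 8; i += 4, stores++)
        memcpy(&area[i], &r->gpr[i], 16);
    return stores;
}
```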
Benchmark results
Benchmarks: EEMBC (x86 assembly, on FPGA, x86 SIMD), Microbench (C and x86 assembly, on Xtreme-3/FPGA), SPEC 2000 (C, on FPGA)
[Figure: performance (percent, 0 to 100) of x86 binaries under binary translation, "No hardware support" vs. "Hardware support", per benchmark]
SPEC CPU2000 ratios: Godson (2E, 2F, 3A) vs. Pentium (PIII, PIV)
Benchmark     2E-750MHz  2F-800MHz  3A-800MHz  PIII-800MHz  PIV-1.4GHz
164.gzip      209        251        324        344          397
175.vpr       237        239        391        261          246
176.gcc       282        329        369        241          350
181.mcf       271        232        421        229          255
186.crafty    356        362        415        352          386
197.parser    202        152        225        231          331
252.eon       289        441        526        90.7         125
253.perlbmk   235        321        330        397          547
254.gap       238        243        229        260          441
255.vortex    236        274        297        383          478
256.bzip2     247        241        268        249          314
300.twolf     313        331        486        269          287
SPECint2000   256        275        345        260          326
168.wupwise   307        308        325        248          474
171.swim      247        273        336        218          244
172.mgrid     156        155        184        99.2         320
173.applu     188        268        200        154          333
177.mesa      373        438        400        265          265
178.galgel    -          345        583        -            -
179.art       349        693        1254       115          109
183.equake    250        303        278        190          493
187.facerec   -          111        177        -            -
188.ammp      277        283        364        174          200
189.lucas     -          284        251        -            -
191.fma3d     -          108        128        -            -
200.sixtrack  131        217        184        137          224
301.apsi      172        197        225        190          199
SPECfp2000    232        254        289        171          263
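SPECint2000 is the geometric mean of the per-benchmark SPEC ratios, so the summary row can be cross-checked against the table. The C sketch below recomputes it for the 3A-800MHz column; it finds the n-th root by bisection instead of calling `pow`/`log` only to avoid depending on libm, and the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* x raised to a non-negative integer power. */
static double ipow(double x, size_t n)
{
    double r = 1.0;
    for (size_t i = 0; i < n; i++)
        r *= x;
    return r;
}

/* Geometric mean of n >= 1 positive ratios: the x with x^n equal to the
   product, found by bisection between the smallest and largest ratio.
   Fine for a dozen SPEC ratios; for large n the product would overflow
   and summing logarithms is the usual approach. */
double geo_mean(const double *v, size_t n)
{
    assert(n > 0);
    double prod = 1.0, lo = v[0], hi = v[0];
    for (size_t i = 0; i < n; i++) {
        prod *= v[i];
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    for (int it = 0; it < 200; it++) {
        double mid = 0.5 * (lo + hi);
        if (ipow(mid, n) < prod)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* 3A-800MHz ratios of the twelve SPECint2000 programs in the table. */
const double godson3a_int[12] = { 324, 391, 369, 421, 415, 225,
                                  526, 330, 229, 297, 268, 486 };
```

`geo_mean(godson3a_int, 12)` comes out at about 345, consistent with the SPECint2000 row for the 3A.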
Conclusion
•   GS464 added 200+ instructions and a number of
    optimizations for x86 emulation
•   As a result, binary translation runs 2x ~ 3x
    faster than the original QEMU
•   That's nearly 70% of the performance of a native MIPS binary
•   CPU performance itself is poor, though
•   The paper doesn't give enough information to judge the
    actual performance of the emulation on the real chip...
•   Anyway, Loongson-3 looks like a good try, and interesting!
Papers & Slides
• “GODSON-3: A SCALABLE MULTICORE
  RISC PROCESSOR WITH X86
  EMULATION”
• “Micro-architecture of Godson-3 Multi-Core
  Processor”
• “Efficient Binary Translation System with Low
  Hardware Cost”
• “Godson-3 Multicore RISC Processor”

Más contenido relacionado

La actualidad más candente

Linkmeup v23-compass-eos
Linkmeup v23-compass-eosLinkmeup v23-compass-eos
Linkmeup v23-compass-eoseucariot
 
Lee 2020 what the clock !
Lee 2020  what the clock !Lee 2020  what the clock !
Lee 2020 what the clock !Neil Armstrong
 
IP tables
IP tablesIP tables
IP tablesaamodt
 
NST Product Catalog
NST Product CatalogNST Product Catalog
NST Product Catalogmoonhyo
 
BKK16-303 96Boards - TV Platform
BKK16-303 96Boards - TV PlatformBKK16-303 96Boards - TV Platform
BKK16-303 96Boards - TV PlatformLinaro
 
Technical Proposal - Structured Cabling
Technical Proposal - Structured CablingTechnical Proposal - Structured Cabling
Technical Proposal - Structured Cablingwolfthrone
 
How to convert your Linux box into Security Gateway - Part 1
How to convert your Linux box into Security Gateway - Part 1How to convert your Linux box into Security Gateway - Part 1
How to convert your Linux box into Security Gateway - Part 1n|u - The Open Security Community
 

La actualidad más candente (9)

Linkmeup v23-compass-eos
Linkmeup v23-compass-eosLinkmeup v23-compass-eos
Linkmeup v23-compass-eos
 
Lee 2020 what the clock !
Lee 2020  what the clock !Lee 2020  what the clock !
Lee 2020 what the clock !
 
IP tables
IP tablesIP tables
IP tables
 
Multicast for ipv6
Multicast for ipv6Multicast for ipv6
Multicast for ipv6
 
Gpu archi
Gpu archiGpu archi
Gpu archi
 
NST Product Catalog
NST Product CatalogNST Product Catalog
NST Product Catalog
 
BKK16-303 96Boards - TV Platform
BKK16-303 96Boards - TV PlatformBKK16-303 96Boards - TV Platform
BKK16-303 96Boards - TV Platform
 
Technical Proposal - Structured Cabling
Technical Proposal - Structured CablingTechnical Proposal - Structured Cabling
Technical Proposal - Structured Cabling
 
How to convert your Linux box into Security Gateway - Part 1
How to convert your Linux box into Security Gateway - Part 1How to convert your Linux box into Security Gateway - Part 1
How to convert your Linux box into Security Gateway - Part 1
 

Destacado

Intel x86 and ARM Data types
Intel x86 and ARM Data typesIntel x86 and ARM Data types
Intel x86 and ARM Data typesRowena Cornejo
 
Intel & ARM: Strategic Comparison
Intel & ARM: Strategic ComparisonIntel & ARM: Strategic Comparison
Intel & ARM: Strategic ComparisonToby Allen
 
Made to measure: Trends in personalised digital learning
Made to measure: Trends in personalised digital learning Made to measure: Trends in personalised digital learning
Made to measure: Trends in personalised digital learning Brightwave Group
 
The end of injection education
The end of injection educationThe end of injection education
The end of injection educationBrightwave Group
 
Fprm arthritis factsheet
Fprm   arthritis factsheetFprm   arthritis factsheet
Fprm arthritis factsheetAlan Bassett
 
OSvパンフレット v3
OSvパンフレット v3OSvパンフレット v3
OSvパンフレット v3Takuya ASADA
 
Digital Audio - Technical Writing Class Paper - OCR Reformat
Digital Audio - Technical Writing Class Paper - OCR ReformatDigital Audio - Technical Writing Class Paper - OCR Reformat
Digital Audio - Technical Writing Class Paper - OCR ReformatPaul Teich
 
Total learning: informal learning driving new learning culture at Tesco Bank
Total learning: informal learning driving new learning culture at Tesco BankTotal learning: informal learning driving new learning culture at Tesco Bank
Total learning: informal learning driving new learning culture at Tesco BankBrightwave Group
 
DB_Algorithm_and_Data_Structure_About_BTree
DB_Algorithm_and_Data_Structure_About_BTreeDB_Algorithm_and_Data_Structure_About_BTree
DB_Algorithm_and_Data_Structure_About_BTreeLixun Peng
 
Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...Brightwave Group
 
agencija registracija vozila agencije tehnicki pregled
agencija registracija vozila agencije tehnicki pregledagencija registracija vozila agencije tehnicki pregled
agencija registracija vozila agencije tehnicki pregledregistracija vozila
 
Kp event presentation_steph
Kp event presentation_stephKp event presentation_steph
Kp event presentation_stephBrightwave Group
 

Destacado (20)

X86 operation types
X86 operation typesX86 operation types
X86 operation types
 
Intel x86 and ARM Data types
Intel x86 and ARM Data typesIntel x86 and ARM Data types
Intel x86 and ARM Data types
 
Intel & ARM: Strategic Comparison
Intel & ARM: Strategic ComparisonIntel & ARM: Strategic Comparison
Intel & ARM: Strategic Comparison
 
Made to measure: Trends in personalised digital learning
Made to measure: Trends in personalised digital learning Made to measure: Trends in personalised digital learning
Made to measure: Trends in personalised digital learning
 
Janison Online Classrooms
Janison Online ClassroomsJanison Online Classrooms
Janison Online Classrooms
 
The end of injection education
The end of injection educationThe end of injection education
The end of injection education
 
Open sourcelibrary
Open sourcelibraryOpen sourcelibrary
Open sourcelibrary
 
Fprm arthritis factsheet
Fprm   arthritis factsheetFprm   arthritis factsheet
Fprm arthritis factsheet
 
OSvパンフレット v3
OSvパンフレット v3OSvパンフレット v3
OSvパンフレット v3
 
Digital Audio - Technical Writing Class Paper - OCR Reformat
Digital Audio - Technical Writing Class Paper - OCR ReformatDigital Audio - Technical Writing Class Paper - OCR Reformat
Digital Audio - Technical Writing Class Paper - OCR Reformat
 
Introducción a Youtube
Introducción a YoutubeIntroducción a Youtube
Introducción a Youtube
 
DUID TRANSFORMATION
DUID TRANSFORMATIONDUID TRANSFORMATION
DUID TRANSFORMATION
 
Focus on Simplicity
Focus on Simplicity Focus on Simplicity
Focus on Simplicity
 
Total learning: informal learning driving new learning culture at Tesco Bank
Total learning: informal learning driving new learning culture at Tesco BankTotal learning: informal learning driving new learning culture at Tesco Bank
Total learning: informal learning driving new learning culture at Tesco Bank
 
DB_Algorithm_and_Data_Structure_About_BTree
DB_Algorithm_and_Data_Structure_About_BTreeDB_Algorithm_and_Data_Structure_About_BTree
DB_Algorithm_and_Data_Structure_About_BTree
 
Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...Sharpest tool in the box: Choosing the right authoring tool for your learning...
Sharpest tool in the box: Choosing the right authoring tool for your learning...
 
agencija registracija vozila agencije tehnicki pregled
agencija registracija vozila agencije tehnicki pregledagencija registracija vozila agencije tehnicki pregled
agencija registracija vozila agencije tehnicki pregled
 
Kelly Ruggles
Kelly RugglesKelly Ruggles
Kelly Ruggles
 
Lsg4 dontaylor
Lsg4 dontaylorLsg4 dontaylor
Lsg4 dontaylor
 
Kp event presentation_steph
Kp event presentation_stephKp event presentation_steph
Kp event presentation_steph
 

Similar a Hardware assited x86 emulation on godson 3

Basics Of Embedded Systems
Basics Of Embedded SystemsBasics Of Embedded Systems
Basics Of Embedded Systemsarlabstech
 
Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet FiltersKernel TLV
 
Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)Learning Erlang (from a Prolog dropout's perspective)
Learning Erlang (from a Prolog dropout's perspective)elliando dias
 
My seminar new 28
My seminar new 28My seminar new 28
My seminar new 28rajeshkvdn
 
PIC introduction + mapping
PIC introduction + mappingPIC introduction + mapping
PIC introduction + mappingOsaMa Hasan
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfssuser30e7d2
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdfssuser3fb50b
 
Introduction2_PIC.ppt
Introduction2_PIC.pptIntroduction2_PIC.ppt
Introduction2_PIC.pptAakashRawat35
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOSICS
 
LPC 2148 Instructions Set.ppt
LPC 2148 Instructions Set.pptLPC 2148 Instructions Set.ppt
LPC 2148 Instructions Set.pptProfBadariNathK
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorDeepak Tomar
 
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linaro
 
5. Features of the LPC214X Family.pptx
5. Features of the LPC214X Family.pptx5. Features of the LPC214X Family.pptx
5. Features of the LPC214X Family.pptxSivakumarG52
 
07 processor basics
07 processor basics07 processor basics
07 processor basicsMurali M
 
Kernel Recipes 2014 - x86 instruction encoding and the nasty hacks we do in t...

Hardware-assisted x86 emulation on Godson-3

  • 1. Hardware-assisted x86 emulation on Loongson-3 syuu@openbsd.org
  • 2. What is Loongson? • “A Chinese Challenge to Intel” • Microprocessor development project at ICT (Institute of Computing Technology, Chinese Academy of Sciences) • Manufactured and sold by STMicroelectronics • MIPS-compatible, but independently developed
  • 3. History of Loongson
      2002 Loongson 1: 200MHz, 180nm, MIPS32
      2003 Loongson 2B: 250MHz, 180nm, MIPS64
      2004 Loongson 2C: 450MHz, 180nm, MIPS64
      2006 Loongson 2E: 1GHz, 90nm, MIPS64, 512KB L2, DDR, 5~7W
           Loongson 2C-based small computer released (Lemote Longmeng)
      2007 Loongson 2F: 1GHz, 90nm, MIPS64, 512KB L2, DDR2, PCI/PCI-X, 3~5W
           Loongson 2F-based HPC revealed (KD-50-I, 330 cores, 1TFLOPS)
      2008 Loongson 2E/2F-based netbooks released (Jisus, Gdium, Lemote Yeeloong)
      2009 ICT licensed the MIPS32/64 architecture from MIPS Technologies
      2010 Loongson 3A: 4 cores, 1GHz, 65nm, 4MB L2, DDR2/3, HyperTransport 1.0, PCI/PCI-X, 10W
           Loongson 3A-based HPC announced (KD-60-I, 4x80 cores, 1TFLOPS)
  • 4. SPEC CPU2000 Rate: Godson Development [Chart: SPEC CPU2000 rate over time, Godson vs. Intel/AMD/HP/IBM/SGI/Sparc processors]
  • 5. Yes, it runs OpenBSD!
  • 6. Also other OSes • Linux: Debian, RedFlag, Mandriva... • NetBSD • Windows CE
  • 7. GS464(Loongson 3A) • Scalable Architecture • Reconfigurable CPU core and L2 • Hardware-assisted x86 emulation • Low power consumption
  • 8. Scalable Architecture
      • Scalable interconnection network: crossbar + mesh
        • 8x8 crossbar
        • A single crossbar connects the cores, the L2 banks, and four directions (E/W/S/N)
      • Directory-based cache coherence protocol
        • Distributed L2 caches are globally addressed
        • Each cache block has a directory entry
        • Both data and instruction caches are recorded in the directory
      • Multi-core variants: 3B on 65nm, 3C on 32nm
      [Diagram: cores P0-P3 and four L2 banks attached to an 8x8 crossbar with E/W/S/N links]
  • 9. Reconfigurable CPU core and L2 • Reconfigurable architecture combining general-purpose cores (GS464) and special-purpose cores (GStera) • The DMA engine can be configured to achieve high performance • 8 configurable address windows on each master port allow pages to migrate between L2 and memory
  • 10. Hardware-assisted x86 emulation • With software-based binary translation, some x86 instructions need tens of MIPS instructions because of the differences between the two ISAs • GS464 adds 200+ new instructions to reduce the number of instructions emitted by binary translation
  • 11. [Figure 1. GS464 microarchitecture: GS464 adopts a nine-stage dynamic pipeline. Legend includes BHT: branch history table; ITLB/DTLB: instruction/data translation look-aside buffer; RAS: return address stack; TAP: test access port.]
      [Figure 2. The GS464 virtual machine's software architecture: x86 operating systems (e.g. Microsoft Windows) and applications are built on a MIPS Linux system through a virtual machine monitor (system-level or process-level x86 virtual machine), running on an enhanced MIPS core alongside native Linux applications on MIPS.]
      • It’s just QEMU on Linux
      • QEMU modified to improve performance, using the new instructions
  • 12. x86 EFlag support • Most x86 fixed-point arithmetic instructions update EFLAGS • The direction of a conditional branch is determined by EFLAGS • MIPS has no flags register! So the translator must check every result and set/clear bits in a virtual EFLAGS register at runtime • That’s very costly
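To make the cost concrete, here is a minimal Python sketch of what a translator without flag hardware has to compute after every 32-bit `SUB dst, src`, and how a following `JE` consumes it. It illustrates x86 semantics only and is not QEMU's or Godson's code: just CF, ZF, SF and OF are modelled (PF and AF are left out), and the function names are hypothetical. Each flag costs its own compare, shift or mask, which is roughly where the long sequence in slide 14 comes from.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF

# Bit positions of the modelled flags inside x86 EFLAGS.
CF_BIT, ZF_BIT, SF_BIT, OF_BIT = 0, 6, 7, 11


def sub32_with_flags(dst: int, src: int) -> Tuple[int, int]:
    """Emulate x86 `SUB dst, src` on 32-bit values in software.

    Returns (result, eflags); only CF, ZF, SF and OF are computed
    (PF and AF are left out to keep the sketch short).
    """
    dst &= MASK32
    src &= MASK32
    result = (dst - src) & MASK32

    cf = int(dst < src)                 # unsigned borrow
    zf = int(result == 0)               # result is zero
    sf = (result >> 31) & 1             # sign bit of the result
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the sign of dst.
    of = (((dst ^ src) & (dst ^ result)) >> 31) & 1

    eflags = (cf << CF_BIT) | (zf << ZF_BIT) | (sf << SF_BIT) | (of << OF_BIT)
    return result, eflags


def je_taken(eflags: int) -> bool:
    """x86 JE/JZ branches when ZF is set."""
    return bool((eflags >> ZF_BIT) & 1)
```

For example, `sub32_with_flags(0, 1)` returns `(0xFFFFFFFF, 0x81)`: the subtraction borrows (CF) and the result is negative (SF).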
  • 13. x86 EFlag support: Solution • Add new instructions to handle EFlag • Generate EFlag • Branch on EFlag
  • 14. Number of instructions
      Number  Instruction                   Comment
      (a) x86 code
      0       SUB    ECX, EDX
      1       JE     X86_target
      (b) Translation with plain MIPS instructions
      0.00    SUBU   Result, Recx, Redx
      0.01    SRL    Rsf, Result, 31        /* SF = Result[31] */
      0.02    BEQ    Result, R0, L1
      0.03    ADD    Rzf, R0, R0            /* ZF = 0 */
      0.04    B      L2
      0.05    NOP
      0.06    L1: ADDI Rzf, R0, 1           /* ZF = 1 */
      ...
      0.35    B      L8
      0.36    NOP
      0.37    L7: ADDI Rcf, R0, 1           /* CF = 1 */
      0.38    L8: ADD Recx, Result, R0
      1.00    BNE    Rzf, R0, MIPS_target
      1.01    NOP
      (c) Translation using SETFLAG
      0.0     SUBU   Result, Recx, Redx     /* Generating SUB result */
      0.1     SETFLAG
      0.2     SUBU   Reflag, Recx, Redx     /* Generating EFLAGS */
      1.0     X86JE  Reflag, MIPS_target    /* Branch on EFLAGS */
      (d) Translation using X86SUB
      0.0     SUB    Result, Recx, Redx     /* Generating SUB result */
      0.1     X86SUB Reflag, Recx, Redx     /* Generating EFLAGS */
      1.0     X86JE  Reflag, MIPS_target    /* Branch on EFLAGS */
  • 15. x87 support • Register stack: • Maintaining the TOP pointer is costly • Calculating the absolute register number from the relative register number is costly • Emulating the x87 tag to detect stack overflow/underflow is costly • 80-bit floating point: MIPS only has 64-bit floating point!
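Every x87 instruction names registers relative to the stack top: ST(i) is physical register (TOP + i) mod 8, a push decrements TOP and a pop increments it, and the tag word marks empty slots so that overflow and underflow can be detected. A minimal Python model of the bookkeeping a pure-software translator repeats around each x87 instruction (class names are hypothetical, and the two-bit x87 tags are simplified to empty/in use):

```python
class X87StackFault(Exception):
    """Stack overflow/underflow (stands in for the x87 stack-fault exception)."""


class X87Stack:
    """Eight physical registers addressed relative to a 3-bit TOP field."""

    NUM_REGS = 8

    def __init__(self) -> None:
        self.regs = [0.0] * self.NUM_REGS
        self.empty = [True] * self.NUM_REGS   # simplified tag: empty / in use
        self.top = 0

    def phys(self, i: int) -> int:
        """Translate relative ST(i) into an absolute register number."""
        return (self.top + i) % self.NUM_REGS

    def st(self, i: int) -> float:
        """Read ST(i), faulting if that slot is empty."""
        p = self.phys(i)
        if self.empty[p]:
            raise X87StackFault("stack underflow")
        return self.regs[p]

    def push(self, value: float) -> None:
        new_top = (self.top - 1) % self.NUM_REGS   # a push decrements TOP
        if not self.empty[new_top]:
            raise X87StackFault("stack overflow")
        self.top = new_top
        self.regs[new_top] = value
        self.empty[new_top] = False

    def pop(self) -> float:
        value = self.st(0)                          # faults if ST(0) is empty
        self.empty[self.top] = True
        self.top = (self.top + 1) % self.NUM_REGS   # a pop increments TOP
        return value
```

In a translator, each of these index computations and tag checks turns into extra MIPS instructions around every floating-point operation, which is the overhead the next slide moves into hardware.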
  • 16. x87 support: Solution • TOP is calculated in the decode stage using register renaming, with a new flag in the FP control register pointing to TOP => saves 10+ instructions per x87 instruction • New instruction to simulate the x87 tag, and a new exception to detect stack overflow/underflow • New instructions for 80-bit floating point: • 80-bit FP number in two 64-bit registers => 64-bit FP number in one 64-bit register • 64-bit FP number in one 64-bit register => 80-bit FP number in two 64-bit registers
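Slide 17 shows the same x87 multiply with and without the new conversions. As a sketch of what `CVT.d.ld` and the `CVT.ud.d`/`CVT.ld.d` pair have to compute, here are the two format conversions in Python. Assumptions: normal, finite, non-zero values only (no denormals, infinities or NaNs); the standard little-endian x87 memory layout (a 64-bit significand with an explicit integer bit, then a 16-bit sign-and-exponent word with bias 16383); and Python's own rounding when the 64-bit significand is narrowed to 53 bits. The slides don't say how the hardware rounds.

```python
import math
import struct

EXT_BIAS = 16383   # x87 80-bit exponent bias
DBL_BIAS = 1023    # IEEE 754 double exponent bias


def ext80_to_double(raw: bytes) -> float:
    """Convert 10 little-endian x87 extended-precision bytes to a double.

    Assumes a normal, finite input (explicit integer bit set).
    """
    mantissa, sign_exp = struct.unpack("<QH", raw)
    sign = -1.0 if sign_exp >> 15 else 1.0
    exponent = (sign_exp & 0x7FFF) - EXT_BIAS
    # value = mantissa * 2**(exponent - 63); the int is rounded to 53 bits
    return sign * math.ldexp(mantissa, exponent - 63)


def double_to_ext80(value: float) -> bytes:
    """Convert a normal, finite, non-zero double to 10 x87 extended bytes."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - DBL_BIAS
    fraction = bits & ((1 << 52) - 1)
    # Make the implicit leading 1 explicit and left-align it to bit 63.
    mantissa = ((1 << 52) | fraction) << 11
    sign_exp = (sign << 15) | (exponent + EXT_BIAS)
    return struct.pack("<QH", mantissa, sign_exp)
```

For instance, `double_to_ext80(1.0)` yields the bytes `00 00 00 00 00 00 00 80 ff 3f`: significand `0x8000000000000000`, exponent word `0x3fff`.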
  • 17. Number of instructions
      Number  Instruction                 Comment
      (a) x86 code
      0       FLD    *%R10
      1       FMUL   *16(%R10)
      2       FSTP   *%R10
      (b) Translation with plain MIPS instructions
      0.00    LD     Rtmp1, 12(R8)        /* convert 1st operand */
      0.01    LD     Rtmp2, 4(R8)
      0.02    ANDI   Rsign, Rtmp1         /* get sign bit and sign bit of exp */
      0.03    DSLL32 Rsign, Rsign, 16     /* get biased exponent
      ...
      0.23    DMTC1  F8, Rfp2
      1.00    MUL.d  F9, F7, F8           /* 64-bit multiply */
      2.00    DMFC1  Rres, F9
      2.01    DSRL32 Rsign, Rres, 31      /* get sign bit */
      ...
      2.12    SD     Rres1, 12(R8)        /* write back result */
      2.13    SD     Rres2, 4(R8)
      (c) Translation with the new instructions
      0.0     GSLQC1   F4, 4(R8)          /* 128-bit load to F4 and F5 */
      0.1     CVT.d.ld F7, F4, F5         /* 80-bit to 64-bit convert */
      0.2     GSLQC1   F2, 20(R8)         /* 128-bit load to F2 and F3 */
      0.3     CVT.d.ld F8, F2, F3         /* 80-bit to 64-bit convert */
      1.0     MUL.d    F9, F7, F8         /* 64-bit multiplication */
      2.0     CVT.ud.d F7, F9             /* 64-bit to high part of 80-bit */
      2.1     CVT.ld.d F8, F9             /* 64-bit to low part of 80-bit */
      2.2     GSSQC1   F7, 4(R8)          /* 128-bit store */
  • 18. Multimedia instructions • x86 has MMX, SSE, SSE2... • MIPS has an extension instruction set called MDMX, but it is very different from the x86 multimedia instructions • Added an original SIMD instruction set similar to SSE2
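To see why dedicated SIMD instructions matter, consider SSE2's `PADDW`: eight independent 16-bit adds on a 128-bit register, wrapping on overflow. Without SIMD support, a translator has to loop over the lanes or use SWAR tricks on scalar registers. Below is a Python sketch of the SWAR version, which masks off each lane's top bit so carries cannot cross lane boundaries. It illustrates the semantics only and is not how Loongson's SIMD unit works; the helper names are hypothetical.

```python
from typing import List

LANES = 8          # 8 x 16-bit lanes = one 128-bit XMM-sized value
LANE_BITS = 16


def _splat(pattern: int) -> int:
    """Replicate a 16-bit pattern into every lane."""
    return sum(pattern << (LANE_BITS * i) for i in range(LANES))


HIGH = _splat(0x8000)  # top bit of each lane
LOW = _splat(0x7FFF)   # lower 15 bits of each lane


def paddw(a: int, b: int) -> int:
    """PADDW-style packed 16-bit add with wraparound, done SWAR-style."""
    # Adding only the low 15 bits can never carry out of a lane.
    partial = (a & LOW) + (b & LOW)
    # Fold in the lanes' top bits with XOR: bit 15 becomes a15 ^ b15 ^ carry,
    # and the carry out of bit 15 is dropped (wraparound).
    return partial ^ ((a ^ b) & HIGH)


def pack(values: List[int]) -> int:
    """Pack 8 lane values (lane 0 in the low bits) into one integer."""
    return sum((v & 0xFFFF) << (LANE_BITS * i) for i, v in enumerate(values))


def unpack(x: int) -> List[int]:
    """Split an integer back into 8 lane values."""
    return [(x >> (LANE_BITS * i)) & 0xFFFF for i in range(LANES)]
```

Even this branch-free form needs several mask, add and XOR steps for each 64-bit half on a scalar machine, where a packed-add instruction needs one.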
  • 19. New addressing mode • MIPS only supports “(base) + disp” for fixed- and floating-point, and “(base) + (index)” for floating-point • x86 has more flexible addressing modes, e.g. “(base) + (index) x scale + disp” • A “(base) + (index) + disp8” addressing mode was added to translate them
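As an illustration, the hypothetical Python function below lists a MIPS-style sequence a translator might emit for an x86 load `mov dst, [base + index*scale + disp]`, with and without a `(base) + (index) + disp8` mode. `DSLL`, `DADDU` and `LD` are ordinary MIPS64 instructions; `LDX` is a placeholder for a load using the new mode, not a real Loongson mnemonic, and the sketch only handles displacements that fit in MIPS's 16-bit immediate.

```python
from typing import List


def translate_load(dst: str, base: str, index: str, scale: int, disp: int,
                   has_disp8_mode: bool) -> List[str]:
    """Emit a MIPS-style sequence for x86 `mov dst, [base + index*scale + disp]`.

    Hypothetical: 'LDX' stands for a load using the (base)+(index)+disp8 mode,
    and 'tmp' is a scratch register. Only 16-bit displacements are handled.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    if not -32768 <= disp <= 32767:
        raise ValueError("sketch only handles 16-bit displacements")

    code: List[str] = []
    idx = index
    if scale != 1:
        # Neither MIPS mode has a scale factor: pre-shift the index.
        code.append(f"DSLL  tmp, {index}, {scale.bit_length() - 1}")
        idx = "tmp"
    if has_disp8_mode and -128 <= disp <= 127:
        code.append(f"LDX   {dst}, {disp}({base}, {idx})")   # one load
    else:
        code.append(f"DADDU tmp, {base}, {idx}")             # base + index
        code.append(f"LD    {dst}, {disp}(tmp)")             # + disp
    return code
```

With scale 1 and a small displacement the whole operand becomes a single load; a scaled index still needs one shift, since the added mode has no scale factor.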
  • 20. Bounded load and store • x86 has segmented addressing • Bounded load/store instructions were added to handle this: they read a bound register as the memory-access boundary • They raise an address exception if the memory access exceeds the boundary
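The slide gives the semantics rather than an encoding, so the following Python model is only a sketch: it assumes a single upper bound held in a "bound register" and checks that the last byte of the access does not exceed it (whether the hardware checks an upper bound, a lower bound or both is not stated here). The names are hypothetical.

```python
class AddressError(Exception):
    """Stands in for the MIPS address-error exception."""


def bounded_load(memory: bytes, addr: int, size: int, bound: int) -> int:
    """Load `size` bytes little-endian from `addr`, checked against `bound`.

    Assumed semantics: the last byte accessed must not lie above `bound`.
    """
    if addr + size - 1 > bound:
        raise AddressError(f"access {addr:#x}+{size} exceeds bound {bound:#x}")
    return int.from_bytes(memory[addr:addr + size], "little")
```

A translator could load the guest segment limit into the bound register once and then use the bounded form for every access in that segment, instead of an explicit compare-and-branch before each one.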
  • 21. Fixed-point multiplication and division • MIPS fixed-point multiplication/division instructions use the special Hi/Lo registers as the destination, so an additional operation is needed to move data from Hi/Lo into general-purpose registers • Added fixed-point multiplication/division instructions that use general-purpose registers as the destination
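Example: x86 `IMUL r32, r/m32` keeps only the low 32 bits of the product in its destination register. With the classic MIPS `MULT`, the 64-bit product lands in Hi/Lo and the translator needs a second instruction (`MFLO`) to bring the low half into a general-purpose register. A small Python model of the two forms (function names are hypothetical):

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def _signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mips_mult(rs: int, rt: int) -> Tuple[int, int]:
    """Classic MIPS MULT: signed 32x32 -> 64-bit product, returned as (Hi, Lo)."""
    product = (_signed32(rs) * _signed32(rt)) & MASK64
    return product >> 32, product & MASK32


def imul32_via_hilo(a: int, b: int) -> int:
    """x86 `IMUL r32, r/m32` via MULT + MFLO: two instructions."""
    _hi, lo = mips_mult(a, b)   # MULT a, b
    return lo                   # MFLO dest


def imul32_to_gpr(a: int, b: int) -> int:
    """The same result from one multiply that writes a GPR directly."""
    return (a * b) & MASK32
```

Both functions return the same value; the difference the slide targets is the extra Hi/Lo move on every translated multiply or divide.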
  • 22. Byte insertion and extraction • x86 supports 8-, 16-, 32- and 64-bit operations • MIPS only supports 32- and 64-bit operations • Added flexible byte insertion instructions that can insert 8, 16 or 32 bits from any location of one register into any location of another register • Also added flexible byte extraction instructions
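A typical case is x86's 8-bit sub-registers: `MOV AH, BL` replaces bits 8 to 15 of EAX with bits 0 to 7 of EBX and leaves the rest of EAX untouched. The hypothetical Python helpers below show what an "any location to any location" insert/extract computes; without such an instruction, the same effect takes a sequence of shifts, masks and ORs.

```python
def extract_bits(src: int, pos: int, width: int) -> int:
    """Return `width` bits of `src` starting at bit `pos`."""
    return (src >> pos) & ((1 << width) - 1)


def insert_bits(dst: int, src: int, src_pos: int, dst_pos: int, width: int) -> int:
    """Copy `width` bits of `src` at `src_pos` into `dst` at `dst_pos`."""
    field = extract_bits(src, src_pos, width)
    mask = ((1 << width) - 1) << dst_pos
    return (dst & ~mask) | (field << dst_pos)   # clear the slot, then fill it
```

`insert_bits(0x11223344, 0xAABBCCDD, 0, 8, 8)` gives `0x1122DD44`, which is EAX after `MOV AH, BL`.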
  • 23. CAM • Translating an indirect branch is costly because the translator must look up the branch target dynamically • This requires a hash table mapping x86 branch targets to MIPS branch targets • A 64-entry CAM was added to speed this up • CAM entry format: PID, Address, Data
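In software, every translated indirect jump must map the x86 target address to the start of its translated MIPS block at run time. Below is a Python sketch of a small 64-entry cache in front of the translator's full lookup table, using the entry format from the slide (PID, address, data). The FIFO replacement policy, the names, and the assumption that the target is already in the table (a real translator would translate it on a miss) are illustrative only; a hardware CAM compares all entries in parallel, which the sketch approximates with a scan.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CamEntry:
    pid: int       # process the mapping belongs to
    address: int   # x86 (guest) branch target
    data: int      # address of the translated MIPS code


class IndirectBranchCam:
    """Software model of a 64-entry CAM; FIFO replacement is an assumption."""

    SIZE = 64

    def __init__(self) -> None:
        self.entries: List[Optional[CamEntry]] = [None] * self.SIZE
        self.next_victim = 0

    def lookup(self, pid: int, address: int) -> Optional[int]:
        # Real CAM hardware compares all entries at once; here we scan.
        for entry in self.entries:
            if entry is not None and entry.pid == pid and entry.address == address:
                return entry.data
        return None

    def insert(self, pid: int, address: int, data: int) -> None:
        self.entries[self.next_victim] = CamEntry(pid, address, data)
        self.next_victim = (self.next_victim + 1) % self.SIZE


def resolve_indirect(cam: IndirectBranchCam,
                     table: Dict[Tuple[int, int], int],
                     pid: int, x86_target: int) -> int:
    """Map an x86 indirect-jump target to translated code: CAM first, then table."""
    hit = cam.lookup(pid, x86_target)
    if hit is not None:
        return hit
    mips_target = table[(pid, x86_target)]   # slow path: software hash table
    cam.insert(pid, x86_target, mips_target)
    return mips_target
```

Including the PID in each entry presumably lets mappings from different guest processes coexist without flushing the CAM on every process switch.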
  • 24. Number of instructions
      Number  Instruction          Comment
      (a) x86 code
      0       MOV    %RAX, %R11
      1       JMPQ   *%R11
      (b) Translated with CAM instructions
      0       MOVE   Rr11, Rrax
      1.0     CAMPV  Rtmp, Rr11   /* Look up the first-level indirect jump address */
      1.1     CAMPV  Rtgt, Rtmp   /* Look up the final jump address */
      1.2     JR     Rtgt
      Figure 5. Example of indirect branch target translation: the original x86 program (a), and the program translated with Godson-3 content-associated memory (CAM) instructions (b). CAMPV is the new instruction for x86 emulation.
  • 25. Context Switch Optimization • The binary translator writes translated code through the data cache, so executing it would normally require flushing it from the data cache and loading it into the instruction cache • Instead, hardware keeps the data cache, the instruction cache, and L2 coherent • The binary translator switches context between itself and the translated code, which requires saving/restoring the target machine's registers (emulated in general-purpose registers) • To reduce this cost, 128-bit load and store instructions were added • These save/restore up to four x86 registers at a time
  • 26. [Figure: Benchmark results. Performance (percent, 0 to 100) of x86 programs under binary translation with no hardware support vs. with hardware support, for EEMBC, x86 SIMD microbenchmarks, and SPEC 2000 programs (C and x86 assembly sources), on platforms including FPGA.]
  • 27. Godson SPEC ratio vs. Pentium SPEC ratio ("-" = no result)
      Benchmark       2E-750MHz  2F-800MHz  3A-800MHz PIII-800MHz PIV-1.4GHz
      164.gzip              209        251        324        344        397
      175.vpr               237        239        391        261        246
      176.gcc               282        329        369        241        350
      181.mcf               271        232        421        229        255
      186.crafty            356        362        415        352        386
      197.parser            202        152        225        231        331
      252.eon               289        441        526       90.7        125
      253.perlbmk           235        321        330        397        547
      254.gap               238        243        229        260        441
      255.vortex            236        274        297        383        478
      256.bzip2             247        241        268        249        314
      300.twolf             313        331        486        269        287
      SPECint2000           256        275        345        260        326
      168.wupwise           307        308        325        248        474
      171.swim              247        273        336        218        244
      172.mgrid             156        155        184       99.2        320
      173.applu             188        268        200        154        333
      177.mesa              373        438        400        265        265
      178.galgel              -        345        583          -          -
      179.art               349        693       1254        115        109
      183.equake            250        303        278        190        493
      187.facerec             -        111        177          -          -
      188.ammp              277        283        364        174        200
      189.lucas               -        284        251          -          -
      191.fma3d               -        108        128          -          -
      200.sixtrack          131        217        184        137        224
      301.apsi              172        197        225        190        199
      SPECfp2000            232        254        289        171        263
  • 28. Conclusion • GS464 added 200+ instructions and a number of optimizations for x86 emulation • As a result, binary translation runs 2x to 3x faster than unmodified QEMU • That’s nearly 70% of the performance of native MIPS binaries • CPU performance itself is poor, though • The paper doesn’t give enough information to judge the actual emulation performance on a real chip... • Anyway, Loongson-3 looks interesting and worth a try!
  • 29. Papers & Slides • “Godson-3: A Scalable Multicore RISC Processor with x86 Emulation” • “Micro-architecture of Godson-3 Multi-Core Processor” • “Efficient Binary Translation System with Low Hardware Cost” • “Godson-3 Multicore RISC Processor”
