AN LLVM BACKEND
FOR GHC
David Terei & Manuel Chakravarty
University of New South Wales

Haskell Symposium 2010
Baltimore, USA, 30th Sep
What is ‘An LLVM Backend’?
• Low Level Virtual Machine

• Backend Compiler framework

• Designed to be used by compiler developers, not by software developers

• Open Source

• Heavily sponsored by Apple
Motivation
• Simplify
  • Reduce ongoing work
  • Outsource!


• Performance
  • Improve run-time
Example
import Data.Word (Word32)

collat :: Int -> Word32 -> Int
collat c 1 = c
collat c n | even n =
             collat (c+1) $ n `div` 2
           | otherwise =
             collat (c+1) $ 3 * n + 1

pmax x n = x `max` (collat 1 n, n)

main = print $ foldl pmax (1,1)
                [2..1000000]

[Bar chart: run-time (ms) of this program with the LLVM, C and NCG backends, y-axis 0 to 2000]
Different, updated run-times compared to paper
Competitors
C Backend (C)
• GNU C dependency
  • Badly supported on platforms such as Windows
• Uses a mangler on assembly code
• Slow compilation speed
  • Takes twice as long as the NCG

Native Code Generator (NCG)
• Huge amount of work
• Very limited portability
• Does very little optimisation work
GHC’s Compilation Pipeline
Core → STG → Cmm → one of: C, NCG, LLVM

[Diagram labels: Language, Abstract Machine, Machine Configuration; Heap, Stack, Registers]
Compiling to LLVM
Won’t be covering, see paper for full details:
• Why from Cmm and not from STG/Core
• LLVM, C-- & Cmm languages
• Dealing with LLVM’s SSA form
• LLVM type system


Will be covering:
• Handling the STG Registers
• Handling GHC’s Table-Next-To-Code optimisation
Handling the STG Registers
Implement either by:
• In memory
• Pin to hardware registers

STG Register    X86 Register
Base            ebx
Heap Pointer    edi
Stack Pointer   ebp
R1              esi

NCG?
• Register allocator permanently stores STG registers in
  hardware

C Backend?
• Uses GNU C extension (global register variables) to
  also permanently store STG registers in hardware
Handling the STG Registers
The LLVM backend handles this by implementing a new calling convention:
             STG Register    X86 Register
             Base            ebx
             Heap Pointer    edi
             Stack Pointer   ebp
             R1              esi


  define f ghc_cc (Base, Hp, Sp, R1) {
    …
    tail call ghc_cc g(Base, Hp’, Sp’, R1’);
    ret void;
  }
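
As a rough illustration of why threading the STG registers through function arguments pins them, the positional mapping in the table above can be written as a small Haskell sketch. The names `x86Pinning` and `assignRegs` are hypothetical and are not part of GHC or LLVM; the register pairs are copied from the slide.

```haskell
-- Positional register assignment, copied from the slide's x86 table.
-- (Hypothetical helper names; this is not GHC or LLVM code.)
x86Pinning :: [(String, String)]
x86Pinning =
  [ ("Base", "ebx")
  , ("Hp",   "edi")
  , ("Sp",   "ebp")
  , ("R1",   "esi")
  ]

-- Pair each argument of a call with the hardware register at the same
-- position. Because every function uses the same convention, the value
-- passed in position i always travels in the same hardware register.
assignRegs :: [String] -> [(String, String)]
assignRegs args = zip args (map snd x86Pinning)
```

Evaluating `assignRegs ["Base", "Hp'", "Sp'", "R1'"]` in GHCi pairs the updated heap pointer with `edi`, the same register the callee expects its `Hp` argument in.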
Handling the STG Registers
• Issue: If implemented naively, all the STG registers
 have a live range of the entire function.

• Some of the STG registers can never be scratched (e.g.
 Sp, Hp…) but many can (e.g. R2, R3…).

• We need to somehow tell LLVM when we no longer care
 about an STG register; otherwise it will spill and reload the
 register across calls, for example calls into C land.
Handling the STG Registers
• We handle this by storing undef into an STG register
 when it is no longer needed. In effect, we scratch the registers manually.
define f ghc_cc (Base, Hp, Sp, R1, R2, R3, R4) {
  …
  store undef %R2
  store undef %R3
  store undef %R4
  call c_cc sin(double %f22);
  …
  tail call ghc_cc g(Base,Hp’,Sp’,R1’,R2’,R3’,R4’);
  ret void;
}
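
The scratching step can be sketched as a small Haskell generator. This is only an illustration under stated assumptions: the type `StgReg` and the names `scratchable` and `scratchBeforeCall` are hypothetical, the scratchable set is limited to R2..R4 as in the slide's example, and the emitted text follows the slide's pseudo-notation rather than exact LLVM syntax.

```haskell
import Data.List ((\\))

-- Hypothetical model of a few STG registers; GHC's real set is larger.
data StgReg = Base | Hp | Sp | R1 | R2 | R3 | R4
  deriving (Eq, Show)

-- Assumption for this sketch: only R2..R4 may be scratched, as in the
-- slide's example. Base, Hp and Sp must always be preserved.
scratchable :: [StgReg]
scratchable = [R2, R3, R4]

-- Given the registers that are still live after a call into C, emit one
-- "store undef" line (slide pseudo-notation) per dead scratchable register.
scratchBeforeCall :: [StgReg] -> [String]
scratchBeforeCall live =
  [ "store undef %" ++ show r | r <- scratchable \\ live ]
```

For instance, `scratchBeforeCall [R1, R3]` yields stores for `%R2` and `%R4` only, so `R3` keeps its value across the call.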
Handling Tables-Next-To-Code
[Figure: Un-optimised Layout vs. Optimised Layout]

How to implement in LLVM?
Handling Tables-Next-To-Code
Use GNU Assembler sub-section feature.
• Allows code/data to be put into a numbered sub-section
• Sub-sections are appended together in order
• Table in <n>, entry code in <n+1>
                    .text 12
                    sJ8_info:
                        movl ...
                        movl ...
                        jmp ...

                    [...]
                    .text 11
                    sJ8_info_itable:
                        .long ...
                        .long 0
                        .long 327712
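
A minimal Haskell sketch of the sub-section trick follows. It assumes a hypothetical `Closure` record and `emitClosure` function (neither is taken from GHC) and only produces assembler text in the shape of the listing above: the table goes into sub-section n and the entry code into n+1, so the assembler places the table directly before the entry label even though the code is emitted first.

```haskell
-- Hypothetical description of one closure's info table and entry code.
data Closure = Closure
  { entryLabel :: String    -- e.g. "sJ8_info"
  , tableWords :: [String]  -- operands of the .long directives
  , entryCode  :: [String]  -- instructions of the entry code
  }

-- Emit the entry code in sub-section n+1 and the table in sub-section n.
-- The assembler appends sub-sections in numeric order, so the table ends
-- up immediately before the entry label regardless of emission order.
emitClosure :: Int -> Closure -> [String]
emitClosure n c =
     [ ".text " ++ show (n + 1), entryLabel c ++ ":" ]
  ++ map ("    " ++) (entryCode c)
  ++ [ ".text " ++ show n, entryLabel c ++ "_itable:" ]
  ++ map (\w -> "    .long " ++ w) (tableWords c)
```

Calling `emitClosure 11 (Closure "sJ8_info" ["0", "327712"] ["jmp ..."])` gives `.text 12` for the code and `.text 11` for the table, matching the numbering in the listing.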
Handling Tables-Next-To-Code
LLVM Mangler
• 180 lines of Haskell (half is documentation)
• Needed only for OS X

C Mangler
• 2,000 lines of Perl
• Needed for every platform
Evaluation: Simplicity
[Bar chart: code size in lines for LLVM, C and NCG, y-axis 0 to 25000]

LLVM
• Half of code is representation of LLVM language

C
• Compiler: 1,100 lines
• C Headers: 2,000 lines
• Perl Mangler: 2,000 lines

NCG
• Shared component: 8,000 lines
• Platform specific: 4,000 – 5,000 for X86, SPARC, PowerPC
Evaluation: Performance
Nofib:
• Egalitarian benchmark suite, everything is equal
• Memory bound, little room for optimisation once at Cmm stage

[Bar chart: run-time against LLVM (%) for NCG and C, y-axis -0.5 to 3]
Repa Performance
[Bar chart: runtimes (s) of LLVM and NCG on Matrix Mult, Laplace and FFT, y-axis 0 to 16]
Compile Times, Object Sizes
[Bar charts: compile times vs LLVM for NCG and C (y-axis -80 to 60); object sizes vs LLVM for NCG and C (y-axis -14 to 0)]
Result
• LLVM Backend is simpler.


• LLVM Backend is as fast or faster.


• LLVM developers now work for GHC!
Get It
• LLVM
   • Our calling convention has been accepted upstream!
   • Included in LLVM since version 2.7


                             http://llvm.org
• GHC
  • In HEAD
  • Should be released in GHC 7.0




• Send me any programs that are slower!
Questions?
Why from Cmm?
A lot less work than from STG/Core

But…

Couldn’t you do a better job from STG/Core?

Doubtful…

Easier to fix any deficiencies in Cmm representation and
code generator
Dealing with SSA
LLVM language is SSA form:
• Each variable can only be assigned to once
• Immutable


How do we handle converting mutable Cmm variables?
• Allocate a stack slot for each Cmm variable
• Use load and stores for reads and writes
• Use LLVM’s ‘mem2reg’ optimisation pass
 • This converts our stack allocations into LLVM variables that
   properly obey the SSA requirement
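
The stack-slot approach can be sketched in Haskell as follows. Everything here is an assumption made for illustration: `Stmt` is a toy stand-in for mutable Cmm code, the function names are hypothetical, and the emitted text uses the typed-pointer load/store syntax of older LLVM releases (newer releases write loads differently). The point is only that each LLVM name is assigned once, while mutation happens through memory that mem2reg can later promote.

```haskell
import Data.List (nub)

-- A toy, hypothetical stand-in for mutable Cmm code: integer variables
-- that are assigned a constant or the sum of two variables.
data Stmt
  = Assign String Int             -- x = 42
  | AddVars String String String  -- x = y + z

-- One stack slot per variable.
allocaFor :: String -> String
allocaFor v = "%" ++ v ++ ".slot = alloca i32"

-- Translate one statement. The Int counts fresh temporaries, so every
-- LLVM name is assigned exactly once even if a variable is rewritten.
stmtToLlvm :: Int -> Stmt -> (Int, [String])
stmtToLlvm k (Assign x n) =
  (k, ["store i32 " ++ show n ++ ", i32* %" ++ x ++ ".slot"])
stmtToLlvm k (AddVars x y z) =
  ( k + 3
  , [ t 0 ++ " = load i32* %" ++ y ++ ".slot"
    , t 1 ++ " = load i32* %" ++ z ++ ".slot"
    , t 2 ++ " = add i32 " ++ t 0 ++ ", " ++ t 1
    , "store i32 " ++ t 2 ++ ", i32* %" ++ x ++ ".slot"
    ]
  )
  where t i = "%t" ++ show (k + i)

-- Variables mentioned by a statement.
varsOf :: Stmt -> [String]
varsOf (Assign x _)    = [x]
varsOf (AddVars x y z) = [x, y, z]

-- Allocate all slots up front, then translate statements in order.
translate :: [Stmt] -> [String]
translate stmts = map allocaFor vars ++ go 0 stmts
  where
    vars = nub (concatMap varsOf stmts)
    go _ []         = []
    go k (s : rest) = let (k', ls) = stmtToLlvm k s in ls ++ go k' rest
```

Translating `[Assign "x" 1, AddVars "x" "x" "x"]` writes to `x` twice, yet the output defines `%t0`, `%t1` and `%t2` once each; only the slot `%x.slot` is mutated.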
Type Systems?
The LLVM language has a fairly high-level type system
• Strings, Arrays, Pointers…


When combined with SSA form, great for development
• 15 bug fixes required after backend finished to get test
  suite to pass
• 10 of those were motivated by type system errors
• Some could have been fairly difficult (e.g. returning a pointer
  instead of a value)

Haskell Symposium 2010: An LLVM backend for GHC

  • 1. AN LLVM BACKEND FOR GHC David Terei & Manuel Chakravarty University of New South Wales Haskell Symposium 2010 Baltimore, USA, 30th Sep
  • 2. What is ‘An LLVM Backend’? • Low Level Virtual Machine • Backend Compiler framework • Designed to be used by compiler developers, not by software developers • Open Source • Heavily sponsored by Apple
  • 3. Motivation • Simplify • Reduce ongoing work • Outsource! • Performance • Improve run-time
  • 4. Example collat :: Int -> Word32 -> Int Run-time (ms) collat c 1 = c 2000 collat c n | even n = 1800 collat (c+1) $ n `div` 2 1600 | otherwise = 1400 collat (c+1) $ 3 * n + 1 1200 1000 pmax x n = x `max` (collat 1 n, n) 800 600 main = print $ foldl pmax (1,1) 400 [2..1000000] 200 0 LLVM C NCG Different, updated run-times compared to paper
  • 5. Competitors C Backend (C) Native Code Generator (NCG) • GNU C Dependency • Huge amount of work • Badly supported on • Very limited portability platforms such as Windows • Does very little • Use a Mangler on optimisation work assembly code • Slow compilation speed • Takes twice as long as the NCG
  • 6. GHC’s Compilation Pipeline C Core STG Cmm NCG LLVM Language, Abstract Machine, Machine Configuration Heap, Stack, Registers
  • 7. Compiling to LLVM Won’t be covering, see paper for full details: • Why from Cmm and not from STG/Core • LLVM, C-- & Cmm languages • Dealing with LLVM’s SSA form • LLVM type system Will be covering: • Handling the STG Registers • Handling GHC’s Table-Next-To-Code optimisation
  • 8. Handling the STG Registers
        STG Register     X86 Register
        Base             ebx
        Heap Pointer     edi
        Stack Pointer    ebp
        R1               esi
    Implement either by: • In memory • Pin to hardware registers
    NCG? • Register allocator permanently stores STG registers in hardware
    C Backend? • Uses GNU C extension (global register variables) to also permanently store STG registers in hardware
  • 9. Handling the STG Registers. LLVM handles by implementing a new calling convention:
        STG Register     X86 Register
        Base             ebx
        Heap Pointer     edi
        Stack Pointer    ebp
        R1               esi

        define f ghc_cc (Base, Hp, Sp, R1) {
          …
          tail call g ghc_cc (Base, Hp’, Sp’, R1’);
          return void;
        }
  • 10. Handling the STG Registers • Issue: If implemented naively then all the STG registers have a live range of the entire function. • Some of the STG registers can never be scratched (e.g. Sp, Hp…) but many can (e.g. R2, R3…). • We need to somehow tell LLVM when we no longer care about an STG register, otherwise it will spill and reload the register across calls, for example calls to C land.
  • 11. Handling the STG Registers • We handle this by storing undef into the STG register when it is no longer needed. We manually scratch them.
        define f ghc_cc (Base, Hp, Sp, R1, R2, R3, R4) {
          …
          store undef %R2
          store undef %R3
          store undef %R4
          call c_cc sin(double %f22);
          …
          tail call ghc_cc g(Base,Hp’,Sp’,R1’,R2’,R3’,R4’);
          return void;
        }
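The bookkeeping behind slides 10 and 11 can be sketched as a toy model. This is not GHC's code and says nothing about LLVM's actual register allocator; `RegVal`, `scratch` and `keptAcrossCall` are hypothetical names invented for illustration. The assumption is simply that a register holding `undef` no longer needs to be preserved across a call to C, while a live one does.

```haskell
module Main where

-- Hypothetical model, not GHC's real data types: an STG register either holds
-- a value we still care about or has been overwritten with undef.
data RegVal = Live Int | Undef
  deriving (Eq, Show)

type RegFile = [(String, RegVal)]

-- Store undef into the named registers, leaving the others untouched.
scratch :: [String] -> RegFile -> RegFile
scratch names regs =
  [ (r, if r `elem` names then Undef else v) | (r, v) <- regs ]

-- Registers whose values would have to be kept alive across a call to C:
-- in this model, exactly those that are still live.
keptAcrossCall :: RegFile -> [String]
keptAcrossCall regs = [ r | (r, Live _) <- regs ]

main :: IO ()
main = do
  let regs = [ (r, Live 0) | r <- ["Base", "Hp", "Sp", "R1", "R2", "R3", "R4"] ]
  -- Naive version: every STG register is live across the call.
  print (keptAcrossCall regs)
  -- After manually scratching R2, R3 and R4, only the rest remain live.
  print (keptAcrossCall (scratch ["R2", "R3", "R4"] regs))
```

In this model, scratching R2 to R4 before the call shrinks the set of values to preserve from seven registers to four, which is the effect the slide attributes to the `store undef` instructions.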
  • 12. Handling Tables-Next-To-Code. [Diagrams: Un-optimised Layout and Optimised Layout] How to implement in LLVM?
  • 13. Handling Tables-Next-To-Code. Use GNU Assembler sub-section feature. • Allows code/data to be put into numbered sub-section • Sub-sections are appended together in order • Table in <n>, entry code in <n+1>
        .text 12
        sJ8_info:
            movl ...
            movl ...
            jmp ...
        [...]
        .text 11
        sJ8_info_itable:
            .long ...
            .long 0
            .long 327712
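The ordering rule on slide 13 can be modelled in a few lines. This is only a sketch of the behaviour the slide describes (sub-sections appended in numeric order), not of the assembler itself or of GHC's mangler; the `Chunk` type and `layout` function are hypothetical.

```haskell
module Main where

import Data.List (sortOn)

-- Hypothetical representation of one emitted chunk: the sub-section number
-- it was placed in, and its lines of assembly.
data Chunk = Chunk { subsection :: Int, asmLines :: [String] }

-- Model of the behaviour described on the slide: sub-sections are appended
-- together in numeric order, regardless of the order they were emitted in.
layout :: [Chunk] -> [String]
layout = concatMap asmLines . sortOn subsection

main :: IO ()
main = mapM_ putStrLn (layout
  [ Chunk 12 ["sJ8_info:", "    movl ...", "    jmp ..."]  -- entry code, emitted first
  , Chunk 11 ["sJ8_info_itable:", "    .long ..."]         -- info table, emitted second
  ])
```

Although the entry code is emitted first, placing the table in sub-section 11 and the code in sub-section 12 makes the table come out immediately before the code, which is the layout Tables-Next-To-Code needs.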
  • 14. Handling Tables-Next-To-Code
    LLVM Mangler: • 180 lines of Haskell (half is documentation) • Needed only for OS X
    C Mangler: • 2,000 lines of Perl • Needed for every platform
  • 15. Evaluation: Simplicity. [Bar chart: Code Size for LLVM, C and NCG; axis 0 to 25000] • LLVM: half of code is representation of LLVM language • C: Compiler: 1,100 lines; C Headers: 2,000 lines; Perl Mangler: 2,000 lines • NCG: Shared component: 8,000 lines; Platform specific: 4,000 – 5,000 for X86, SPARC, PowerPC
  • 16. Evaluation: Performance. [Bar chart: Run-time against LLVM (%) for NCG and C; axis -0.5 to 3] Nofib: • Egalitarian benchmark suite, everything is equal • Memory bound, little room for optimisation once at Cmm stage
  • 17. Repa Performance. [Bar chart: runtimes (s) for LLVM and NCG on Matrix Mult, Laplace and FFT; axis 0 to 16]
  • 18. Compile Times, Object Sizes. [Two bar charts, each comparing NCG and C against LLVM: Compile Times Vs LLVM and Object Sizes Vs LLVM]
  • 19. Result • LLVM Backend is simpler. • LLVM Backend is as fast or faster. • LLVM developers now work for GHC!
  • 20. Get It • LLVM: our calling convention has been accepted upstream! Included in LLVM since version 2.7 (http://llvm.org) • GHC: in HEAD; should be released in GHC 7.0 • Send me any programs that are slower!
  • 22. Why from Cmm? A lot less work than from STG/Core. But… couldn’t you do a better job from STG/Core? Doubtful… Easier to fix any deficiencies in the Cmm representation and code generator
  • 23. Dealing with SSA. LLVM language is in SSA form: • Each variable can only be assigned to once • Immutable. How do we handle converting mutable Cmm variables? • Allocate a stack slot for each Cmm variable • Use loads and stores for reads and writes • Use the ‘mem2reg’ LLVM optimisation pass • This converts our stack allocations into LLVM variables that properly obey the SSA requirement
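The stack-slot approach on slide 23 can be sketched with a tiny translator. Everything here is an assumption made for illustration: `Expr` and `Stmt` are a made-up stand-in for mutable Cmm variables, not GHC's Cmm types, and the emitted text is only LLVM-flavoured (the exact `load`/`store` syntax differs between LLVM versions and has not been checked against any of them). The point is that each mutable variable gets one `alloca`, every read becomes a `load`, every write becomes a `store`, and every temporary is assigned exactly once.

```haskell
module Main where

import Data.List (nub)

-- A tiny, hypothetical stand-in for code over mutable variables.
data Expr = Lit Int | Var String | Add Expr Expr
data Stmt = Assign String Expr

-- Emit code for an expression. The Int argument is the next unused temporary
-- number. Returns the operand holding the result, the next unused number,
-- and the emitted lines.
genExpr :: Int -> Expr -> (String, Int, [String])
genExpr n (Lit i) = (show i, n, [])
genExpr n (Var v) =
  let t = "%t" ++ show n
  in (t, n + 1, ["  " ++ t ++ " = load i32, i32* %" ++ v ++ ".slot"])
genExpr n (Add a b) =
  let (ra, n1, ca) = genExpr n a
      (rb, n2, cb) = genExpr n1 b
      t = "%t" ++ show n2
  in (t, n2 + 1, ca ++ cb ++ ["  " ++ t ++ " = add i32 " ++ ra ++ ", " ++ rb])

-- A write to a mutable variable becomes a store to its stack slot.
genStmt :: Int -> Stmt -> (Int, [String])
genStmt n (Assign v e) =
  let (r, n1, code) = genExpr n e
  in (n1, code ++ ["  store i32 " ++ r ++ ", i32* %" ++ v ++ ".slot"])

-- One stack slot per assigned variable, followed by the translated body.
-- (Variables that are read but never assigned get no slot in this sketch.)
genBlock :: [Stmt] -> [String]
genBlock stmts = allocas ++ go 0 stmts
  where
    vars = nub [v | Assign v _ <- stmts]
    allocas = ["  %" ++ v ++ ".slot = alloca i32" | v <- vars]
    go _ [] = []
    go n (s : ss) = let (n', c) = genStmt n s in c ++ go n' ss

main :: IO ()
main = mapM_ putStrLn (genBlock
  [ Assign "x" (Lit 1)                      -- x = 1
  , Assign "x" (Add (Var "x") (Lit 2)) ])   -- x = x + 2
```

The variable `x` is written twice, yet no name in the output is defined more than once: the mutation lives entirely in memory. A pass such as mem2reg can then promote the slot back into SSA values, as the slide describes.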
  • 24. Type Systems? LLVM language has a fairly high level type system • Strings, Arrays, Pointers… When combined with SSA form, great for development • 15 bug fixes required after backend finished to get test suite to pass • 10 of those were motivated by type system errors • Some could have been fairly difficult (e.g. returning a pointer instead of a value)