SlideShare una empresa de Scribd logo
1 de 128
High Performance JavaScript: JITs in Firefox
Power of the Web
How did we get here? ,[object Object]
Adding new technologies, like video, audio, integration and 2D+3D drawing,[object Object]
Getting a Webpage www.intel.com 192.198.164.158
Getting a Webpage www.intel.com 192.198.164.158 GET / HTTP/1.0 (HTTP Request)
Getting a Webpage <html><head><title>Laptop, Notebooks, ...</title></head><script language="javascript">functiondoStuff() {...}</script><body>... (HTML, CSS, JavaScript)
Core Web Components ,[object Object]
CSS provides styling
JavaScript provides a programming interface,[object Object]
Video, Audio tags
WebGL (OpenGL + JavaScript),[object Object]
CSS Cascading Style Sheets Separates presentation from content <b class=”warning”>This is red</b> .warning {color: red;text-decoration: underline;}
What is JavaScript? ,[object Object]
De facto language of the web
Interacts with the DOM and the browser,[object Object]
Displaying a Webpage Layout engine applies CSS to DOM Computes geometry of elements JavaScript engine executes code This can change DOM, triggers “reflow” Finished layout is sent to graphics layer
The Result
JavaScript ,[object Object]
Initial version written in 10 days
Anything from small browser interactions, complex apps (Gmail) to intense graphics (WebGL),[object Object]
Think LISP or Self, not Java Untyped - no type declarations Multi-Paradigm – objects, closures, first-class functions Highly dynamic - objects are dictionaries
No Type Declarations ,[object Object],functionf(a) {var x =72.3;if (a)        x = a +"string";return x;}
No Type Declarations ,[object Object],if a is:    number: add;    string: concat; functionf(a) {var x =“hi”;if (a)        x = a +33.2;return x;}
Functional ,[object Object],functionf(a) {returnfunction () {return a;    }}var m = f(5);print(m());
Objects ,[object Object]
Properties may be deleted or added at any time!var point = { x : 5, y : 10 };deletepoint.x;point.z=12;
Prototypes Every object can have a prototype object If a property is not found on an object, its prototype is searched instead … And the prototype’s prototype, etc..
Numbers ,[object Object]
Engines use 32-bit integers to optimize
Must preserve semantics: integers overflow to doubles,[object Object]
JavaScript Implementations ,[object Object]
Garbage collector
Runtime library
Execution engine (VM),[object Object]
Values ,[object Object]
Unboxed values required for computation,[object Object]
Easy idea: 96-bit struct
32-bits for type
64-bits for double, pointer, or integer
Too big! We have to pack better.,[object Object]
[object Object]
Doubles are normal IEEE-754
How to pack non-doubles?
51 bits of NaN space!IA32, ARM Nunboxing
Nunboxing IA32, ARM Type Payload 0x400c0000 0x00000000 63 0 ,[object Object]
Type is double because it’s not a NaN
Encodes: (Double, 3.5),[object Object]
Value is in NaN space
Type is 0xFFFF0001(Int32)
Value is (Int32, 0x00000040),[object Object]
Value is in NaN space
Bits >> 47 == 0x1FFFF == INT32
Bits 47-63 masked off to retrieve payload
Value(Int32, 0x00000040),[object Object]
Garbage Collection Need to reclaim memory without pausing user workflow, animations, etc Need very fast object allocation Consider lots of small objects like points, vectors Reading and writing to the heap must be fast
Mark and Sweep All live objects are found via a recursive traversal of the root value set All dead objects are added to a free list Very slow to traverse entire heap Building free lists can bring “dead” memory into cache
Improvements ,[object Object]
Marking can be incremental, interleaved with program execution,[object Object]
Generational GC Inactive Active Newborn Space 8MB 8MB Tenured Space
After Minor GC Active Inactive Newborn Space 8MB 8MB Tenured Space
Barriers Incremental marking, generational GC need read or write barriers Read barriers are much slower Azul Systems has hardware read barrier Write barrier seems to be well-predicted?
Unknowns How much do cache effects matter? Memory GC has to touch Locality of objects What do we need to consider to make GC fast?
Running JavaScript ,[object Object]
Basic JIT – Simple, untyped compiler
Advanced JIT – Typed compiler, lightspeed,[object Object]
Giant switch loop
Handles all edge cases of JS semantics,[object Object]
Interpreter case OP_ADD: {        Value lhs = POP();        Value rhs = POP();        Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right))                result.setInt32(left + right);elseresult.setNumber(double(left) + double(right));        } elseif (lhs.isString() || rhs.isString()) {            String *left = ValueToString(lhs);            String *right = ValueToString(rhs);            String *r = Concatenate(left, right);result.setString(r);        } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right);        }PUSH(result);break;
Interpreter Very slow! Lots of opcode and type dispatch Lots of interpreter stack traffic Just-in-time compilation solves both
Just In Time Compilation must be very fast Can’t introduce noticeable pauses People care about memory use Can’t JIT everything Can’t create bloated code May have to discard code at any time
Basic JIT JägerMonkey in Firefox 4 Every opcode has a hand-coded template of assembly (registers left blank) Method at a time: Single pass through bytecode stream! Compiler uses assembly templates corresponding to each opcode
Bytecode GETARG 0 ; fetch x GETARG 1 ; fetch y ADD RETURN functionAdd(x, y) {return x + y;}
Interpreter case OP_ADD: {        Value lhs = POP();        Value rhs = POP();        Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right))                result.setInt32(left + right);elseresult.setNumber(double(left) + double(right));        } elseif (lhs.isString() || rhs.isString()) {            String *left = ValueToString(lhs);            String *right = ValueToString(rhs);            String *r = Concatenate(left, right);result.setString(r);        } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right);        }PUSH(result);break;
Assembling ADD ,[object Object]
Observation:
Some input types much more common than others,[object Object]
Assembling ADD ,[object Object]
Large design space
Can consider integers common, or
Integers and doubles, or
Anything!,[object Object]
Basic JIT - ADD if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add; R0 = arg0.data R1 = arg1.data Greedy Register Allocator
Basic JIT - ADD if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add; R0 = arg0.data;R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED)gotoslow_add;
Slow Paths slow_add:    Value result = runtime::Add(arg0, arg1);
Inline Out-of-line if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1;  if (OVERFLOWED)gotoslow_add;rejoin: slow_add:    Value result = Interpreter::Add(arg0, arg1); R2 = result.data; goto rejoin;
Rejoining Inline Out-of-line   if (arg0.type != INT32) gotoslow_add;  if (arg1.type != INT32)gotoslow_add;  R0 = arg0.data;  R1 = arg1.data;  R2 = R0 + R1;  if (OVERFLOWED)gotoslow_add; R3 = TYPE_INT32;rejoin: slow_add:    Value result = runtime::Add(arg0, arg1);     R2 = result.data;     R3 = result.type; goto rejoin;
Final Code Inline Out-of-line if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32)goto slow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1;  if (OVERFLOWED)goto slow_add; R3 = TYPE_INT32;rejoin: return Value(R3, R2); slow_add:    Value result = Interpreter::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;
Observations Even though we have fast paths, no type information can flow in between opcodes Cannot assume result of ADD is integer Means more branches Types increase register pressure
Basic JIT – Object Access ,[object Object]
How can we create a fast-path?
We could try to cache the lookup, but
We want it to work for >1 objectsfunctionf(x) {returnx.y;}
Object Access Observation Usually, objects flowing through an access site look the same If we knew an object’s layout, we could bypass the dictionary lookup
Object Layout var obj = { x: 50,              y: 100,            z: 20     };
Object Layout
Object Layout varobj = { x: 50,              y: 100,            z: 20     }; … varobj2 = { x: 78,              y: 93,              z: 600     };
Object Layout
Familiar Problems We don’t know an object’s shape during JIT compilation ... Or even that a property access receives an object! Solution: leave code “blank,” lazily generate it later
Inline Caches functionf(x) {returnx.y;}
Inline Caches ,[object Object],if (arg0.type != OBJECT)gotoslow_property; JSObject *obj = arg0.data;
[object Object], if (arg0.type != OBJECT)gotoslow_property; JSObject *obj = arg0.data; gotoproperty_ic; Inline Caches
Property ICs functionf(x) {returnx.y;} f({x: 5, y: 10, z: 30});
Property ICs
Property ICs
Property ICs
Inline Caches ,[object Object], if (arg0.type != OBJECT)gotoslow_property; JSObject *obj = arg0.data; gotoproperty_ic;
Inline Caches ,[object Object], if (arg0.type != OBJECT)gotoslow_property; JSObject *obj = arg0.data; gotoproperty_ic;
Inline Caches ,[object Object],  if (arg0.type != OBJECT)gotoslow_property; JSObject *obj = arg0.data;   if (obj->shape != SHAPE1) gotoproperty_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;
Polymorphism ,[object Object]
Add another fast-path...functionf(x) {returnx.y;}f({ x: 5, y: 10, z: 30});f({ y: 40});
Polymorphism Main Method  if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data;if (obj->shape != SHAPE1)gotoproperty_ic;R0 = obj->slots[1].type;R1 = obj->slots[1].data;rejoin:
Polymorphism Main Method Generated Stub   if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data;  if (obj->shape != SHAPE0)gotoproperty_ic;  R0 = obj->slots[1].type;  R1 = obj->slots[1].data;rejoin:   stub: if (obj->shape != SHAPE2)gotoproperty_ic; R0 = obj->slots[0].type;R1 = obj->slots[0].data; goto rejoin;
Polymorphism Main Method Generated Stub if (arg0.type != OBJECT)    goto slow_path;  JSObject *obj = arg0.data;if (obj->shape != SHAPE1)goto stub;  R0 = obj->slots[1].type;  R1 = obj->slots[1].data;rejoin:   stub: if (obj->shape != SHAPE2)goto property_ic; R0 = obj->slots[0].type;R1 = obj->slots[0].data; goto rejoin;
Chain of Stubs if(obj->shape == S1) {     result = obj->slots[0]; } else { if (obj->shape == S2) {         result = obj->slots[1];     } else { if (obj->shape == S3) {              result = obj->slots[2];         } else { if (obj->shape == S4) {                 result = obj->slots[3];             } else { goto property_ic;             }         }     } }
Code Memory ,[object Object]
Self-modifying, but single threaded
Code memory is always rwx
Concerned about protection-flipping expense,[object Object]
Inline caches adapt code based on runtime observations
Lots of guards, poor type information,[object Object]
We could generate an IR, but without type information...
Slow paths prevent most optimizations.,[object Object]
Code Motion? functionAdd(x, n) { varsum = 0; vartemp0 = x + n;for (vari = 0; i < n; i++)         sum += temp0;     return sum;} Let’s try it.
Wrong Result var global =0;varobj= {valueOf: function () {return++global;    }}Add(obj, 10); ,[object Object]
Hoisted version returns 110!,[object Object]
Ideal Scenario ,[object Object]
Remove slow paths

Más contenido relacionado

La actualidad más candente

The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
Positive Hack Days
 

La actualidad más candente (20)

Fun with Lambdas: C++14 Style (part 2)
Fun with Lambdas: C++14 Style (part 2)Fun with Lambdas: C++14 Style (part 2)
Fun with Lambdas: C++14 Style (part 2)
 
The Style of C++ 11
The Style of C++ 11The Style of C++ 11
The Style of C++ 11
 
Python Scipy Numpy
Python Scipy NumpyPython Scipy Numpy
Python Scipy Numpy
 
Introduction to Objective - C
Introduction to Objective - CIntroduction to Objective - C
Introduction to Objective - C
 
Fun with Lambdas: C++14 Style (part 1)
Fun with Lambdas: C++14 Style (part 1)Fun with Lambdas: C++14 Style (part 1)
Fun with Lambdas: C++14 Style (part 1)
 
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
The System of Automatic Searching for Vulnerabilities or how to use Taint Ana...
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
 
BayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore HaskellBayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore Haskell
 
Scientific Python
Scientific PythonScientific Python
Scientific Python
 
Java Script Workshop
Java Script WorkshopJava Script Workshop
Java Script Workshop
 
Cocoaheads Meetup / Alex Zimin / Swift magic
Cocoaheads Meetup / Alex Zimin / Swift magicCocoaheads Meetup / Alex Zimin / Swift magic
Cocoaheads Meetup / Alex Zimin / Swift magic
 
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiA look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
 
Oh Crap, I Forgot (Or Never Learned) C! [CodeMash 2010]
Oh Crap, I Forgot (Or Never Learned) C! [CodeMash 2010]Oh Crap, I Forgot (Or Never Learned) C! [CodeMash 2010]
Oh Crap, I Forgot (Or Never Learned) C! [CodeMash 2010]
 
What's New in C++ 11?
What's New in C++ 11?What's New in C++ 11?
What's New in C++ 11?
 
DEFUN 2008 - Real World Haskell
DEFUN 2008 - Real World HaskellDEFUN 2008 - Real World Haskell
DEFUN 2008 - Real World Haskell
 
Functional programming in C++
Functional programming in C++Functional programming in C++
Functional programming in C++
 
Memory efficient pytorch
Memory efficient pytorchMemory efficient pytorch
Memory efficient pytorch
 
Using Parallel Computing Platform - NHDNUG
Using Parallel Computing Platform - NHDNUGUsing Parallel Computing Platform - NHDNUG
Using Parallel Computing Platform - NHDNUG
 
Software Design in Practice (with Java examples)
Software Design in Practice (with Java examples)Software Design in Practice (with Java examples)
Software Design in Practice (with Java examples)
 
C language
C languageC language
C language
 

Similar a Intel JIT Talk

Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
Romain Francois
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
Romain Francois
 

Similar a Intel JIT Talk (20)

Lo Mejor Del Pdc2008 El Futrode C#
Lo Mejor Del Pdc2008 El Futrode C#Lo Mejor Del Pdc2008 El Futrode C#
Lo Mejor Del Pdc2008 El Futrode C#
 
Code optimization
Code optimization Code optimization
Code optimization
 
Code optimization
Code optimization Code optimization
Code optimization
 
Too Much Data? - Just Sample, Just Hash, ...
Too Much Data? - Just Sample, Just Hash, ...Too Much Data? - Just Sample, Just Hash, ...
Too Much Data? - Just Sample, Just Hash, ...
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical Reviewer
 
C Programming Language
C Programming LanguageC Programming Language
C Programming Language
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
Beyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic AnalysisBeyond Breakpoints: A Tour of Dynamic Analysis
Beyond Breakpoints: A Tour of Dynamic Analysis
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?
 
C tutorial
C tutorialC tutorial
C tutorial
 
C tutorial
C tutorialC tutorial
C tutorial
 
C tutorial
C tutorialC tutorial
C tutorial
 
Monads in Swift
Monads in SwiftMonads in Swift
Monads in Swift
 
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x AdditionsVHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
VHDL Packages, Coding Styles for Arithmetic Operations and VHDL-200x Additions
 
How much performance can you get out of Javascript? - Massimiliano Mantione -...
How much performance can you get out of Javascript? - Massimiliano Mantione -...How much performance can you get out of Javascript? - Massimiliano Mantione -...
How much performance can you get out of Javascript? - Massimiliano Mantione -...
 
掀起 Swift 的面紗
掀起 Swift 的面紗掀起 Swift 的面紗
掀起 Swift 的面紗
 
Category theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) DataCategory theory, Monads, and Duality in the world of (BIG) Data
Category theory, Monads, and Duality in the world of (BIG) Data
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Intel JIT Talk

  • 3.
  • 4.
  • 5. Getting a Webpage www.intel.com 192.198.164.158
  • 6. Getting a Webpage www.intel.com 192.198.164.158 GET / HTTP/1.0 (HTTP Request)
  • 7. Getting a Webpage <html><head><title>Laptop, Notebooks, ...</title></head><script language="javascript">functiondoStuff() {...}</script><body>... (HTML, CSS, JavaScript)
  • 8.
  • 10.
  • 12.
  • 13. CSS Cascading Style Sheets Separates presentation from content <b class=”warning”>This is red</b> .warning {color: red;text-decoration: underline;}
  • 14.
  • 15. De facto language of the web
  • 16.
  • 17. Displaying a Webpage Layout engine applies CSS to DOM Computes geometry of elements JavaScript engine executes code This can change DOM, triggers “reflow” Finished layout is sent to graphics layer
  • 19.
  • 21.
  • 22. Think LISP or Self, not Java Untyped - no type declarations Multi-Paradigm – objects, closures, first-class functions Highly dynamic - objects are dictionaries
  • 23.
  • 24.
  • 25.
  • 26.
  • 27. Properties may be deleted or added at any time!var point = { x : 5, y : 10 };deletepoint.x;point.z=12;
  • 28. Prototypes Every object can have a prototype object If a property is not found on an object, its prototype is searched instead … And the prototype’s prototype, etc..
  • 29.
  • 30. Engines use 32-bit integers to optimize
  • 31.
  • 32.
  • 35.
  • 36.
  • 37.
  • 40. 64-bits for double, pointer, or integer
  • 41.
  • 42.
  • 44. How to pack non-doubles?
  • 45. 51 bits of NaN space!IA32, ARM Nunboxing
  • 46.
  • 47. Type is double because it’s not a NaN
  • 48.
  • 49. Value is in NaN space
  • 51.
  • 52. Value is in NaN space
  • 53. Bits >> 47 == 0x1FFFF == INT32
  • 54. Bits 47-63 masked off to retrieve payload
  • 55.
  • 56. Garbage Collection Need to reclaim memory without pausing user workflow, animations, etc Need very fast object allocation Consider lots of small objects like points, vectors Reading and writing to the heap must be fast
  • 57. Mark and Sweep All live objects are found via a recursive traversal of the root value set All dead objects are added to a free list Very slow to traverse entire heap Building free lists can bring “dead” memory into cache
  • 58.
  • 59.
  • 60. Generational GC Inactive Active Newborn Space 8MB 8MB Tenured Space
  • 61. After Minor GC Active Inactive Newborn Space 8MB 8MB Tenured Space
  • 62. Barriers Incremental marking, generational GC need read or write barriers Read barriers are much slower Azul Systems has hardware read barrier Write barrier seems to be well-predicted?
  • 63. Unknowns How much do cache effects matter? Memory GC has to touch Locality of objects What do we need to consider to make GC fast?
  • 64.
  • 65. Basic JIT – Simple, untyped compiler
  • 66.
  • 68.
  • 69. Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right)) result.setInt32(left + right);elseresult.setNumber(double(left) + double(right)); } elseif (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right);result.setString(r); } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right); }PUSH(result);break;
  • 70. Interpreter Very slow! Lots of opcode and type dispatch Lots of interpreter stack traffic Just-in-time compilation solves both
  • 71. Just In Time Compilation must be very fast Can’t introduce noticeable pauses People care about memory use Can’t JIT everything Can’t create bloated code May have to discard code at any time
  • 72. Basic JIT JägerMonkey in Firefox 4 Every opcode has a hand-coded template of assembly (registers left blank) Method at a time: Single pass through bytecode stream! Compiler uses assembly templates corresponding to each opcode
  • 73. Bytecode GETARG 0 ; fetch x GETARG 1 ; fetch y ADD RETURN functionAdd(x, y) {return x + y;}
  • 74. Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result;if (lhs.isInt32() && rhs.isInt32()) {int left = rhs.toInt32();int right = rhs.toInt32();if (AddOverflows(left, right, left + right)) result.setInt32(left + right);elseresult.setNumber(double(left) + double(right)); } elseif (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right);result.setString(r); } else {double left = ValueToNumber(lhs);double right = ValueToNumber(rhs);result.setDouble(left + right); }PUSH(result);break;
  • 75.
  • 77.
  • 78.
  • 82.
  • 83. Basic JIT - ADD if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add; R0 = arg0.data R1 = arg1.data Greedy Register Allocator
  • 84. Basic JIT - ADD if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add; R0 = arg0.data;R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED)gotoslow_add;
  • 85. Slow Paths slow_add: Value result = runtime::Add(arg0, arg1);
  • 86. Inline Out-of-line if (arg0.type != INT32) gotoslow_add;if (arg1.type != INT32)gotoslow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1; if (OVERFLOWED)gotoslow_add;rejoin: slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; goto rejoin;
  • 87. Rejoining Inline Out-of-line if (arg0.type != INT32) gotoslow_add; if (arg1.type != INT32)gotoslow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED)gotoslow_add; R3 = TYPE_INT32;rejoin: slow_add: Value result = runtime::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;
  • 88. Final Code Inline Out-of-line if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32)goto slow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1; if (OVERFLOWED)goto slow_add; R3 = TYPE_INT32;rejoin: return Value(R3, R2); slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;
  • 89. Observations Even though we have fast paths, no type information can flow in between opcodes Cannot assume result of ADD is integer Means more branches Types increase register pressure
  • 90.
  • 91. How can we create a fast-path?
  • 92. We could try to cache the lookup, but
  • 93. We want it to work for >1 objectsfunctionf(x) {returnx.y;}
  • 94. Object Access Observation Usually, objects flowing through an access site look the same If we knew an object’s layout, we could bypass the dictionary lookup
  • 95. Object Layout var obj = { x: 50, y: 100, z: 20 };
  • 97. Object Layout varobj = { x: 50, y: 100, z: 20 }; … varobj2 = { x: 78, y: 93, z: 600 };
  • 99. Familiar Problems We don’t know an object’s shape during JIT compilation ... Or even that a property access receives an object! Solution: leave code “blank,” lazily generate it later
  • 100. Inline Caches functionf(x) {returnx.y;}
  • 101.
  • 102.
  • 103. Property ICs functionf(x) {returnx.y;} f({x: 5, y: 10, z: 30});
  • 107.
  • 108.
  • 109.
  • 110.
  • 111. Add another fast-path...functionf(x) {returnx.y;}f({ x: 5, y: 10, z: 30});f({ y: 40});
  • 112. Polymorphism Main Method if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data;if (obj->shape != SHAPE1)gotoproperty_ic;R0 = obj->slots[1].type;R1 = obj->slots[1].data;rejoin:
  • 113. Polymorphism Main Method Generated Stub if (arg0.type != OBJECT)gotoslow_path;JSObject *obj = arg0.data; if (obj->shape != SHAPE0)gotoproperty_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin: stub: if (obj->shape != SHAPE2)gotoproperty_ic; R0 = obj->slots[0].type;R1 = obj->slots[0].data; goto rejoin;
  • 114. Polymorphism Main Method Generated Stub if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data;if (obj->shape != SHAPE1)goto stub; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin: stub: if (obj->shape != SHAPE2)goto property_ic; R0 = obj->slots[0].type;R1 = obj->slots[0].data; goto rejoin;
  • 115. Chain of Stubs if(obj->shape == S1) { result = obj->slots[0]; } else { if (obj->shape == S2) { result = obj->slots[1]; } else { if (obj->shape == S3) { result = obj->slots[2]; } else { if (obj->shape == S4) { result = obj->slots[3]; } else { goto property_ic; } } } }
  • 116.
  • 118. Code memory is always rwx
  • 119.
  • 120. Inline caches adapt code based on runtime observations
  • 121.
  • 122. We could generate an IR, but without type information...
  • 123.
  • 124. Code Motion? functionAdd(x, n) { varsum = 0; vartemp0 = x + n;for (vari = 0; i < n; i++) sum += temp0; return sum;} Let’s try it.
  • 125.
  • 126.
  • 127.
  • 129.
  • 130. Guesses must be informed
  • 131. Don’t want to waste time compiling code that can’t or won’t run
  • 132. Using type inference or runtime profiling
  • 133.
  • 134.
  • 135. Constructs high and low-level IRs
  • 136. Applies type information using runtime feedback
  • 137.
  • 138. Build SSA GETARG 0 ; fetch x GETARG 1 ; fetch y ADD GETARG 0 ; fetch x ADD RETURN v0 = arg0 v1 = arg1
  • 139. Build SSA GETARG 0 ; fetch x GETARG 1 ; fetch y ADD GETARG 0 ; fetch x ADD RETURN v0 = arg0 v1 = arg1
  • 140. Type Oracle Need a mechanism to inform compiler about likely types Type Oracle: given program counter, returns types for inputs and outputs May use any data source – runtime profiling, type inference, etc
  • 141. Type Oracle For this example, assume the type oracle returns “integer” for the output and both inputs to all ADD operations
  • 142. Build SSA GETARG 0 ; fetch x GETARG 1 ; fetch y ADD GETARG 0 ; fetch x ADD RETURN v0 = arg0 v1 = arg1 v2 = iadd(v0, v1)
  • 143. Build SSA GETARG 0 ; fetch x GETARG 1 ; fetch y ADD GETARG 0 ; fetch x ADD RETURN v0 = arg0 v1 = arg1 v2 = iadd(v0, v1) v3 = iadd(v2, v0)
  • 144. Build SSA GETARG 0 ; fetch x GETARG 1 ; fetch y ADD GETARG 0 ; fetch x ADD RETURN v0 = arg0 v1 = arg1 v2 = iadd(v0, v1) v3 = iadd(v2, v0) v4 = return(v3)
  • 145. SSA (Intermediate) v0 = arg0 v1 = arg1 v2 = iadd(v0, v1) v3 = iadd(v2, v0) v4 = return(v3) Untyped Typed
  • 146.
  • 148. Makes sure SSA inputs have correct type
  • 149.
  • 150. Optimization v0 = arg0 v1 = arg1 v2 = unbox(v0, INT32) v3 = unbox(v1, INT32) v4 = iadd(v2, v3) v5 = unbox(v0, INT32) v6 = iadd(v4, v5) v7 = return(v6)
  • 151. Optimization v0 = arg0 v1 = arg1 v2 = unbox(v0, INT32) v3 = unbox(v1, INT32) v4 = iadd(v2, v3) v5 = unbox(v0, INT32) v6 = iadd(v4, v5) v7 = return(v6)
  • 152. Optimized SSA v0 = arg0 v1 = arg1 v2 = unbox(v0, INT32) v3 = unbox(v1, INT32) v4 = iadd(v2, v3) v5 = iadd(v4, v2) v6 = return(v5)
  • 153. Register Allocation Linear Scan Register Allocation Based on Christian Wimmer’s work Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. Master's thesis, Institute for System Software, Johannes Kepler University Linz, 2004 Linear Scan Register Allocation on SSA Form. In Proceedings of the International Symposium on Code Generation and Optimization, pages 170–179. ACM Press, 2010. Same algorithm used in Hotspot JVM
  • 154. Allocate Registers 0: stack arg0 1: stack arg1 2: r1 unbox(0, INT32) 3: r0 unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(5)
  • 155. Code Generation SSA 0: stack arg0 1: stack arg1 2: r1 unbox(0, INT32) 3: r0 unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(5)
  • 156. Code Generation SSA Native Code 0: stack arg0 1: stack arg1 2: r1unbox(0, INT32) 3: r0 unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(5) if (arg0.type != INT32)goto BAILOUT; r1 = arg0.data;
  • 157. Code Generation SSA Native Code 0: stack arg0 1: stack arg1 2: r1 unbox(0, INT32) 3: r0unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(5) if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data; if (arg1.type != INT32)goto BAILOUT; r0 = arg1.data;
  • 158. Code Generation SSA Native Code 0: stack arg0 1: stack arg1 2: r0 unbox(0, INT32) 3: r1 unbox(1, INT32) 4: r0iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(5) if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;
  • 159. Code Generation SSA Native Code 0: stack arg0 1: stack arg1 2: r0 unbox(0, INT32) 3: r1 unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0iadd(4, 2) 6: r0 return(5) if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;
  • 160. Code Generation SSA Native Code 0: stack arg0 1: stack arg1 2: r0 unbox(0, INT32) 3: r1 unbox(1, INT32) 4: r0 iadd(2, 3) 5: r0 iadd(4, 2) 6: r0 return(4) if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;
  • 161. Generated Code if (arg0.type != INT32)goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32)goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;return r0;
  • 162.
  • 163. May recompile function if (arg0.type != INT32)goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32)goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED)goto BAILOUT;return r0;
  • 164. Guards Still Needed Object shapes for reading properties Types of… Arguments Values read from the heap Values returned from C++
  • 165. Guards Still Needed Math Integer Overflow Divide by Zero Multiply by -0 NaN is falseyin JS, truthy in ISAs
  • 166.
  • 167. Can move and eliminate code
  • 168. If speculation fails, method is recompiled or deoptimized
  • 169.
  • 172. Better than Basic JIT, but we can do better
  • 173. Interval analysis can remove overflow checks
  • 174.
  • 175. Hybrid: both static and dynamic
  • 176. Replaces checks in JIT with checks in virtual machine
  • 177.
  • 178. Conclusions JITs getting more powerful Type specialization is key Still need checks and special cases on many operations

Notas del editor

  1. I’m going to start off by showing my favorite demo, that I think really captures what we’re enabling by making the browser as fast possible.
  2. So this is Doom, translated - actually compiled - from C++ to JavaScript, being played in Firefox. In addition to JavaScript it’s using the new 2D drawing element called Canvas. This demo is completely native to the browser. There’s no Flash, it’s all open web technologies. And it’s managing to get a pretty good framerate, 30fps, despite being mostly unoptimized. This is a really cool demo for me because, three years ago, this kind of demo on the web was just impossible without proprietary extensions. Some of the technology just wasn’t there, and the rest of it was anywhere from ten to a hundred times slower. So we’ve come a long way.… Unfortunately all I have of the demo is a recording, it was playable online for about a day but iD Software requested that we take it down.
  3. To step back for a minute, I’m going to briefly overview how the browser and the major web technologies interact, and then jump into JavaScript which is the heart of this talk.
  4. This is a much less gory example, a major company’s website in a popular web browser. The process of getting from the intel.com URL to this display is actually pretty complicated.
  5. The browser then sends intel.com this short plaintext message which says, “give me the webpage corresponding to the top of your website’s hierarchy.”
  6. How does the browser get from all of this to the a fully rendered website?
  7. Now that we’re familiar with the language’s main features, let’s talk about implementing a JavaScript engine. A JavaScript implementation has four main pieces
  8. One of the challenges of a JavaScript engine is dealing with the lack of type declarations. In order to dispatch the correct computation, the runtime must be able to query a variable or field’s current type.
  9. We’re done with data representation and garbage collection, so it’s time for the really fun stuff: running JavaScript code, and making it run fast. One way to run code, but not make it run fast, is to use an interpreter.