Pyston tech talk
November 10, 2015
What is Pyston
High-performance Python JIT, written in C++
JIT: produces assembly “just in time” in order to accelerate the program
Targets Python 2.7
Open source project at Dropbox, started in 2013
Two full time members, plus part time and open source members
The team
Marius Wachtler
Kevin Modzelewski
Lots of important contributors:
Boxiang Sun, Rudi Chen, Travis Hance,
Michael Arntzenius, Vinzenz Feenstra, Daniel Agar
Pyston current status
25% better performance than CPython
Compatibility level is roughly the same as between minor versions (2.6 vs 2.7)
- Can run django, much of the Dropbox server, some numpy
Next milestone is Dropbox production!
Talk Outline
Pyston motivation
Compatibility
Python performance
Our techniques
Current roadmap
Pyston motivation
Why Pyston
Python is not just “IO-bound”; at scale, Dropbox (and others) have many cores running Python
Many existing Python-performance projects, but not suitable for large Python codebases
Existing Landscape
Baseline: CPython
If you want more performance:
- C extension
- Cython
- Numba
- PyPy
- Rewrite (Go? C++?)
How we fit in
Focus on large web-app case (specifically Dropbox):
- Require very few edits per kLOC
- Implies good C API support
- Good performance scalability to large codebases
Non-goal: crushing microbenchmarks
Compatibility
Compatibility challenges
Some things expected:
- Language documentation but no formal spec
- C API challenges
- Every feature exists because someone wanted it
Compatibility challenges
Some not expected:
- Lots of program introspection
- Some core libraries (pip) are the most dynamic
- Code will break if you fix even the worst warts
- Community accepts other implementations, but assumes is_cpython = not is_pypy
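The last point can be sketched in a few lines. The helper names are hypothetical, and the string "Pyston" is only an assumed example of what a third implementation might report; `platform.python_implementation()` is the standard-library call such checks are usually built on.

```python
import platform


def is_cpython_naive(impl_name):
    # The assumption described above: anything that is not PyPy must be CPython.
    return impl_name != "PyPy"


def is_cpython_strict(impl_name):
    # Safer: test for CPython explicitly instead of by exclusion.
    return impl_name == "CPython"


current = platform.python_implementation()
print(current)

# A hypothetical third implementation is misclassified by the naive check.
naive_says = is_cpython_naive("Pyston")
strict_says = is_cpython_strict("Pyston")
print(naive_says)
print(strict_says)
```

Code that branches on the naive check takes its CPython-only path on any new implementation, which is one reason an alternative runtime ends up mimicking CPython internals.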
Our evolution
Started as a from-scratch implementation, is now CPython-based.
Got to experiment with many things:
- showed us several things we can change
- and several things we cannot :(
Evolution result
We use lots of CPython code to be “correct by default”
We support:
- django, sqlalchemy, lxml, many more
- most of the Dropbox server
- some numpy
Aside: the GIL
I don’t want it either but… it’s not just an implementation challenge.
- Removing it is a much bigger compatibility break than we can accept
We have a GIL. And Dropbox has already solved its Python parallelism issue anyway.
Maybe Python 4?
Python performance
What makes Python hard
Beating an interpreter sounds easy (lots of research papers do it!), but:
CPython is well-optimized, and code is optimized to run on it
Hard to gracefully degrade to CPython’s behavior
What makes Python hard
Python doesn’t have static types
But…
Statically typed Python is still hard!
What makes Python hard
Statically-typed Python is still hard
Knowing the types does not make getattr() easy to evaluate
Many other examples:
- len()
- constructors
- binops
var_name = var_parser_regex.match(s)
setting = getattr(settings, var_name, None)
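A small sketch of why known types do not settle what getattr() does. The `Settings` class and its attribute names are invented for illustration; every call below has the same static types (a `Settings` instance and a `str`), yet each one takes a different path through the attribute-lookup machinery.

```python
class Settings(object):
    debug = False                       # found on the class

    def __init__(self):
        self.timeout = 30               # found in the instance dict

    @property
    def region(self):                   # data descriptor: runs code on access
        return "us-" + str(self.timeout)

    def __getattr__(self, name):        # only called when normal lookup fails
        if name.startswith("legacy_"):
            return name[len("legacy_"):].upper()
        raise AttributeError(name)


settings = Settings()
names = ("debug", "timeout", "region", "legacy_mode", "missing")
results = [getattr(settings, name, None) for name in names]
print(results)
```

The last name falls through `__getattr__`, which raises `AttributeError`, so getattr() returns its default. Which of these paths runs depends on the value of the string, not on its type.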
What makes Python hard
- Types are only the first level of dynamism
- Functions themselves exhibit dynamic behavior
- Traditional “interpreter overhead” is negligible
So what can we get from a JIT?
- We need to understand + avoid the dynamism in the runtime
Our techniques
Pyston architecture
[Diagram; box labels: Parser, Bytecode Interpreter, Baseline JIT, LLVM JIT, Runtime, Tracer]
Our workhorse: tracing
Very low-tech tracing JIT:
- single operation (bytecode) at a time
- no inlining
- manual annotations in the runtime
Our workhorse: tracing
Manual annotations
- are difficult to write
+ require less engineering investment
+ are very flexible
+ have very high performance potential
Tracing example
def foo(x):
    pass
foo(1)
1. Verify the function is the same
   a. Check if “foo” still refers to the same object (can skip hash table lookup)
   b. Check if foo() was mutated (rare, use invalidation)
2. Call it
   a. Arrange arguments for C-style function call (can skip *args allocation)
   b. Call the underlying function pointer
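The steps above can be modelled with a toy call site. This is a sketch under stated assumptions, not Pyston's implementation: `Cell`, `Func` and `CallSite` are hypothetical names, a real JIT emits machine code for the guard, and pure Python cannot show the argument-passing savings of step 2a.

```python
class Cell(object):
    """One global variable slot; a call site can hold on to it directly."""
    def __init__(self, value):
        self.value = value


class Func(object):
    """Toy function object whose body can be swapped out ("mutated")."""
    def __init__(self, body):
        self.body = body
        self.dependents = []            # call sites to invalidate on mutation

    def set_body(self, body):
        self.body = body
        for site in self.dependents:
            site.invalidate()
        self.dependents = []


class CallSite(object):
    def __init__(self, namespace, name):
        self.namespace = namespace      # dict: name -> Cell
        self.name = name
        self.cell = self.func = self.body = None
        self.fast_calls = self.slow_calls = 0

    def invalidate(self):
        self.func = None

    def call(self, *args):
        # Guard 1a: one identity check on the cached cell, no dict lookup.
        # 1b is not checked per call; mutation invalidates the site instead.
        if self.func is not None and self.cell.value is self.func:
            self.fast_calls += 1
            return self.body(*args)
        # Slow path: full lookup, then record what was seen.
        self.slow_calls += 1
        self.cell = self.namespace[self.name]
        self.func = self.cell.value
        self.body = self.func.body
        self.func.dependents.append(self)
        return self.body(*args)


namespace = {"foo": Cell(Func(lambda x: None))}
site = CallSite(namespace, "foo")
site.call(1)                                       # slow path: traces the call
site.call(1)                                       # fast path
namespace["foo"].value.set_body(lambda x: x + 1)   # mutation invalidates the site
after_mutation = site.call(1)                      # slow path again
namespace["foo"].value = Func(lambda x: x * 10)    # rebinding fails guard 1a
after_rebind = site.call(1)                        # slow path again
```

The common case pays for one pointer comparison, while the rare case (mutating a function) pays the cost of invalidating every site that depends on it.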
Tracing example #2
o = MyCoolObject()
len(o)
1. Verify the function is the same
   a. Check if “len” refers to the same object
2. Call it
   a. len() supports tracing. Decides to:
      i. Call arg.__len__()
         1. Verify the function is the same
            ...
         2. Call it
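A rough model of the nested decision, assuming a hypothetical `LenSite` class. It guards only on the argument's type; a real system would also have to notice a type's `__len__` being replaced (for example by invalidation), and the real len() additionally validates the result, both of which this sketch leaves out.

```python
class MyCoolObject(object):
    def __len__(self):
        return 3


class LenSite(object):
    """Toy traced call site for len(o), guarded on the argument's type."""

    def __init__(self):
        self.guard_type = None
        self.len_func = None
        self.fast_calls = 0

    def call(self, obj):
        tp = type(obj)
        if tp is self.guard_type:
            # Fast path: skip the generic len() machinery and call the
            # __len__ that was found while tracing.
            self.fast_calls += 1
            return self.len_func(obj)
        # Slow path: len() looks __len__ up on the type, not the instance.
        func = tp.__len__
        self.guard_type, self.len_func = tp, func
        return func(obj)


o = MyCoolObject()
o.__len__ = lambda: 99          # instance attribute; the builtin len() ignores it
builtin_result = len(o)

site = LenSite()
first = site.call(o)            # slow path, records MyCoolObject.__len__
second = site.call(o)           # fast path
other = site.call([1, 2])       # different type: guard fails, re-traced
```

The inner call to `__len__` is itself a call site, so the same verify-then-call pattern repeats one level down, which is what the nested steps on the slide show.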
Why use tracing
We started with a traditional method-at-a-time JIT, but quickly ran into issues, and our tracing system kept being the best way to solve them.
- We need a rich way of representing the expected path through the runtime
- We want to let C functions specify alternate versions of themselves that are either more specialized or more general
- We want to keep the tracing code close to the runtime code it needs to match
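One way to picture the second point, written in Python rather than C. All names here (`generic_add`, `int_add`, `choose_version`, `AddSite`) are hypothetical and the mechanism is an assumption for illustration: the runtime function sits next to a hook that, given the observed arguments, returns a guard plus the version to call while the guard holds.

```python
import operator


def generic_add(a, b):
    # Fully general path: defer to Python's binary-operator protocol.
    return operator.add(a, b)


def int_add(a, b):
    # More specialized version, valid only when both operands are exactly int.
    return int.__add__(a, b)


def choose_version(a, b):
    """Hypothetical hook: return (guard, function) for the observed arguments."""
    if type(a) is int and type(b) is int:
        return (lambda x, y: type(x) is int and type(y) is int), int_add
    return (lambda x, y: True), generic_add


class AddSite(object):
    def __init__(self):
        self.guard = None
        self.func = None

    def call(self, a, b):
        # Re-select a version whenever the recorded guard no longer holds.
        if self.guard is None or not self.guard(a, b):
            self.guard, self.func = choose_version(a, b)
        return self.func(a, b)


site = AddSite()
ints = site.call(1, 2)
picked_for_ints = site.func
strs = site.call("a", "b")
picked_for_strs = site.func
```

Keeping the selection hook beside the function it describes is the property the third bullet asks for: when the general version changes, the specialized ones are in the same place to update.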
PyPy comparison
PyPy
Missing:
- C extension support (80k LOC used at Dropbox)
- performance scalability and consistency
We’ve been measuring our catch-up in “years per month”
PyPy performance scalability
Their performance degrades quite a lot when run on large “real” (non-numeric) applications, and often ends up slower than CPython
- Initial testing of PyPy at Dropbox shows no clear improvement
One indicator: average benchmark size.
- PyPy: 36 lines
- Pyston: 671 lines
PyPy performance scalability
Simple attribute-lookup example:
[Code and chart not recoverable; slide callouts: “8x faster!” and “38x slower :(”]
Current roadmap
Current roadmap
Focusing on getting ready for Dropbox’s production use. Last “1%” features:
- Inspecting exited frames
- Signals support
- Refcounting?
Current roadmap
Continue performance work
- Integrate tracing and LLVM JITs
- Optimized bytecode interpreter
- Function inlining
How to get involved
Just pick something! We have a good list of starter projects
Or just hop on our gitter channel and say hi
Questions?
kmod@dropbox.com
marius@dropbox.com
https://github.com/dropbox/pyston
https://gitter.im/dropbox/pyston
We’re hiring!



Editor's Notes

  1. Other companies that use Python: YouTube, Pinterest, Reddit. (Yelp, Venmo, Digg)
  2. sys._getframe, skip type readying; exceptions, GC
  3. (negligible on web-app workloads) ceval.c is 5k LOC out of 40k in Python/. Cython is about the same speed as CPython when used without annotations.
  4. The 80 kLOC figure was from March ’15. What do I mean by consistency? PyPy tends to have worse performance on real, large programs.
  5. There are other dimensions as well. Another one I tried is the number of different control flow paths. 512 seems like a lot, but studies on open source projects found up to 300-some types, and the Dropbox codebase is much larger. (Think: ORM code which will process #types = #prod tables.)