SlideShare una empresa de Scribd logo
1 de 15
Managing Python at scale
without breaking the bank
Michael (Misha) Tselman
PyData NY 2017
Agenda
• J.P. Morgan and Athena
• Objectives
• Continuous delivery
• Under the hood
• Challenges
• Conclusions
• Q&A
J.P. Morgan
• One of the world’s biggest banks
• $2.5 trillion assets
• $95 billion revenue
• Processing $5 trillion payments every day
• 230,000+ employees globally
• One of the world’s biggest tech companies
• 44,000+ employees in Technology
• $9.5 billion annual investment in technology and innovation
Athena
• Python-based Pricing, Trading, Risk Management, and Analytics
platform with tools for Data Science and Machine Learning
• Thousands of users across multiple business lines
• 1500+ Python developers use and contribute to the platform
• 150,000 python modules, 35 million lines of python code
• 500+ Python packages from the Open Source.
• Rapid development and deployment model that puts developers and
quants at the heart of the business.
Athena
Foundation
• Hydra ( globally replicated object database )
• Reactive Athena ( C++/Python reactive dataflow framework )
• Pixie Graph ( directed acyclic dependency graph )
• Athena Application framework based on QT
• Athena Web ( tornado, html5, websockets, javascript, web assembly )
• Job scheduling ( ~270,000 jobs daily which kick-off ~1M processes )
• Integration with Compute Grid ( tens of thousands of cores + GPUs )
Objectives
• Keep end-users and clients happy 
• Ensure robustness and stability of our production systems
• Keep developers productive and efficient
• Provide quants and data scientists with the best research tools
• Encourage sharing and global consistency across business lines
Approach
• Conceptually:
• Continuous delivery:
• 10,000 – 15,000 production changes every week.
• Full visibility of the entire code base. Anyone can contribute.
• Instant global deployment
• Under the hood:
• Globally replicated object databases for code (and data)
• Monorepo – Monolithic code base
• Extensively automated testing
Continuous delivery
Write code & tests Test Commit Ask for a bless Push Run
PROS
• Time to market
• User satisfaction
• Developer productivity
CONS
• Fear of change / stability
• High reliance on automation
• Tricky in distributed systems
10,000 - 15,000 modules pushed to production every week
Layering of changes / Effective runtime
Developer’s
layer
B3 C2
Shared staging /
UAT
A2
Effective
Runtime
A2 B3 C2 D1
B2
Production A1 B1 C1 D1
E1
E1
Alternatives to filesystem based source
DB-LDN DB-NYC DB-TKO
“lib.foo”
“
def hello():
print ‘world’
“lib.bar” def hello():
print ‘pydata’
“lib.bar @ 2017-10-01
12:33”
“
def hello():
print ‘jpmorgan’
“lib.bar @ 2017-09-21
10:16”
“...”
• Use globally replicated database
• Customize the importer
• SourceMarkers - Take advantage of transactions & timestamps
Python and Binary Runtime
prod old prod prod new
Python Source
C++ & 3rd party
Some Challenges
• Open source package upgrades
• API changes
• Change of pickled/stored representation
• Numerical changes
• Runtime/binary dependencies
• Limited branching
• Streamlines production
• Does not fit some research/experimental workflows
• Full reproducibility requires “freezing” all code including the binary train
Conclusions
• Python’s flexibility makes things easier
• Good integration tests ensure compatibility and consistency
• Modules don’t have to be loaded from a filesystem
• Production stability does not imply slow delivery and deployment
• Open source does not imply free
• Shared platform does not imply shared knowledge
References
• J.P. Morgan
http://www.jpmorgan.com/techcareers
• The motivation for a monolithic codebase
http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-
billions-of-lines-of-code-in-a-single-repository/fulltext
Q&A

Más contenido relacionado

La actualidad más candente

Everything You Need To Know About Persistent Storage in Kubernetes
Everything You Need To Know About Persistent Storage in KubernetesEverything You Need To Know About Persistent Storage in Kubernetes
Everything You Need To Know About Persistent Storage in KubernetesThe {code} Team
 
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...confluent
 
Cryptocurrencies and the Banking Sector
Cryptocurrencies and the Banking SectorCryptocurrencies and the Banking Sector
Cryptocurrencies and the Banking SectorIlan Alon
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
Icsa2018 blockchain tutorial
Icsa2018 blockchain tutorialIcsa2018 blockchain tutorial
Icsa2018 blockchain tutorialLen Bass
 
The best way to run Elastic on Kubernetes
The best way to run Elastic on KubernetesThe best way to run Elastic on Kubernetes
The best way to run Elastic on KubernetesElasticsearch
 
What we've learned from running a PostgreSQL managed service on Kubernetes
What we've learned from running a PostgreSQL managed service on KubernetesWhat we've learned from running a PostgreSQL managed service on Kubernetes
What we've learned from running a PostgreSQL managed service on KubernetesDoKC
 
Devops - Microservice and Kubernetes
Devops - Microservice and KubernetesDevops - Microservice and Kubernetes
Devops - Microservice and KubernetesNodeXperts
 
ETH ASIC礦機以及簡易分辨方式.pptx
ETH ASIC礦機以及簡易分辨方式.pptxETH ASIC礦機以及簡易分辨方式.pptx
ETH ASIC礦機以及簡易分辨方式.pptxLouk Chi
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQLMike Crabb
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Keeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyKeeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyScyllaDB
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
ScyllaDB Cloud Goes Serverless
ScyllaDB Cloud Goes ServerlessScyllaDB Cloud Goes Serverless
ScyllaDB Cloud Goes ServerlessScyllaDB
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 

La actualidad más candente (20)

Everything You Need To Know About Persistent Storage in Kubernetes
Everything You Need To Know About Persistent Storage in KubernetesEverything You Need To Know About Persistent Storage in Kubernetes
Everything You Need To Know About Persistent Storage in Kubernetes
 
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
 
Cryptocurrencies and the Banking Sector
Cryptocurrencies and the Banking SectorCryptocurrencies and the Banking Sector
Cryptocurrencies and the Banking Sector
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Icsa2018 blockchain tutorial
Icsa2018 blockchain tutorialIcsa2018 blockchain tutorial
Icsa2018 blockchain tutorial
 
The best way to run Elastic on Kubernetes
The best way to run Elastic on KubernetesThe best way to run Elastic on Kubernetes
The best way to run Elastic on Kubernetes
 
RabbitMQ & Kafka
RabbitMQ & KafkaRabbitMQ & Kafka
RabbitMQ & Kafka
 
What we've learned from running a PostgreSQL managed service on Kubernetes
What we've learned from running a PostgreSQL managed service on KubernetesWhat we've learned from running a PostgreSQL managed service on Kubernetes
What we've learned from running a PostgreSQL managed service on Kubernetes
 
Devops - Microservice and Kubernetes
Devops - Microservice and KubernetesDevops - Microservice and Kubernetes
Devops - Microservice and Kubernetes
 
ETH ASIC礦機以及簡易分辨方式.pptx
ETH ASIC礦機以及簡易分辨方式.pptxETH ASIC礦機以及簡易分辨方式.pptx
ETH ASIC礦機以及簡易分辨方式.pptx
 
A Beginners Guide to noSQL
A Beginners Guide to noSQLA Beginners Guide to noSQL
A Beginners Guide to noSQL
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
MicroServices on Azure
MicroServices on AzureMicroServices on Azure
MicroServices on Azure
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Keeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssemblyKeeping Latency Low for User-Defined Functions with WebAssembly
Keeping Latency Low for User-Defined Functions with WebAssembly
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
ScyllaDB Cloud Goes Serverless
ScyllaDB Cloud Goes ServerlessScyllaDB Cloud Goes Serverless
ScyllaDB Cloud Goes Serverless
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 

Similar a Managing python at scale without breaking the bank

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBInfluxData
 
TiConf Australia 2013
TiConf Australia 2013TiConf Australia 2013
TiConf Australia 2013Jeff Haynie
 
GraphTour - Neo4j Database Overview
GraphTour - Neo4j Database OverviewGraphTour - Neo4j Database Overview
GraphTour - Neo4j Database OverviewNeo4j
 
New Technology for Modern Development Challenges
New Technology for Modern Development ChallengesNew Technology for Modern Development Challenges
New Technology for Modern Development ChallengesPerforce
 
Enabling application portability with the greatest of ease!
Enabling application portability with the greatest of ease!Enabling application portability with the greatest of ease!
Enabling application portability with the greatest of ease!Ken Owens
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayDataStax
 
Immutable Service Delivery Shenzhen 2016
Immutable Service Delivery   Shenzhen 2016Immutable Service Delivery   Shenzhen 2016
Immutable Service Delivery Shenzhen 2016John Willis
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Dataconomy Media
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Maya Lumbroso
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Why cloud native matters
Why cloud native mattersWhy cloud native matters
Why cloud native mattersCheryl Hung
 
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name DenaliITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name DenaliITCamp
 
Devops a la sauce Microsoft
Devops a la sauce MicrosoftDevops a la sauce Microsoft
Devops a la sauce MicrosoftMicrosoft
 
Log Monitoring and Anomaly Detection at Scale at ORNL
Log Monitoring and Anomaly Detection at Scale at ORNLLog Monitoring and Anomaly Detection at Scale at ORNL
Log Monitoring and Anomaly Detection at Scale at ORNLElasticsearch
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldRandy Shoup
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Patrick McDonnell
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Michael Rembetsy
 
16370 cics project opening and project update f
16370  cics project opening and project update f16370  cics project opening and project update f
16370 cics project opening and project update fnick_garrod
 

Similar a Managing python at scale without breaking the bank (20)

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDBHow to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
How to Manage Your Time Series Data Pipeline at the Edge with InfluxDB
 
TiConf Australia 2013
TiConf Australia 2013TiConf Australia 2013
TiConf Australia 2013
 
GraphTour - Neo4j Database Overview
GraphTour - Neo4j Database OverviewGraphTour - Neo4j Database Overview
GraphTour - Neo4j Database Overview
 
New Technology for Modern Development Challenges
New Technology for Modern Development ChallengesNew Technology for Modern Development Challenges
New Technology for Modern Development Challenges
 
Enabling application portability with the greatest of ease!
Enabling application portability with the greatest of ease!Enabling application portability with the greatest of ease!
Enabling application portability with the greatest of ease!
 
Digital transformation and AI @Edge
Digital transformation and AI @EdgeDigital transformation and AI @Edge
Digital transformation and AI @Edge
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each Day
 
Immutable Service Delivery Shenzhen 2016
Immutable Service Delivery   Shenzhen 2016Immutable Service Delivery   Shenzhen 2016
Immutable Service Delivery Shenzhen 2016
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Why cloud native matters
Why cloud native mattersWhy cloud native matters
Why cloud native matters
 
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name DenaliITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
ITCamp 2011 - Cristian Lefter - SQL Server code-name Denali
 
Devops a la sauce Microsoft
Devops a la sauce MicrosoftDevops a la sauce Microsoft
Devops a la sauce Microsoft
 
Log Monitoring and Anomaly Detection at Scale at ORNL
Log Monitoring and Anomaly Detection at Scale at ORNLLog Monitoring and Anomaly Detection at Scale at ORNL
Log Monitoring and Anomaly Detection at Scale at ORNL
 
Effective Microservices In a Data-centric World
Effective Microservices In a Data-centric WorldEffective Microservices In a Data-centric World
Effective Microservices In a Data-centric World
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
 
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
Continuously Deploying Culture: Scaling Culture at Etsy - Velocity Europe 2012
 
16370 cics project opening and project update f
16370  cics project opening and project update f16370  cics project opening and project update f
16370 cics project opening and project update f
 

Más de PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

Más de PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Último

What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Managing python at scale without breaking the bank

  • 1. Managing Python at scale without breaking the bank Michael (Misha) Tselman PyData NY 2017
  • 2. Agenda • J.P. Morgan and Athena • Objectives • Continuous delivery • Under the hood • Challenges • Conclusions • Q&A
  • 3. J.P. Morgan • One of the world’s biggest banks • $2.5 trillion assets • $95 billion revenue • Processing $5 trillion payments every day • 230,000+ employees globally • One of the world’s biggest tech companies • 44,000+ employees in Technology • $9.5 billion annual investment in technology and innovation
  • 4. Athena • Python-based Pricing, Trading, Risk Management, and Analytics platform with tools for Data Science and Machine Learning • Thousands of users across multiple business lines • 1500+ Python developers use and contribute to the platform • 150,000 python modules, 35 million lines of python code • 500+ Python packages from the Open Source. • Rapid development and deployment model that puts developers and quants at the heart of the business.
  • 5. Athena Foundation • Hydra ( globally replicated object database ) • Reactive Athena ( C++/Python reactive dataflow framework ) • Pixie Graph ( directed acyclic dependency graph ) • Athena Application framework based on QT • Athena Web ( tornado, html5, websockets, javascript, web assembly ) • Job scheduling ( ~270,000 jobs daily which kick-off ~1M processes ) • Integration with Compute Grid ( tens of thousands of cores + GPUs )
  • 6. Objectives • Keep end-users and clients happy  • Ensure robustness and stability of our production systems • Keep developers productive and efficient • Provide quants and data scientists with the best research tools • Encourage sharing and global consistency across business lines
  • 7. Approach • Conceptually: • Continuous delivery: • 10,000 – 15,000 production changes every week. • Full visibility of the entire code base. Anyone can contribute. • Instant global deployment • Under the hood: • Globally replicated object databases for code (and data) • Monorepo – Monolithic code base • Extensively automated testing
  • 8. Continuous delivery Write code & tests Test Commit Ask for a bless Push Run PROS • Time to market • User satisfaction • Developer productivity CONS • Fear of change / stability • High reliance on automation • Tricky in distributed systems 10,000 - 15,000 modules pushed to production every week
  • 9. Layering of changes / Effective runtime Developer’s layer B3 C2 Shared staging / UAT A2 Effective Runtime A2 B3 C2 D1 B2 Production A1 B1 C1 D1 E1 E1
  • 10. Alternatives to filesystem based source DB-LDN DB-NYC DB-TKO “lib.foo” “ def hello(): print ‘world’ “lib.bar” def hello(): print ‘pydata’ “lib.bar @ 2017-10-01 12:33” “ def hello(): print ‘jpmorgan’ “lib.bar @ 2017-09-21 10:16” “...” • Use globally replicated database • Customize the importer • SourceMarkers - Take advantage of transactions & timestamps
  • 11. Python and Binary Runtime prod old prod prod new Python Source C++ & 3rd party
  • 12. Some Challenges • Open source package upgrades • API changes • Change of pickled/stored representation • Numerical changes • Runtime/binary dependencies • Limited branching • Streamlines production • Does not fit some research/experimental workflows • Full reproducibility requires “freezing” all code including the binary train
  • 13. Conclusions • Python’s flexibility makes things easier • Good integration tests ensure compatibility and consistency • Modules don’t have to be loaded from a filesystem • Production stability does not imply slow delivery and deployment • Open source does not imply free • Shared platform does not imply shared knowledge
  • 14. References • J.P. Morgan http://www.jpmorgan.com/techcareers • The motivation for a monolithic codebase http://cacm.acm.org/magazines/2016/7/204032-why-google-stores- billions-of-lines-of-code-in-a-single-repository/fulltext
  • 15. Q&A

Notas del editor

  1. Ghost tests.
  2. At the bottom, a more traditional release train of binaries and 3rd party packages. prod.new continuously changing until lockdown. Allows for seamless testing of new features against the python baseline. Globally distributed and instantly available for import in any region for any user. Not just a repo, but a fully deployed codebase at the same time.