The talk was given at OReilly Strata Data Conference September 2018 in NYC
All the conferences and thought leaders have been painting a vision of the businesses of the future being powered by data, but if we’re honest with ourselves, the vast majority of our massive data science investments are being deployed to PowerPoint or maybe a business dashboard. Productionizing your machine learning (ML) portfolio is the next big step on the path to ROI from AI.
You probably started out years ago on a “big data” initiative: You collected and cleaned your data and built data warehouses, and when those filled up you upgraded to data lakes. You hired data engineers and data scientists, and around the organization, everyone brushed up their SQL querying skills and got some licenses to Tableau and PowerBI.
Then you saw what Google, Uber, Facebook, and Amazon were doing with machine learning to automate business processes and customer interactions. To not get broadsided, you hired more data scientists and machine learning engineers. They were put on your teams and started using your big data investments to train models. But what you probably found is that your tech stack and DevOps processes don’t fit ML models. Unlike most of your systems, ML models require short spikes of massive compute; they are often written in different languages than your core code; they need different hardware to perform well; one model probably has applications across many teams; and the people making the models often don’t have the engineering experience to write production code but need to iterate faster than traditional engineers. Expecting your engineering and DevOps teams to deploy ML models well is like showing up to Seaworld with a giraffe since they are already handling large mammals.
There is a path forward. Almost five years ago Algorithmia launched a marketplace for models, functions, and algorithms. Today 65,000 developers are on the platform deploying 4,500 models—the result has been a layer of tools and best practices to make deploying ML models frictionless, scalable, and low maintenance. The company refers to it as the “AI layer.”
Drawing on this experience, Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers.
Best practices for your organization
Continuous model deployment
Varying languages (Your code base probably isn’t in Python or R, but your ML models probably are.)
Managing your portfolio of ML models
Enabling models across your organization
Analytics on how and where models are being used