
Deploying ML models in the enterprise

The talk was given at the O'Reilly Strata Data Conference, September 2018, in New York City.

All the conferences and thought leaders have been painting a vision of the businesses of the future being powered by data, but if we’re honest with ourselves, the vast majority of our massive data science investments are being deployed to PowerPoint or maybe a business dashboard. Productionizing your machine learning (ML) portfolio is the next big step on the path to ROI from AI.

You probably started out years ago on a “big data” initiative: You collected and cleaned your data and built data warehouses, and when those filled up you upgraded to data lakes. You hired data engineers and data scientists, and around the organization, everyone brushed up their SQL querying skills and got some licenses to Tableau and PowerBI.

Then you saw what Google, Uber, Facebook, and Amazon were doing with machine learning to automate business processes and customer interactions. To avoid being broadsided, you hired more data scientists and machine learning engineers. They were put on your teams and started using your big data investments to train models. But what you probably found is that your tech stack and DevOps processes don't fit ML models. Unlike most of your systems, ML models require short spikes of massive compute; they are often written in different languages than your core code; they need different hardware to perform well; one model probably has applications across many teams; and the people making the models often don't have the engineering experience to write production code but need to iterate faster than traditional engineers. Expecting your engineering and DevOps teams to deploy ML models well is like showing up to SeaWorld with a giraffe because they already handle large mammals.

There is a path forward. Almost five years ago Algorithmia launched a marketplace for models, functions, and algorithms. Today 65,000 developers are on the platform deploying 4,500 models—the result has been a layer of tools and best practices to make deploying ML models frictionless, scalable, and low maintenance. The company refers to it as the “AI layer.”

Drawing on this experience, Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers.

Topics include:

Best practices for your organization
Continuous model deployment
Varying languages (Your code base probably isn’t in Python or R, but your ML models probably are.)
Managing your portfolio of ML models
Standardize versioning
Enabling models across your organization
Analytics on how and where models are being used
Maintaining auditability


Deploying ML models in the enterprise

  1. Deploying machine learning models in the enterprise. Strata Data Conference NYC. Diego Oppenheimer, CEO
  2. About me: Diego Oppenheimer, founder and CEO, Algorithmia ● Product developer, entrepreneur, extensive background in all things data ● Microsoft: PowerPivot, PowerBI, Excel, and SQL Server ● Founder of an algorithmic trading startup ● BS/MS, Carnegie Mellon University
  3. Make state-of-the-art algorithms discoverable and accessible to everyone.
  4. AI/ML scalable infrastructure on demand + marketplace ● Function-as-a-service for machine and deep learning ● Discoverable, live inventory of AI ● Monetizable ● Composable ● Every developer on earth can make their app intelligent
  5. “There’s an algorithm for that!” 77K developers; 6.4K algorithms, models, and functions
  6. What does production mean for us? ● ~6,400 algorithms, models, and functions (50K counting different versions) ● Each model: 1 to 1,000 calls a second, fluctuating, with no DevOps ● ~15 ms overhead latency ● Accessible in any of 14 languages (through SDKs) ● Any runtime, any architecture
  7. Algorithmia Enterprise: an organization’s internal inventory of intelligence and an algorithm-as-a-service platform. Deploy: write your function or model in any programming language, framework, or infrastructure. Scale: expose your model as a highly reliable, versioned REST API that automatically scales from one to hundreds of requests per second. Discover: name and describe your model, making it available in a central catalog where your peers can easily discover and reuse it. Monitor: house thousands of models under one roof with a uniform REST interface and a single cluster-monitoring dashboard.
  9. What we will cover ● Challenges of deploying models in the enterprise ● Characteristics of AI and technologies ● Varying languages ● Standardize versioning ● Continuous model deployment ● Managing your portfolio of ML models ● Analytics on how and where models are being used ● Maintaining auditability ● Best practices for your organization
  10. Challenges of deploying models in the enterprise ● Machine learning ○ CPU/GPU/specialized hardware ○ Multiple frameworks, languages, dependencies ○ Called from different devices/architectures ● “Snowflake” environments ○ Unique cloud hardware and services ● Security and audit ○ Stringent security and access controls ○ “Who called what, when” for audit and compliance ● Uncharted territory ○ Not a lot of literature ○ Deployment is a new problem for data science teams ○ Many teams have not bought software or dealt with their own infrastructure teams ○ Chargebacks and billing
  11. Characteristics of AI/ML • Two distinct phases: training and inference • Lots of processing power • Heterogeneous hardware (CPUs, GPUs, TPUs, etc.) • Limited by compute rather than bandwidth • “TensorFlow is open source; scaling it is not.” - Kenny Daniel
  12. Technologies. Stack: bare metal or VMs → containers → Kubernetes. Training: long compute cycles, fixed (inelastic) load, stateful, single user; owner: data scientists. Inference: short compute bursts, elastic, stateless, multiple users; owner: DevOps.
  13. Two system design paradigms that work well. Microservices: the design of a system as independently deployable, loosely coupled services. Advantages: maintainability, scalability, rolling deployments, elasticity, software/hardware agnosticism. Serverless: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time compute model. Advantages: cost/efficiency, built-in concurrency, speed of development, improved latency.
  14. Think of your technology stack as an OS for running AI in your enterprise. Kernel: runtime abstraction (support any programming language or framework, including interoperability between mixed stacks), elastic scale (prioritize and automatically optimize execution of concurrent short-lived jobs), cloud abstraction (provide portability to algorithms, across public or private clouds). Shell & services: discoverability, authentication, instrumentation, etc.
  15. Multiple frameworks, languages, dependencies ● Models are rarely developed in the language they will be consumed in. ● In large enterprises there is rarely a standard language for software development. ● The goal should be to make models consumable by any part of the organization on any platform: ○ to ensure the maximum value is extracted from the model ○ to ensure model re-use ○ to ensure the fastest time from lab to production ● APIs and well-developed SDKs are your best friend: ○ APIs allow for easy testing of whether a model is adequate. ○ APIs allow for easy consumption. ○ Well-developed APIs have built-in versioning, which takes us to the next step.
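The API point above can be sketched in Python. This is a minimal illustration, not Algorithmia's actual interface: the stand-in `predict` model, the `features` field, and the version string are all hypothetical. The idea is that a JSON-in/JSON-out contract lets a codebase in any language consume a model written in Python.

```python
import json

MODEL_VERSION = "1.2.0"  # hypothetical semantic version for this model

def predict(features):
    # Stand-in for a real model; here, a trivial averaging scorer.
    return sum(features) / max(len(features), 1)

def handle_request(body: str) -> str:
    """Language-agnostic entry point: JSON string in, JSON string out.

    Any caller (Java, C#, JavaScript, ...) can consume the model
    without knowing it is implemented in Python.
    """
    try:
        payload = json.loads(body)
        features = payload["features"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": 'expected {"features": [...]}'})
    score = predict(features)
    # The version travels with every response, which feeds directly
    # into the versioning practices on the next slides.
    return json.dumps({"score": score, "model_version": MODEL_VERSION})
```

In practice this entry point would sit behind a REST endpoint or generated SDK; the JSON contract is what makes the model reusable across teams and stacks.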
  16. Standardize versioning ● Versioning is an extremely important part of deploying models in the enterprise. ● Start thinking about models as any other piece of modular software: ○ They must be versioned. ○ Versions must be tracked. ○ Older versions should be accessible (for rollbacks, acceptance testing, etc.). For the data scientist: ● The ability to compare two different versions of a model is key, not only at training and verification time but also to understand performance and SLA changes. ● Model drift. For the application developers: ● The ability to match acceptance testing to predetermined development cycles. ● The ability to stay behind a version. ● Avoiding performance issues even if the newer model's accuracy is better.
  17. Standardize versioning ● Even better, make your system auto-version. ● Borrow the best practices from the software development world. ● Rolling, non-interruptive deployments
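The pin-or-follow behavior described on these two slides can be sketched with a toy in-process registry. `ModelRegistry` is invented for illustration, not a real library: application developers pin an exact version to stay behind during acceptance testing, while other callers follow "latest" and pick up rolling deployments.

```python
class ModelRegistry:
    """Minimal sketch of a versioned model registry (illustrative only).

    Older versions stay accessible, so rollbacks and acceptance
    testing against a known version are always possible.
    """

    def __init__(self):
        self._versions = {}   # version string -> model callable
        self._order = []      # publication order, newest last

    def publish(self, version, model_fn):
        self._versions[version] = model_fn
        self._order.append(version)

    def get(self, version="latest"):
        if version == "latest":
            version = self._order[-1]
        return self._versions[version]  # KeyError means unknown version

registry = ModelRegistry()
registry.publish("1.0.0", lambda x: x * 2)   # original model
registry.publish("1.1.0", lambda x: x * 3)   # retrained model

latest_score = registry.get()(10)            # follows the rolling deployment
pinned_score = registry.get("1.0.0")(10)     # stays behind a version
```

A production system would auto-generate the version on each deploy and track it alongside training metadata, rather than rely on callers to publish by hand.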
  18. Standardize documentation ● Just like any API, models are only as good as their documentation. ● Make the documentation travel and be updated with the model to ensure it's always up to date: ○ directly in Markdown inside the git repository that contains the model, or ○ as an artifact that travels with the model.
  19. Continuous deployment ● Just like with standardized versioning, continuous integration and continuous deployment for AI/ML should borrow from the best practices of software development. ● The fastest path is usually the best (git push -> deploy). ○ Git + Docker + API generation makes this really easy. ● Don’t forget dependency management! ● Some interesting use cases: ○ Continuous training and deployment ○ Human-in-the-loop training and deployment ○ Bespoke training and deployment to a central platform
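One way a git-push-to-deploy pipeline can honor the "don't forget dependency management" warning is to derive the deployable artifact's tag from both the commit SHA and a hash of the pinned dependencies. The function and example inputs below are invented for illustration; the point is that two builds of the same code with different dependency pins get visibly different tags instead of silently diverging.

```python
import hashlib

def deployment_tag(model_name: str, git_sha: str, requirements_txt: str) -> str:
    """Build an immutable artifact tag for a git-push-to-deploy flow (sketch).

    The tag combines the short commit SHA with a digest of the
    dependency lockfile, so dependency drift shows up in the tag.
    """
    dep_hash = hashlib.sha256(requirements_txt.encode()).hexdigest()[:8]
    return f"{model_name}:{git_sha[:7]}-{dep_hash}"

# Hypothetical commit SHA and pinned dependency list.
tag = deployment_tag(
    "churn-model",
    "9fceb02d0ae598e95dc970b74767f19372d61af8",
    "scikit-learn==1.4.2\nnumpy==1.26.4\n",
)
```

A CI hook would compute this tag on every push, build the Docker image under it, and register it with the model registry, making every deployment reproducible from git alone.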
  20. Continuous deployment: human in the loop
  21. Continuous deployment: single training platform to deployment
  22. Continuous deployment: multiple training frameworks to deployment
  23. Managing a model portfolio. “We have 14 versions of ImageMagick running as services for image resizing before feeding into a number of different models.” - Dev manager, analytics platform, F500 media company ● Much like with an API strategy, as new models become available you need to start caring about how to find them, who can use them, and how to bill for them. ● Borrowing from the concepts of an API gateway and an API registry, the same paradigms work for model management and distribution. ● A common, centralized registry offers the ability to find what has already been created and potentially re-use it.
  24. Managing a model portfolio ● A centralized repository/registry of models that can be accessed across the organization. ● Encourage re-use (many pre-processing and post-processing functions should only be built once). ● Finding existing models is key for experimenting with different pipelines.
  25. Managing a model portfolio ● Centralization of models allows for understanding business impact and usage across the organization. ● C-level understanding of whether AI/ML investments are working or not. ● Finding existing models is key for experimenting with different pipelines and for rapid application development by disparate teams or external developers. ● Security and access controls so that only the right people in the organization can access the models. ○ Who can view how the model works vs. who can call it.
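The discovery and access-control ideas on these portfolio slides could look roughly like the sketch below. `ModelCatalog`, the team names, and the tags are all hypothetical; the one detail taken from the slide is the split between who can view how a model works and who can call it.

```python
class ModelCatalog:
    """Sketch of a central catalog: discovery by tag plus simple ACLs."""

    def __init__(self):
        self._entries = {}

    def register(self, name, tags, viewers, callers):
        self._entries[name] = {
            "tags": set(tags),
            "viewers": set(viewers),   # teams allowed to inspect the model
            "callers": set(callers),   # teams allowed to invoke it
        }

    def search(self, tag):
        # Find existing models before building a 15th image resizer.
        return sorted(n for n, e in self._entries.items() if tag in e["tags"])

    def can_call(self, name, team):
        return team in self._entries[name]["callers"]

catalog = ModelCatalog()
catalog.register("resize-image", tags={"preprocessing", "vision"},
                 viewers={"ml-platform"}, callers={"ml-platform", "marketing"})
catalog.register("churn-model", tags={"scoring"},
                 viewers={"ml-platform"}, callers={"ml-platform"})

vision_models = catalog.search("vision")
marketing_ok = catalog.can_call("resize-image", "marketing")
```

A real catalog would back this with a database and an identity provider, but the shape is the same: one registry answers both "does this already exist?" and "am I allowed to use it?".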
  26. Model analytics. During the training phase, analytics such as accuracy, drift, and error rates are very important. When deploying models inside an enterprise, a different slice of analytics is required. What is important during deployment and production: ● Latency ● Resources used (CPU/GPU, I/O) ● System capacity ● Scale-up and scale-down ● Authentication ● API timing metrics and calls ● Error rates. But also: ● which teams are using the models ● which applications are using them
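The deployment-side metrics listed here (call counts, error rates, latency) can be captured with a thin wrapper around each served model. The decorator below is an illustrative sketch, not a real monitoring product; in production the counters would be exported to a metrics store rather than kept in a dictionary.

```python
import time
from collections import defaultdict

# Per-model counters; a real system would export these to a metrics backend.
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "total_secs": 0.0})

def instrumented(model_name):
    """Record call counts, error counts, and latency per model (sketch)."""
    def wrap(fn):
        def inner(*args, **kwargs):
            m = metrics[model_name]
            m["calls"] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                m["errors"] += 1
                raise
            finally:
                m["total_secs"] += time.perf_counter() - start
        return inner
    return wrap

@instrumented("churn-model")
def score(x):
    # Stand-in for a real model.
    return x * 0.5

score(10)
score(4)
```

Tagging each call with the calling team or application (for example, via an API key) extends the same wrapper to answer "which teams and applications are using the models".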
  27. Model auditability and compliance. Your enterprise deployment system should be able to answer: “Who called what model, when, and with what data?” Why this is important: ● Compliance ○ Regulated industries need to provide this information to government regulators: ■ financial services ■ life sciences ■ federal government ● C-level understanding of whether AI/ML investments are working or not ● Debugging production systems ● Billing ○ A complete understanding of who is using which models allows for chargebacks
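A toy version of the "who called what model, when, and with what data" requirement: every invocation goes through a wrapper that records caller, model, timestamp, and input before running the model. All names here are invented; in a regulated setting the log would go to durable, append-only storage rather than an in-memory list.

```python
import datetime

audit_log = []  # stand-in for durable, append-only audit storage

def audited_call(model_name, caller, model_fn, payload):
    """Run a model while recording who called it, when, and with what data."""
    entry = {
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "who": caller,
        "model": model_name,
        "input": payload,
    }
    audit_log.append(entry)  # log before running, so failed calls are recorded too
    return model_fn(payload)

# Hypothetical model and caller, for illustration only.
result = audited_call(
    "credit-score",
    "loans-team",
    lambda payload: len(payload),   # stand-in for the real model
    {"applicant_id": "ab12"},
)
```

The same log that satisfies regulators doubles as the data source for chargebacks and for debugging production incidents, which is why centralizing it pays off.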
  28. Best practices and conclusion. “Expecting your engineering and DevOps teams to deploy ML models well is like showing up to SeaWorld with a giraffe because they already handle large mammals.” ● Technology: ○ Borrow from the best practices of software development and of deploying applications and code at scale: CI/CD, versioning, API design, etc. ○ The most advanced AI/ML companies in the world are centralizing their deployment and serving platforms under one roof, because the influence of data science and ML teams across your organization will only grow. ○ Understand that training and serving have very different profiles, and different technology choices will need to be made. ○ Seriously consider using microservices and serverless as design patterns for re-use, scale, and modularity.
  29. Best practices and conclusion ● Organization: ○ Production and serving will usually be owned by DevOps or enterprise architecture. Success in enterprise deployment will be dictated by understanding the roles and responsibilities of these teams. ○ Data science teams tend to be new, with limited experience in: ■ purchasing enterprise software ■ requirements and considerations for production environments ■ IT requirements around information security, compliance, and support. It’s crucial for success to educate and guide these teams through the enterprise requirements.
  30. Best practices and conclusion ● Future-proofing: ○ Think re-use: many models will be interesting and usable to multiple parts of the enterprise, so discoverability and accessibility become key. ○ It is safe to assume that your number of models will only grow over time; how you will manage them in the future requires a conversation early in the process. ○ Your AI/ML model portfolio is an extremely important asset; understanding the value you are getting from it is important, and its usage and influence should be measured and tracked. ○ When deciding whether to build or buy, think of your team's capacity to adapt and move at the pace the AI/ML industry is moving over time.
  32. Diego Oppenheimer, CEO. Thank you! @doppenhe