As more and more open-source technologies penetrate enterprises, data scientists have a plethora of choices for building, testing and scaling models. In addition, data scientists have been able to leverage the growing support for cloud-based infrastructure and open data sets to develop machine learning applications. Even though there are multiple solutions and platforms available for building machine learning solutions, challenges remain in adopting machine learning in the enterprise. Many of these challenges are associated with how the machine learning process can be formalized. As the field matures, a formal mechanism for a replicable, interpretable, auditable process covering the complete machine learning pipeline, from data ingestion to deployment, is warranted. Projects like Docker, BinderHub and MLflow are efforts in this quest, and research and industry work on replicable machine learning processes is gaining steam. Heavily regulated industries, such as financial services and healthcare, are looking for best practices that enable their research teams to reproduce research and adopt sound model governance. In this talk, we will discuss the challenges and best practices of governing AI and ML models in the enterprise.
Model governance in the age of data science & AI
1. Model Governance
in the age of Data Science and AI
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.quantuniversity.com
10/23/2018
PRMIA Seminar
Suffolk University
Boston
2. 2
About us:
• Data Science, Quant Finance and
Model Governance Advisory
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
3. 3
• Your challenge is to design an artificial intelligence and machine
learning (AI/ML) framework capable of flying a drone through
several professional drone racing courses without human
intervention or navigational pre-programming.
AlphaPilot Drone AI Challenge
13. 13
Model Verification is defined as:
“The process of determining that a model or simulation implementation and its
associated data accurately represent the developer’s conceptual description and
specifications”.
Model Validation is defined as:
“The process of determining the degree to which a model or simulation and its
associated data are an accurate representation of the real world from the
perspective of the intended uses of the model”.
Ref: DoD Modeling and Simulation (M&S) Verification, Validation, and
Accreditation (VV&A), DoD Instruction 5000.61, December 9, 2009.
Model Verification vs Validation
17. 17
The Rise of Big Data and Data Science
Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
18. 18
Smarter Algorithms
Parallel and Distributed Computing Frameworks
Deep Learning Frameworks
1. Our labeled datasets were thousands of times too
small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
“Capital One was able to determine fraudulent credit
card applications in 100 milliseconds”*
* http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
20. 20
The Machine Learning Process
Data cleansing → Feature engineering → Training and testing → Model building → Model selection → Model deployment
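The stages above can be sketched end-to-end in Python. The following is a minimal illustration using scikit-learn with synthetic data; the dataset, model and hyperparameter grid are all illustrative stand-ins, not the tooling used in the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; real pipelines start with data ingestion and cleansing.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),     # data cleansing / feature engineering
    ("clf", LogisticRegression()),   # model building
])

# Model selection: pick the regularization strength by cross-validation.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)         # training

test_accuracy = search.score(X_test, y_test)  # held-out testing
print(test_accuracy)                 # deployment of search.best_estimator_ follows
```

Each governance question in this talk (replicability, auditability, validation) attaches to one of these stages, which is why formalizing the pipeline matters.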
21. 21
NLP pipeline
Stage 1: Data ingestion from EDGAR
Stage 2: Pre-processing
Stage 3: Invoking APIs to label data
Stage 4: Compare APIs
Stage 5: Build a new model for sentiment analysis
External REST APIs:
• Amazon Comprehend API
• Google API
• Watson API
• Azure API
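A sketch of the "Compare APIs" stage: reconcile sentiment labels returned by several providers with a majority vote. The labels below are illustrative; in practice they would come from the Comprehend, Google, Watson and Azure clients listed above.

```python
from collections import Counter

def majority_label(labels):
    """Resolve disagreement between API labels by majority vote; ties -> NEUTRAL."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "NEUTRAL"
    return counts[0][0]

# Hypothetical responses from four providers for the same filing excerpt.
api_labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]
print(majority_label(api_labels))  # POSITIVE
```

Recording both the per-API labels and the reconciled label makes the labeling step auditable, which matters once the labels feed a new sentiment model in Stage 5.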
22. 22
The Machine Learning Process
Data cleansing → Feature engineering → Training and testing → Model building → Model selection → Model deployment
25. 25
Data Engineering vs Data Science
Engineering/IT
• Scaling
• Structuring
• Design of Experiments
• Data Parallel/Task Parallel
Quants/Data Scientists
• New Algorithms
• Try new methods
• Effect of Parameters and
Hyperparameters
27. 27
Claim:
• Machine learning is good for fraud
detection, looking for arbitrage
opportunities and trade execution
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still
not be good enough
1. Machine learning is not a generic solution to all problems
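To see why 99% accuracy can be misleading on an imbalanced class problem, consider a dataset where only 1% of transactions are fraudulent (the numbers below are illustrative):

```python
# Majority-class baseline: a "model" that always predicts "not fraud".
n_total = 10_000
n_fraud = 100  # 1% positive class

true_positives = 0                   # it never flags fraud...
true_negatives = n_total - n_fraud   # ...and is right on every legitimate case

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / n_fraud    # fraction of fraud actually caught

print(f"accuracy = {accuracy:.2%}")  # 99.00%
print(f"recall   = {recall:.2%}")    # 0.00% -- the model catches nothing
```

The baseline hits 99% accuracy while detecting zero fraud, which is why metrics like recall or precision on the minority class matter more here than raw accuracy.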
28. 28
Claim:
• Our models work on all the
datasets we have tested on
Caution:
• Do we have enough data?
• How do we handle bias in
datasets?
• Beware of overfitting
• Historical Analysis is not
Prediction
2. A prototype model is not your production model
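Overfitting is easy to demonstrate on synthetic data: a model flexible enough to fit every training point can still fail on fresh data from the same process. This sketch (illustrative, using NumPy) fits a degree-9 polynomial to 10 noisy points:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = x + rng.normal(0, 0.1, size=10)       # noisy linear relationship

# A degree-9 polynomial interpolates all 10 training points...
coeffs = np.polyfit(x, y, deg=9)
train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)

# ...but a fresh sample from the same process exposes the overfit.
x_new = np.linspace(0.05, 0.95, 10)
y_new = x_new + rng.normal(0, 0.1, size=10)
test_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)

print(f"train MSE = {train_err:.2e}")
print(f"test  MSE = {test_err:.2e}")      # far larger than the train MSE
```

The near-zero training error says nothing about production behavior, which is the gap between a prototype and a production model.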
29. 29
AI and Machine Learning in Production
https://www.itnews.com.au/news/hsbc-societe-generale-run-
into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to
get going using sample data, [but] then you hit
the real problems,” Roth said.
“I think our early track record on PoCs or pilots
hides a little bit the underlying issues.”
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA
recently and I have to say we’ve been a bit
disillusioned with that experience,”
“the PoC is the easy bit: it’s how you get that
into production and shift the balance”
30. 30
Claim:
• It works. We don’t know how!
Caution:
• Lots of heuristics; still not a proven
science
• Interpretability and auditability of
models are important
• Beware of black boxes; transparency in
the codebase is paramount with the
proliferation of open-source tools
• Skilled data scientists who are
knowledgeable about algorithms and
their appropriate usage are key to
successful adoption
3. We are just getting started!
31. 31
Claim:
• Machine Learning models are
more accurate than
traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the
model? RMSE or R²?
• How does the model behave
in different regimes?
4. Choose the right metrics for evaluation
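The two regression metrics named above answer different questions: RMSE is scale-dependent ("how far off, in the units of y?"), while R² is relative ("how much of the variance is explained?"). A minimal sketch with illustrative numbers:

```python
import math

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]   # off by 0.5 everywhere

# RMSE: typical prediction error, in the same units as y.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)

# R²: 1 minus residual variance over total variance around the mean.
mean = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(rmse)  # 0.5
print(r2)    # 0.95
```

An RMSE of 0.5 is acceptable or disastrous depending on the scale of y, while R² is comparable across targets; the governance question is which one the model's intended use actually requires.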
32. 32
Claim:
• Machine Learning and AI will replace
humans in most applications
Caution:
• Beware of the hype!
• Just because it worked sometimes
doesn’t mean that the organization can
run on autopilot
• Will we have true AI or Augmented
Intelligence?
• Robust model risk management is
paramount to the success of the
organization.
• We are just getting started!
5. Are we there yet?
https://www.bloomberg.com/news/articles/2017-10-20/automation-
starts-to-sweep-wall-street-with-tons-of-glitches
33. 33
1. A clearly defined Model Verification and Validation framework,
applicable to your organization, is required.
2. Define replicability, interpretability and auditability requirements
upfront.
3. Distinguish process automation, machine learning and
autonomous decision-making using AI.
4. Machine learning is not magic; hire the right talent before
deploying models into production.
5. Model lifecycle management shouldn’t be an afterthought.
6. Define and address risks arising from the adoption of new
processes.
Summary
34. 34
QuantUniversity’s Model Risk related whitepapers published in the Wilmott Magazine
Email me at sri@quantuniversity.com for a copy
37. Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
38. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and
Endeca, and work with 25+ financial services and
energy customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO