Financial firms are taking AI and machine learning seriously as a way to augment traditional investment decision making. Alternative datasets, text analytics, cloud computing and algorithmic trading are game changers for many firms that are adopting technology at a rapid pace. As more and more open-source technologies penetrate enterprises, quants and data scientists have a plethora of choices for building, testing and scaling quantitative models. Even though multiple solutions and platforms are available for building machine learning solutions, challenges remain in adopting machine learning in the enterprise. In this talk we will illustrate a step-by-step process to enable replicable AI/ML research within the enterprise using QuSandbox.
Adopting Data Science and Machine Learning in the financial enterprise
1. Adopting Data Science and Machine Learning in
the Enterprise
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
2. 2
About us:
• Data Science, Quant Finance and
Machine Learning Startup
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
3. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
9. 9
• “AI is the theory and development of computer systems able to
perform tasks that traditionally have required human intelligence.
• AI is a broad field, of which ‘machine learning’ is a sub-category”
What is Machine Learning and AI?
Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
10. 10
Machine Learning & AI in finance – A paradigm shift
Stochastic
Models
Factor Models
Optimization
Risk Factors
P/Q Quants
Derivative
pricing
Trading
Strategies
Simulations
Distribution
fitting
Quant
Real-time analytics
Predictive analytics
Machine Learning
RPA
NLP
Deep Learning
Computer Vision
Graph Analytics
Chatbots
Sentiment Analysis
Alternative Data
Data Scientist
12. 12
The rise of Big Data and Data Science
Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
13. 13
Smarter Algorithms
Parallel and Distributed Computing Frameworks
Deep Learning Frameworks
1. Our labeled datasets were thousands of times too
small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
“Capital One was able to determine fraudulent credit
card applications in 100 milliseconds”*
* http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
24. 24
Claim:
• Machine learning is better for fraud
detection, looking for arbitrage
opportunities and trade execution
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still
not be good enough
1. Machine learning is not a generic solution to all problems
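The caution about imbalanced classes can be sketched with a small example. The data below is hypothetical (1,000 transactions, 10 of them fraudulent) and the "model" is a deliberately useless one that never flags fraud; the point is only that accuracy alone can look excellent while the model catches nothing.

```python
# Hypothetical data: 1,000 transactions, 10 of them fraudulent (label 1).
labels = [1] * 10 + [0] * 990

# A "model" that never flags fraud: it predicts 0 for every transaction.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught."""
    actual_pos = sum(1 for t in y_true if t == positive)
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)
print(f"accuracy = {acc:.2%}, recall on fraud = {rec:.2%}")
```

Here the accuracy is 99% and the recall on the fraud class is 0%, which is why class-aware metrics such as recall or precision are worth reporting alongside accuracy.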
25. 25
Claim:
• Our models work on
datasets we have tested on
Caution:
• Do we have enough data?
• How do we handle bias in
datasets?
• Beware of overfitting
• Historical Analysis is not
Prediction
2. A prototype model is not your production model
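A minimal sketch of the overfitting caution, under stated assumptions: the data is synthetic (y = 2x plus Gaussian noise), the "prototype" is a 1-nearest-neighbour model that simply memorizes the training set, and the comparison model is an ordinary least-squares line. None of this comes from the talk; it only illustrates why a perfect score on the data you tested on says little about new data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Hypothetical data: y = 2x plus Gaussian noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


train_x, train_y = make_data(200)
test_x, test_y = make_data(200)


def predict_1nn(x):
    """Memorizing model: return the target of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


a, b = fit_line(train_x, train_y)


def predict_line(x):
    return a * x + b


nn_train_mse = mse(predict_1nn, train_x, train_y)
nn_test_mse = mse(predict_1nn, test_x, test_y)
line_train_mse = mse(predict_line, train_x, train_y)
line_test_mse = mse(predict_line, test_x, test_y)

print("1-NN   train/test MSE:", nn_train_mse, nn_test_mse)
print("linear train/test MSE:", line_train_mse, line_test_mse)
```

The memorizing model has zero error on the data it was built from, so its in-sample score is not informative; only the held-out error says anything about how it might behave in production.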
26. 26
AI and Machine Learning in Production
https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to
get going using sample data, [but] then you hit
the real problems,” Roth said.
“I think our early track record on PoCs or pilots
hides a little bit the underlying issues.”
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA
recently and I have to say we’ve been a bit
disillusioned with that experience,”
“the PoC is the easy bit: it’s how you get that
into production and shift the balance”
27. 27
Claim:
• It works. We don’t know how!
Caution:
• It’s still not a proven science
• Interpretability or “auditability” of
models is important
• Transparency in the codebase is paramount
with the proliferation of open-source
tools
• Skilled data scientists who are
knowledgeable about algorithms and
their appropriate usage are key to
successful adoption
3. We are just getting started!
28. 28
Claim:
• Machine Learning models are
more accurate than
traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the
model? RMS error or R²?
• How does the model behave
in different regimes?
4. Choose the right metrics for evaluation
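The questions on this slide can be made concrete with a short sketch. The numbers and the "calm"/"stressed" regime labels below are hypothetical; the functions compute root-mean-square error and R² for a regression model, then the same error split by regime.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observations, predictions and regime labels.
actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.0, 2.0, 3.0, 4.0, 7.0, 8.0]
regimes = ["calm"] * 4 + ["stressed"] * 2

rmse_all = rmse(actual, predicted)
r2_all = r_squared(actual, predicted)

# Evaluate the same model separately within each regime.
rmse_by_regime = {}
for name in sorted(set(regimes)):
    idx = [i for i, r in enumerate(regimes) if r == name]
    rmse_by_regime[name] = rmse(
        [actual[i] for i in idx], [predicted[i] for i in idx]
    )

print("overall RMSE:", rmse_all, "R^2:", r2_all)
print("RMSE by regime:", rmse_by_regime)
```

In this toy case the overall RMSE is about 1.15, yet all of the error sits in the "stressed" regime (RMSE 2.0 versus 0.0 in "calm"), which a single aggregate metric would hide.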
29. 29
Claim:
• Machine Learning and AI will replace
humans in most applications
Caution:
• Beware of the hype!
• Just because it worked sometimes
doesn’t mean that the organization can
be on autopilot
• Will we have true AI or Augmented
Intelligence?
• Model risk and robust risk
management is paramount to the
success of the organization.
• We are just getting started!
5. Are we there yet?
https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
36. 36
• If computers can understand language, it opens up huge possibilities
▫ Read and summarize
▫ Translate
▫ Describe what’s happening
▫ Understand commands
▫ Answer questions
▫ Respond in plain language
Language allows understanding
37. 37
• Describe rules of grammar
• Describe meanings of words and their
relationships
• …including all the special cases
• ...and idioms
• ...and special cases for the idioms
• ...
• ...understand language!
Traditional language AI
https://en.wikipedia.org/wiki/Formal_language
38. 38
What is NLP ?
Jumping NLP Curves
https://ieeexplore.ieee.org/document/6786458/
40. 40
• Ambiguity:
▫ “ground”
▫ “jaguar”
▫ “The car hit the pole while it was moving”
▫ “One morning I shot an elephant in my pajamas. How he got into my
pajamas, I’ll never know.”
▫ “The tank is full of soldiers.”
“The tank is full of nitrogen.”
Language is hard to deal with
42. 42
• Many ways to say the same thing
▫ “the same thing can be said in many ways”
▫ “language is versatile”
▫ “The same words can be arranged in many different ways to express
the same idea”
▫ …
Language is hard to deal with
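One way to see why paraphrases are hard is to measure how few words they share. The sketch below uses the three example sentences from this slide and Jaccard overlap of lowercase word sets, a deliberately naive similarity measure chosen here for illustration (it is not a method proposed in the talk).

```python
def tokens(sentence):
    """Lowercase, whitespace-split word set (a deliberately naive tokenizer)."""
    return set(sentence.lower().split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    return len(a & b) / len(a | b)


s1 = tokens("the same thing can be said in many ways")
s2 = tokens("language is versatile")
s3 = tokens(
    "The same words can be arranged in many different ways "
    "to express the same idea"
)

overlap_12 = jaccard(s1, s2)
overlap_13 = jaccard(s1, s3)
print("s1 vs s2:", overlap_12)
print("s1 vs s3:", overlap_13)
```

The first two sentences express the same idea but share no words at all, so word matching scores them as unrelated; the third shares fewer than half its words with the first.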
43. 43
• APIs
• Human Insight
• Expert Knowledge
• Build your own
Options?
46. QuSandbox- The platform for adopting Data
Science and AI in the Enterprise
47. 47
• QuSandbox is an end-to-end, workflow-based system to enable
creation and deployment of data science workflows within the
enterprise for primarily ML and AI applications.
• Our environment supports AWS and Google Cloud Platform and
incorporates model and data provenance throughout the life cycle
of model development.
• The solution can also be hosted on-prem to leverage custom
hardware and software integrations.
Executive Summary
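The idea of model and data provenance can be sketched as follows. This is not QuSandbox's implementation, which the slides do not describe; it is a hypothetical illustration in which each run is identified by hashes of its code, its data and its parameters, so that any change to one of them yields a different run identifier. The function names and record fields are assumptions made for this example.

```python
import hashlib
import json


def sha256_hex(payload: bytes) -> str:
    """Hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def provenance_record(code: str, data: bytes, params: dict) -> dict:
    """Hypothetical provenance record for one model run."""
    record = {
        "code_sha256": sha256_hex(code.encode("utf-8")),
        "data_sha256": sha256_hex(data),
        "params": params,
    }
    # The run id is a hash over the whole record, with keys sorted so the
    # same inputs always serialize to the same bytes.
    serialized = json.dumps(record, sort_keys=True).encode("utf-8")
    record["run_id"] = sha256_hex(serialized)
    return record


code = "def model(x): return 2 * x"
params = {"learning_rate": 0.1, "epochs": 5}

run_a = provenance_record(code, b"price,volume\n100,5\n", params)
run_b = provenance_record(code, b"price,volume\n100,5\n", params)
run_c = provenance_record(code, b"price,volume\n101,5\n", params)

print(run_a["run_id"] == run_b["run_id"])  # identical inputs
print(run_a["run_id"] == run_c["run_id"])  # data changed by one character
```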
53. 54
• The regulatory sandbox allows businesses to test innovative
products, services, business models and delivery mechanisms in the
real market, with real consumers.
• The sandbox is a supervised space, open to both authorized and
unauthorized firms, that provides firms with:
▫ reduced time-to-market at potentially lower cost
▫ appropriate consumer protection safeguards built in to new products and
services
▫ better access to finance
• https://www.fca.org.uk/firms/regulatory-sandbox
Regulatory Sandboxes
54. 55
Quant/Enterprise use cases
• Create an environment that can support multiple platforms and
programming languages
• Enable remote running of applications
• Ability to try out a GitHub submission or someone else’s code
• Facilitate creation of Docker images to create replicable containers
• Create prototyping environments for Data Science/Quant teams
• Enable Data scientists/Quants to deploy their solutions
• Enable running multiple tasks and jobs
• Enable concurrent running of multiple experiments
• Integrate seamlessly with the cloud to scale up computations
Use cases
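As an illustration of running several experiments concurrently, here is a minimal sketch using Python's standard `concurrent.futures` module. The experiment itself (scoring moving-average windows on a toy price series) and the names `run_experiment` and `windows` are hypothetical; a platform would dispatch real jobs to separate containers or machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy price series used by every experiment (hypothetical data).
PRICES = [100, 101, 103, 102, 105, 107, 106, 110]


def run_experiment(window):
    """Mean absolute error of predicting each price by the mean of the
    previous `window` prices."""
    errors = [
        abs(PRICES[i] - sum(PRICES[i - window:i]) / window)
        for i in range(window, len(PRICES))
    ]
    return window, sum(errors) / len(errors)


windows = [1, 2, 3]

# Each window size is an independent experiment, so they can run in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(run_experiment, windows))

for window, error in sorted(results.items()):
    print(f"window={window}: mean absolute error={error:.3f}")
```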
55. 56
Fintech use cases
• To demonstrate solutions to enterprises
• Create customized enterprise trials for companies that don’t permit
installation of vendor software prior to procurement
• To manage quick updates
• Enable effective integration and hosting of services (REST APIs)
• To deploy custom services on QuSandbox
Use cases
56. 57
Academic use cases
• Enable creation of course material and exercises that could be
shared
• Enable students and workshop participants to focus on the data
science experiments rather than environment setting
Use cases
66. 67
Creating replicable environments
Create replicable environments (Code + software + data) through an easy point-and-click tool and
publish to Docker Hub or manage internally
Share it with target users
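A replicable environment of this kind is typically described by a Dockerfile. The sketch below is a hypothetical illustration, not the tool shown in the talk: a small function renders a Dockerfile from a base image, a set of pinned package versions and an entry-point script. The base image, package versions and the file name `run_experiment.py` are assumptions chosen for the example.

```python
def render_dockerfile(base_image, packages, entrypoint):
    """Render Dockerfile text with pinned package versions (illustrative)."""
    lines = [
        f"FROM {base_image}",
        "WORKDIR /app",
        # Copy code and data from the build context into the image.
        "COPY . /app",
    ]
    # Pin exact versions so a rebuild installs the same software.
    pinned = " ".join(
        f"{name}=={version}" for name, version in sorted(packages.items())
    )
    lines.append(f"RUN pip install --no-cache-dir {pinned}")
    lines.append(f'CMD ["python", "{entrypoint}"]')
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    base_image="python:3.11-slim",
    packages={"pandas": "2.2.2", "numpy": "1.26.4"},
    entrypoint="run_experiment.py",
)
print(dockerfile)
```

Pinning the base image tag and package versions is what makes the resulting container repeatable; an unpinned `pip install` could pull different versions on each build.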
67. 68
User portal
• Run multiple experiments in pre-created environments (Code + software + data)
• Deploy your own solutions
• Run any Docker image or GitHub submission on the cloud
77. Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.