Risk Analysis in the Financial Services Industry
1. Revolution Confidential
R in the Financial Services Industry
June 6, 2013
Karl-Kuno Kunze
Neil Miller
Andrie de Vries
Breakfast Briefing
2. Revolution Confidential
R in Financial Services
Welcome & Revolution
• Neil Miller, Managing Director, International, Revolution Analytics
• Andrie de Vries, Business Services Director, Europe, Revolution Analytics
R in Financial Institutions
• Karl-Kuno Kunze, Managing Director, Nagler & Company
3. Revolution Confidential
Revolution Analytics
Corporate Overview & Quick Facts
Founded: 2007
Office locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
CEO: David Rich
Number of customers: 200+
Investors:
• Northbridge Venture Partners
• Intel Capital
• Platform Vendor
Web site: www.revolutionanalytics.com
Revolution – “Contender”
The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both
Revolution R Enterprise is the leading commercial analytics platform based on
the open source R statistical computing language
7. Revolution Confidential
Revolution R Enterprise
High Performance, Multi-Platform Enterprise Analytics Platform
• DeployR: Web Services
• DevelopR: Integrated Development Environment
• ConnectR: High Speed Connectors
• PlatformR: Distributed Compute Contexts
• ScaleR: Distributed High Performance Architecture + High Performance Big Data Analytics packages
• RevoR: Performance Enhanced Open Source R + Open Source R packages
9. Revolution Confidential
Integration Layer: DeployR makes R accessible
• Seamless: Bring the power of R to any web-enabled application
• Simple: Leverage common APIs including JS, Java, .NET
• Scalable: Robustly scale user and compute workloads
• Secure: Manage enterprise security with LDAP & SSO
[Diagram labels: R / Statistical Modeling Expert; DeployR; Deployment Expert; Data Analysis; Business Intelligence; Mobile Web Apps; Cloud / SaaS]
10. Revolution Confidential
On-Demand Analytics with DeployR
Market Basket Analysis using JavaScript and R, enabled by DeployR
• User selection drives JavaScript…
• which drives an R script…
• which drives JavaScript to return the data and graphics the user needs…
• …enabled by the DeployR APIs
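The analysis step behind such a demo can be sketched in a few lines. This is a minimal illustration of market basket analysis (support and confidence for item pairs), written in Python rather than R, and it does not use or imitate the DeployR API. The baskets, the `pair_rules` helper and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; the data set used on the slide is not shown.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence)} for pairwise rules a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence of a -> b: share of baskets with a that also contain b.
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


RULES = pair_rules(BASKETS)
for (lhs, rhs), (support, confidence) in sorted(RULES.items()):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In a set-up like the one on the slide, the user's selection would determine which baskets are passed in, and the resulting rules would be returned to the page for display.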
11. Revolution Confidential
Example: Allstate performance assessment of SAS, R, Hadoop, Revolution (October 2012)
• Steve Yun, Principal Predictive Modeller at Allstate Research and Planning Centre, benchmarked SAS, R and Hadoop. “Data is our competitive advantage”.
• Generalised Linear Model for 150 million observations of insurance data and 70 degrees of freedom.
Conclusion:
• SAS works, but is slow.
• The data is too big for open-source R, even on a very large server.
• Hadoop is not the right fit.
• Revolution ScaleR gets the same results as SAS, but much faster (5.7 minutes to fit versus 5 hours) and on cheaper kit.
| Software | Platform | Comments | Time to fit |
|---|---|---|---|
| SAS (current tool) | 16-core Sun Server | Proc GENMOD | 5 hours |
| rmr / map-reduce | 10-node (8 cores / node) Hadoop cluster | Lot of coding, prep and error investigation. Possible to improve time? | > 9 hours processing |
| Open source R | 250-GB Server | Full data set and sampling. Sampling quicker but not acceptable to business. | Impossible (> 3 days) |
| Revolution ScaleR | 5-node (4 cores / node) LSF cluster | 90 minutes to load full data set | 5.7 minutes |
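The size of the gap in the table can be worked out directly. The short calculation below uses only the figures quoted above. The table does not say whether the 5 hours for SAS includes loading the data, so the second ratio, which adds the 90-minute ScaleR load, is a conservative assumption and not a like-for-like result.

```python
# Times to fit, taken from the benchmark table, in minutes.
FIT_MINUTES = {
    "SAS": 5 * 60,
    "Revolution ScaleR": 5.7,
}
SCALER_LOAD_MINUTES = 90  # load time for the full data set, from the table

fit_speedup = FIT_MINUTES["SAS"] / FIT_MINUTES["Revolution ScaleR"]

# Assumption: charge the load time to ScaleR only, as a worst case for ScaleR.
end_to_end_speedup = FIT_MINUTES["SAS"] / (
    FIT_MINUTES["Revolution ScaleR"] + SCALER_LOAD_MINUTES
)

print(f"fit only: {fit_speedup:.1f}x")
print(f"including ScaleR load: {end_to_end_speedup:.1f}x")
```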
12. Revolution Confidential
Economic Capital Modeling
• 1 day to 15 minutes
• 100,000 years of simulations
• Pricing optimization increases financial health
“As things become more and more extreme, I need a model that can estimate my risk in a way that enhances our confidence in our pricing and reserving. Modeling with Revolution R Enterprise gives me that.”
VP and Pricing Actuary, Jamie Botelho
Profile: 10-year-old reinsurer’s Actuarial Group systematically makes sound financial and pricing decisions in production system and completes ad hoc analysis.
Key Technology: Revolution R Enterprise replaced Excel; drives business rules in company production system
Outcomes: Ability to compensate for lack of historical data by simulating a wide variety and quantity of events and using advanced correlation techniques. Complete full day of work in 15 minutes
Bottom line: Improved financial health by managing risk and increasing pricing optimization
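To illustrate why correlation matters when simulating many years of losses, here is a minimal sketch of a 100,000-year simulation for two lines of business. It is not the reinsurer's model: the lognormal loss parameters, the two-line set-up and the use of a single correlation coefficient between normal drivers are all assumptions made for the example, and it is written in Python rather than R.

```python
import math
import random


def simulate_total_losses(n_years, rho, seed=1):
    """Simulate the annual total loss for two correlated lines of business."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_years):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal whose correlation with z1 is rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Hypothetical lognormal loss per line (parameters are made up).
        totals.append(math.exp(1.0 + 0.8 * z1) + math.exp(0.5 + 1.0 * z2))
    return totals


def quantile(values, q):
    """Empirical quantile: the value at sorted position floor(q * n), capped."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]


N_YEARS = 100_000
independent = quantile(simulate_total_losses(N_YEARS, rho=0.0), 0.995)
correlated = quantile(simulate_total_losses(N_YEARS, rho=0.8), 0.995)
print(f"99.5% annual loss, rho=0.0: {independent:.1f}")
print(f"99.5% annual loss, rho=0.8: {correlated:.1f}")
```

With positively correlated lines, bad years tend to coincide, so the tail of the total loss is heavier than under independence even though each line's own distribution is unchanged.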
13. Revolution Confidential
F100 Investment Co. Outlier & Error Detection
• >65M end-of-day trades
• >8,500 variables
• Weekly model re-training
Profile: Full-service global investment and securities management firm proved effectiveness of Revolution R Enterprise to detect potentially costly outliers and errors
Key Technology: Revolution R Enterprise using ScaleR Big Data Analytics capabilities
Analytic Approach – Exchange Rate Error Detection: ARIMA and VAR models used to define acceptable value changes using the prediction for the next value in a time series. Models trained using historical data.
“The models’ performance were impressive and few errors were missed.” VP, IT
Analytic Approach – Outlier Detection: Use historical data for each customer (>65M end-of-day trades and >8,500 variables) to build and train linear regression model to establish range of predicted values for customers’ trades so that actual trades can be analyzed for outliers.
“Using statistical analysis by customer delivers superior accuracy compared to rules-based analysis (such as analyzing largest 10% of trades), which fail over time as volumes or client behavior changes. Statistical models that can be retrained (e.g. weekly) will account for changes and not fail over time.” VP, IT
Bottom line: new analytics paradigm for existing processes introduced, with potential for millions of dollars in cost avoidance
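The idea of flagging a value when it falls outside the range predicted from history can be sketched as follows. The slide names ARIMA and VAR models. This example swaps in the simplest relative of those, an AR(1) model fitted by least squares, purely for illustration. The exchange rates, the threshold `k` and the function names are hypothetical, and the code is Python rather than R.

```python
import statistics


def fit_ar1(history):
    """Least-squares fit of x[t] = a + b * x[t-1]; returns (a, b, sigma)."""
    prev, curr = history[:-1], history[1:]
    mean_prev, mean_curr = statistics.fmean(prev), statistics.fmean(curr)
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    b = sxy / sxx
    a = mean_curr - b * mean_prev
    residuals = [c - (a + b * p) for p, c in zip(prev, curr)]
    return a, b, statistics.pstdev(residuals)


def is_suspect(history, new_value, k=4.0):
    """Flag new_value if it is more than k residual std devs from the forecast."""
    a, b, sigma = fit_ar1(history)
    predicted = a + b * history[-1]
    return abs(new_value - predicted) > k * sigma


# Hypothetical end-of-day exchange rates used as training history.
RATES = [1.300, 1.302, 1.299, 1.303, 1.301, 1.304, 1.302, 1.305, 1.303, 1.306]

print(is_suspect(RATES, 1.304))  # a plausible next value
print(is_suspect(RATES, 13.04))  # a misplaced decimal point
```

Retraining, as described in the quote, would amount to calling `fit_ar1` again on a refreshed history so the acceptable range follows changes in the data.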
14. Revolution Confidential
Quantitative Research @ Global Investment Co.
• Quants’ daily model updates deployed to 100s of traders
Profile: Full-service global investment and securities management firm’s IT team proved effectiveness of Revolution R Enterprise to detect potentially costly outliers and errors
Key Technology: Revolution R Enterprise using ScaleR Big Data and DeployR integrated with Siteminder, which provides a secure, transparent, centralized analytics center.
Challenge: Quantitative Research Group had a decentralized modeling practice where quants used Excel, Python, Java, open source R, and other tools to develop models that informed daily trading. This environment posed risks to IP protection, model versioning and transparency.
Analytic Approach: develop models that can be applied to real-world data to exploit market opportunities and successfully develop, back-test, and deploy quantitative and event-based trading and investment strategies to effectively manage risk.
Bottom Line: Powerful statistical analytics platform provides a centralized, secure model repository that guides hundreds of millions of dollars of transactions made by 100s of traders.
15. Revolution Confidential
Innovates to Outperform
• Put R-based analytics into production
• New data, more lift
• Strategy simulation & portfolio optimization
• Days to overnight
“One of the first R-based production deployments we rolled out tracks revenue flows among manufacturers and their suppliers. We combine public and proprietary data and apply graph analyses to get a clearer understanding of the likely performance of suppliers. These forecasts are more accurate than what could be developed with quarters-old public financial reports.”
- Sr. Quantitative Researcher, Tal Sasani
Profile: Publicly-traded, investment management company that includes the Livestrong family of funds. Revolution R Enterprise optimizes $8.5B portfolio of 22 funds.
Key Technology: Revolution R Enterprise replacing proprietary industry applications. Tableau front end for production analytics.
Outcomes: Battery of custom analytics now run overnight to inform morning work
Bottom Line: Custom-built simulations, scenario analyses & financial stress tests improve confidence in forecasts and analysis, lifting the business
16. Revolution Confidential
Other Financial Services examples
• Op Risk: Conducting Monte Carlo simulations on 100,000 years of simulated data to measure aggregate operational risk from 7 types of operational risk in accordance with Basel II requirements
• Mortgage loan default analysis and prediction in a Hadoop environment. Moved from SAS = lower cost, better model uplift, better Hadoop integration
• Credit Scoring in Database with Netezza: Increased Speed
• Model Governance Issues: Model management through DeployR – changing analyst community and business user access via Qlikview, Excel, Python
• Using Revolution to support SAS to analyse foreign trade transactions to identify anomalies: Better data exploration and visualisation
• Control – “1600 SAS programmers and all the new guys coming in know R – now is the time to get my hands around R before it spins out of control with all these new R zealots coming on board”
• IT Innovation – starting to use Hadoop. SAS too hard to write map reduce jobs
• Cross Platform – 500 Teradata appliances and 10 Netezza. Seamlessly deploy analysis across their infrastructure
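The Op Risk example above can be sketched as a frequency-severity simulation: for each simulated year and each risk type, draw a number of loss events, draw a size for each event, and add everything up. The seven names below are the Basel II loss event types. Every parameter, the Poisson and lognormal choices, the independence between types and the 99.9% cut-off are assumptions for illustration, not the customer's model, and the code is Python rather than R.

```python
import random

# Basel II loss event types with hypothetical parameters:
# (expected events per year, lognormal mu, lognormal sigma).
RISK_TYPES = {
    "internal fraud": (0.2, 11.0, 1.5),
    "external fraud": (0.5, 10.0, 1.2),
    "employment practices and workplace safety": (1.0, 9.0, 1.0),
    "clients, products and business practices": (1.5, 10.5, 1.6),
    "damage to physical assets": (0.1, 11.5, 1.8),
    "business disruption and system failures": (0.8, 9.5, 1.1),
    "execution, delivery and process management": (2.0, 9.0, 1.3),
}


def poisson(rng, lam):
    """Draw a Poisson count by counting exponential arrivals within one year."""
    count, t = 0, rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count


def simulate_annual_losses(n_years, seed=7):
    """Aggregate loss per simulated year, treating risk types as independent."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        total = 0.0
        for lam, mu, sigma in RISK_TYPES.values():
            for _ in range(poisson(rng, lam)):
                total += rng.lognormvariate(mu, sigma)
        losses.append(total)
    return losses


ANNUAL = sorted(simulate_annual_losses(100_000))
expected_loss = sum(ANNUAL) / len(ANNUAL)
var_999 = ANNUAL[int(0.999 * len(ANNUAL))]
print(f"expected annual loss: {expected_loss:,.0f}")
print(f"99.9% annual loss:    {var_999:,.0f}")
```

The large number of simulated years is what makes the extreme quantile usable: at 100,000 years, roughly 100 observations lie beyond the 99.9% point.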
17. Revolution Confidential
High Performance R & Big Data Analytics
Parallel External Memory Algorithms
Data Prep, Distillation & Descriptive Analytics
R Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
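The phrase "parallel external memory algorithm" can be illustrated with the mean and variance from the list above. The sketch reads data in chunks, keeps only a small summary per chunk, and merges the summaries, so the full data set never has to fit in memory. It shows the general technique only and makes no claim about how ScaleR implements it. `read_chunks` is a hypothetical stand-in for reading a large file block by block, and the code is Python rather than R.

```python
def read_chunks(values, size):
    """Stand-in for reading a large file block by block (chunks are non-empty)."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def chunk_summary(chunk):
    """Summarise one chunk as (count, mean, sum of squared deviations)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a, b):
    """Combine two chunk summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streamed_mean_variance(chunks):
    """Mean and sample variance computed one chunk at a time."""
    total = None
    for chunk in chunks:
        summary = chunk_summary(chunk)
        total = summary if total is None else merge(total, summary)
    n, mean, m2 = total
    return mean, m2 / (n - 1)


VALUES = [float(v) for v in range(1, 101)]
MEAN, VARIANCE = streamed_mean_variance(read_chunks(VALUES, 7))
print(MEAN, VARIANCE)
```

Because `merge` gives the same result whichever way the summaries are grouped, chunks can be summarised on different cores or nodes and combined at the end, which is where the "parallel" part comes from.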
18. Revolution Confidential
High Performance R & Big Data Analytics
Parallel External Memory Algorithms
Statistical Modeling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): all exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions including cauchit, identity, log, logit, probit; user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Data Visualization
• Histogram
• Line Plot
• Scatter Plot
• Lorenz Curve
• ROC Curves (actual data and predicted values)
Machine Learning
Cluster Analysis
• K-Means
Classification
• Decision Trees
Simulation
• Monte Carlo
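The link between the cross product matrix and multiple linear regression in the list above can be made concrete. One common way to fit a regression on data that does not fit in memory is to accumulate X'X and X'y chunk by chunk and then solve the small system of normal equations. The sketch below shows that technique under stated assumptions: it is not a description of ScaleR's internals, the data are synthetic, the function names are hypothetical, and the code is Python rather than R.

```python
def accumulate(chunks, n_features):
    """Sum X'X and X'y over chunks of (features, target) rows, with intercept."""
    p = n_features + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for chunk in chunks:
        for features, target in chunk:
            row = [1.0, *features]  # leading 1 for the intercept
            for i in range(p):
                xty[i] += row[i] * target
                for j in range(p):
                    xtx[i][j] += row[i] * row[j]
    return xtx, xty


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Synthetic data that follows y = 2 + 3*x1 - x2 exactly, split into two chunks.
ROWS = [((x1, x2), 2 + 3 * x1 - x2) for x1 in range(5) for x2 in range(4)]
CHUNKS = [ROWS[:10], ROWS[10:]]

XTX, XTY = accumulate(CHUNKS, n_features=2)
COEFFICIENTS = solve(XTX, XTY)
print(COEFFICIENTS)  # intercept, coefficient of x1, coefficient of x2
```

Only the (p x p) matrix and the length-p vector are kept between chunks, so memory use depends on the number of variables and not on the number of observations.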