SlideShare una empresa de Scribd logo
1 de 40
1© Cloudera, Inc. All rights reserved.
Estimating Financial Risk with
Spark
Sandy Ryza | Senior Data Scientist
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/spark-financial-modeling
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
2© Cloudera, Inc. All rights reserved.
3© Cloudera, Inc. All rights reserved.
4© Cloudera, Inc. All rights reserved.
In reasonable
circumstances, what’s
the most you can expect
to lose?
5© Cloudera, Inc. All rights reserved.
6© Cloudera, Inc. All rights reserved.
def valueAtRisk(
portfolio,
timePeriod,
pValue
): Double = { ... }
7© Cloudera, Inc. All rights reserved.
def valueAtRisk(
portfolio,
2 weeks,
0.05
) = $1,000,000
8© Cloudera, Inc. All rights reserved.
Probability
density
Portfolio return ($) over the time
period
9© Cloudera, Inc. All rights reserved.
VaR estimation approaches
• Variance-covariance
• Historical
• Monte Carlo
10© Cloudera, Inc. All rights reserved.
11© Cloudera, Inc. All rights reserved.
12© Cloudera, Inc. All rights reserved.
Market Risk Factors
• Indexes (S&P 500, NASDAQ)
• Prices of commodities
• Currency exchange rates
• Treasury bonds
13© Cloudera, Inc. All rights reserved.
Predicting Instrument Returns from Factor Returns
• Train a linear model on the factors for each instrument
14© Cloudera, Inc. All rights reserved.
Fancier
• Add features that are non-linear transformations of the market risk factors
• Decision trees
• For options, use Black-Scholes
15© Cloudera, Inc. All rights reserved.
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression
// Load the instruments and factors
val factorReturns: Array[Array[Double]] = ...
val instrumentReturns: RDD[Array[Double]] = ...
// Fit a model to each instrument
val models: Array[Array[Double]] =
instrumentReturns.map { instrument =>
val regression = new OLSMultipleLinearRegression()
regression.newSampleData(instrument, factorReturns)
regression.estimateRegressionParameters()
}.collect()
16© Cloudera, Inc. All rights reserved.
How to sample factor returns?
• Need to be able to generate sample vectors where each component is a factor
return.
• Factors returns are usually correlated.
17© Cloudera, Inc. All rights reserved.
Distribution of US treasury bond two-week returns
18© Cloudera, Inc. All rights reserved.
Distribution of crude oil two-week returns
19© Cloudera, Inc. All rights reserved.
The Multivariate Normal Distribution
• Probability distribution over vectors of length N
• Given all the variables but one, that variable is distributed according to a univariate
normal distribution
• Models correlations between variables
20© Cloudera, Inc. All rights reserved.
21© Cloudera, Inc. All rights reserved.
import org.apache.commons.math3.stat.correlation.Covariance
// Compute means
val factorMeans: Array[Double] = transpose(factorReturns)
.map(factor => factor.sum / factor.size)
// Compute covariances
val factorCovs: Array[Array[Double]] = new Covariance(factorReturns)
.getCovarianceMatrix().getData()
22© Cloudera, Inc. All rights reserved.
Fancier
• Multivariate normal often a poor choice compared to more sophisticated options
• Fatter tails: Multivariate T Distribution
• Filtered historical simulation
• ARMA
• GARCH
23© Cloudera, Inc. All rights reserved.
Running the simulations
• Create an RDD of seeds
• Use each seed to generate a set of simulations
• Aggregate results
24© Cloudera, Inc. All rights reserved.
// Broadcast the factor return -> instrument return models
val bModels = sc.broadcast(models)
// Generate a seed for each task
val seeds = (baseSeed until baseSeed + parallelism)
val seedRdd = sc.parallelize(seeds, parallelism)
// Create an RDD of trials
val trialReturns: RDD[Double] = seedRdd.flatMap { seed =>
trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs)
}
25© Cloudera, Inc. All rights reserved.
def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = {
val trialFactorReturns = factorDist.sample()
var totalReturn = 0.0
for (model <- models) {
// Add the returns from the instrument to the total trial return
for (i <- until trialFactorsReturns.length) {
totalReturn += trialFactorReturns(i) * model(i)
}
}
totalReturn
}
26© Cloudera, Inc. All rights reserved.
Executor
Executor
Time
Model
parameters
Model
parameters
27© Cloudera, Inc. All rights reserved.
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Executor
Task Task
Trial Trial TrialTrial
Task Task
Trial Trial Trial Trial
Time
Model
parameters
Model
parameters
28© Cloudera, Inc. All rights reserved.
// Cache for reuse
trialReturns.cache()
val numTrialReturns = trialReturns.count().toInt
// Compute value at risk
val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last
// Compute expected shortfall
val expectedShortfall =
trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
29© Cloudera, Inc. All rights reserved.
30© Cloudera, Inc. All rights reserved.
VaR
31© Cloudera, Inc. All rights reserved.
So why Spark?
32© Cloudera, Inc. All rights reserved.
Ease of use
• Parallel computing for 5-year olds
• Scala, Python, and R REPLs
33© Cloudera, Inc. All rights reserved.
• Cleaning data
• Fitting models
• Running simulations
• Storing results
• Analyzing results
Single platform for
34© Cloudera, Inc. All rights reserved.
But it’s CPU-bound and we’re using Java?
• Computational bottlenecks are normally in matrix operations, which can be BLAS-
ified
• Can call out to GPUs just like in C++
• Memory access patterns aren’t high-GC inducing
35© Cloudera, Inc. All rights reserved.
Want to do this yourself?
36© Cloudera, Inc. All rights reserved.
spark-timeseries
• https://github.com/cloudera/spark-timeseries
• Everything here + some fancier stuff
• Patches welcome!
37© Cloudera, Inc. All rights reserved.
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations/
spark-financial-modeling

Más contenido relacionado

Destacado

Estimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkEstimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkCloudera, Inc.
 
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...Domino Data Lab
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock Hakka Labs
 
The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo methodSaurabh Sood
 
Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations  Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations Michael Wallace
 
Monte Carlo Simulations
Monte Carlo SimulationsMonte Carlo Simulations
Monte Carlo Simulationsgfbreaux
 

Destacado (7)

Estimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache SparkEstimating Financial Risk with Apache Spark
Estimating Financial Risk with Apache Spark
 
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
Using Spark in Healthcare Predictive Analytics in the OR - Data Science Pop-u...
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
 
The monte carlo method
The monte carlo methodThe monte carlo method
The monte carlo method
 
Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations  Improving Forecasts with Monte Carlo Simulations
Improving Forecasts with Monte Carlo Simulations
 
Monte Carlo Simulations
Monte Carlo SimulationsMonte Carlo Simulations
Monte Carlo Simulations
 
How to perform a Monte Carlo simulation
How to perform a Monte Carlo simulation How to perform a Monte Carlo simulation
How to perform a Monte Carlo simulation
 

Similar a Financial Modeling with Apache Spark: Calculating Value at Risk

MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMayank Prasad
 
Microservices Chaos Testing at Jet
Microservices Chaos Testing at JetMicroservices Chaos Testing at Jet
Microservices Chaos Testing at JetC4Media
 
[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric CloudPerforce
 
MySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMayank Prasad
 
Patterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based MicroservicesPatterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based MicroservicesC4Media
 
Load Testing: See a Bigger Picture
Load Testing: See a Bigger PictureLoad Testing: See a Bigger Picture
Load Testing: See a Bigger PictureAlexander Podelko
 
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"Fwdays
 
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Cloudera, Inc.
 
Create Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the CloudCreate Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the CloudRightScale
 
DevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for developmentDevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for developmentMicrosoft Developer Norway
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Henning Jacobs
 
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Altair
 
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdfyishengxi
 
Deep Learning - Continuous Operations
Deep Learning - Continuous Operations Deep Learning - Continuous Operations
Deep Learning - Continuous Operations Haggai Philip Zagury
 

Similar a Financial Modeling with Apache Spark: Calculating Value at Risk (20)

MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
 
Microservices Chaos Testing at Jet
Microservices Chaos Testing at JetMicroservices Chaos Testing at Jet
Microservices Chaos Testing at Jet
 
[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud
 
MySQL Performance Schema : fossasia
MySQL Performance Schema : fossasiaMySQL Performance Schema : fossasia
MySQL Performance Schema : fossasia
 
Patterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based MicroservicesPatterns & Practices for Cloud-based Microservices
Patterns & Practices for Cloud-based Microservices
 
Academic and industry research collaboration – the Mathworks suite
Academic and industry research collaboration – the Mathworks suiteAcademic and industry research collaboration – the Mathworks suite
Academic and industry research collaboration – the Mathworks suite
 
Puppet on a string
Puppet on a stringPuppet on a string
Puppet on a string
 
Load Testing: See a Bigger Picture
Load Testing: See a Bigger PictureLoad Testing: See a Bigger Picture
Load Testing: See a Bigger Picture
 
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
Александр Махомет "Beyond the code или как мониторить ваш PHP сайт"
 
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
 
Create Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the CloudCreate Agile, Automated and Predictable IT Infrastructure in the Cloud
Create Agile, Automated and Predictable IT Infrastructure in the Cloud
 
DevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for developmentDevOps Roadshow - cloud services for development
DevOps Roadshow - cloud services for development
 
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
Why we don’t use the Term DevOps: the Journey to a Product Mindset - DevOpsCo...
 
Mechatronics engineer
Mechatronics engineerMechatronics engineer
Mechatronics engineer
 
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
Surrogate Model-Based Reliability Analysis of Composite UAV Wing facilitation...
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
22-4_PerformanceTuningUsingtheAdvisorFramework.pdf
 
Rich Assad Resume
Rich Assad ResumeRich Assad Resume
Rich Assad Resume
 
Deep Learning - Continuous Operations
Deep Learning - Continuous Operations Deep Learning - Continuous Operations
Deep Learning - Continuous Operations
 

Más de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

Más de C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Último

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Financial Modeling with Apache Spark: Calculating Value at Risk

  • 1. 1© Cloudera, Inc. All rights reserved. Estimating Financial Risk with Spark Sandy Ryza | Senior Data Scientist
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /spark-financial-modeling
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. 2© Cloudera, Inc. All rights reserved.
  • 5. 3© Cloudera, Inc. All rights reserved.
  • 6. 4© Cloudera, Inc. All rights reserved. In reasonable circumstances, what’s the most you can expect to lose?
  • 7. 5© Cloudera, Inc. All rights reserved.
  • 8. 6© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, timePeriod, pValue ): Double = { ... }
  • 9. 7© Cloudera, Inc. All rights reserved. def valueAtRisk( portfolio, 2 weeks, 0.05 ) = $1,000,000
  • 10. 8© Cloudera, Inc. All rights reserved. Probability density Portfolio return ($) over the time period
  • 11. 9© Cloudera, Inc. All rights reserved. VaR estimation approaches • Variance-covariance • Historical • Monte Carlo
  • 12. 10© Cloudera, Inc. All rights reserved.
  • 13. 11© Cloudera, Inc. All rights reserved.
  • 14. 12© Cloudera, Inc. All rights reserved. Market Risk Factors • Indexes (S&P 500, NASDAQ) • Prices of commodities • Currency exchange rates • Treasury bonds
  • 15. 13© Cloudera, Inc. All rights reserved. Predicting Instrument Returns from Factor Returns • Train a linear model on the factors for each instrument
  • 16. 14© Cloudera, Inc. All rights reserved. Fancier • Add features that are non-linear transformations of the market risk factors • Decision trees • For options, use Black-Scholes
  • 17. 15© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression // Load the instruments and factors val factorReturns: Array[Array[Double]] = ... val instrumentReturns: RDD[Array[Double]] = ... // Fit a model to each instrument val models: Array[Array[Double]] = instrumentReturns.map { instrument => val regression = new OLSMultipleLinearRegression() regression.newSampleData(instrument, factorReturns) regression.estimateRegressionParameters() }.collect()
  • 18. 16© Cloudera, Inc. All rights reserved. How to sample factor returns? • Need to be able to generate sample vectors where each component is a factor return. • Factors returns are usually correlated.
  • 19. 17© Cloudera, Inc. All rights reserved. Distribution of US treasury bond two-week returns
  • 20. 18© Cloudera, Inc. All rights reserved. Distribution of crude oil two-week returns
  • 21. 19© Cloudera, Inc. All rights reserved. The Multivariate Normal Distribution • Probability distribution over vectors of length N • Given all the variables but one, that variable is distributed according to a univariate normal distribution • Models correlations between variables
  • 22. 20© Cloudera, Inc. All rights reserved.
  • 23. 21© Cloudera, Inc. All rights reserved. import org.apache.commons.math3.stat.correlation.Covariance // Compute means val factorMeans: Array[Double] = transpose(factorReturns) .map(factor => factor.sum / factor.size) // Compute covariances val factorCovs: Array[Array[Double]] = new Covariance(factorReturns) .getCovarianceMatrix().getData()
  • 24. 22© Cloudera, Inc. All rights reserved. Fancier • Multivariate normal often a poor choice compared to more sophisticated options • Fatter tails: Multivariate T Distribution • Filtered historical simulation • ARMA • GARCH
  • 25. 23© Cloudera, Inc. All rights reserved. Running the simulations • Create an RDD of seeds • Use each seed to generate a set of simulations • Aggregate results
  • 26. 24© Cloudera, Inc. All rights reserved. // Broadcast the factor return -> instrument return models val bModels = sc.broadcast(models) // Generate a seed for each task val seeds = (baseSeed until baseSeed + parallelism) val seedRdd = sc.parallelize(seeds, parallelism) // Create an RDD of trials val trialReturns: RDD[Double] = seedRdd.flatMap { seed => trialReturns(seed, trialsPerTask, bModels.value, factorMeans, factorCovs) }
  • 27. 25© Cloudera, Inc. All rights reserved. def trialReturn(factorDist: MultivariateNormalDistribution, models: Seq[Array[Double]]): Double = { val trialFactorReturns = factorDist.sample() var totalReturn = 0.0 for (model <- models) { // Add the returns from the instrument to the total trial return for (i <- until trialFactorsReturns.length) { totalReturn += trialFactorReturns(i) * model(i) } } totalReturn }
  • 28. 26© Cloudera, Inc. All rights reserved. Executor Executor Time Model parameters Model parameters
  • 29. 27© Cloudera, Inc. All rights reserved. Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Executor Task Task Trial Trial TrialTrial Task Task Trial Trial Trial Trial Time Model parameters Model parameters
  • 30. 28© Cloudera, Inc. All rights reserved. // Cache for reuse trialReturns.cache() val numTrialReturns = trialReturns.count().toInt // Compute value at risk val valueAtRisk = trials.takeOrdered(numTrialReturns / 20).last // Compute expected shortfall val expectedShortfall = trials.takeOrdered(numTrialReturns / 20).sum / (numTrialReturns / 20)
  • 31. 29© Cloudera, Inc. All rights reserved.
  • 32. 30© Cloudera, Inc. All rights reserved. VaR
  • 33. 31© Cloudera, Inc. All rights reserved. So why Spark?
  • 34. 32© Cloudera, Inc. All rights reserved. Ease of use • Parallel computing for 5-year olds • Scala, Python, and R REPLs
  • 35. 33© Cloudera, Inc. All rights reserved. • Cleaning data • Fitting models • Running simulations • Storing results • Analyzing results Single platform for
  • 36. 34© Cloudera, Inc. All rights reserved. But it’s CPU-bound and we’re using Java? • Computational bottlenecks are normally in matrix operations, which can be BLAS- ified • Can call out to GPUs just like in C++ • Memory access patterns aren’t high-GC inducing
  • 37. 35© Cloudera, Inc. All rights reserved. Want to do this yourself?
  • 38. 36© Cloudera, Inc. All rights reserved. spark-timeseries • https://github.com/cloudera/spark-timeseries • Everything here + some fancier stuff • Patches welcome!
  • 39. 37© Cloudera, Inc. All rights reserved.
  • 40. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/ spark-financial-modeling