SQL + NOSQL + NEWSQL + REALTIME
FOR INVESTMENT BANKS
CHARLES CAI
ASHWANI ROY
8 March 2013
Enterprise Data Problems in Investment Banks
“BigData” History and Trend – Driven by Google
CAP Theorem for Distributed Computer Systems
Open Source Building Blocks: Hadoop, Solr, Storm…
Hypothetical Solution using Lambda Architecture
Where Is the “BigData” Industry Going?
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations/finance-sql-nosql-newsql
Presented at QCon London
www.qconlondon.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presenter: Charles Cai
•  Charles Cai makes a living by designing and implementing trading and risk systems for investment banks.
•  Currently a Chief Front Office Technical Architect in a global energy trading firm.
   -  Twitter: @caidong
   -  LinkedIn: charlescai
Presenter: Ashwani Roy
•  Ashwani Roy – Masters in Finance student at London Business School and VP at a Tier 1 Investment Bank.
•  Loves to mix programming and applied mathematics to solve difficult problems in investment banking
   -  Twitter: @Ashwani_Roy
   -  LinkedIn: ashwaniroy
Why Should the Finance Industry Care?
•  We care because of
   -  Compliance requirements
   -  Risk management
   -  Pricing
   -  Rise of the machines (e-commerce)
   -  Cost cutting
   -  BTW: Twitter is also part of market data
Sample Interest Model / Simulations
A quick Monte Carlo Demo
•  Demo – Computing this is functional
Some Terminology
•  PV = present value = cash flows discounted to the current time
•  Delta = change in price / change in interest rate
•  Gamma, Vega, Rho, Theta, Vanna, and other Greeks
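A minimal sketch of the two definitions above, assuming a flat annual discount rate and a hypothetical two-year bond (neither is from the slides). Delta is estimated by bumping the rate up and down and taking the central difference.

```python
from typing import Sequence, Tuple

CashFlows = Sequence[Tuple[float, float]]  # (time in years, amount)


def present_value(cash_flows: CashFlows, rate: float) -> float:
    """Discount every cash flow back to today at a flat annual rate."""
    return sum(amount / (1.0 + rate) ** t for t, amount in cash_flows)


def delta(cash_flows: CashFlows, rate: float, bump: float = 1e-4) -> float:
    """Change in price per unit change in rate (central finite difference)."""
    up = present_value(cash_flows, rate + bump)
    down = present_value(cash_flows, rate - bump)
    return (up - down) / (2.0 * bump)


# Hypothetical trade: pays 5 after one year and 105 after two years.
flows = [(1.0, 5.0), (2.0, 105.0)]
pv = present_value(flows, 0.05)   # a 5% coupon bond priced at a 5% rate
d = delta(flows, 0.05)            # negative: the price falls when rates rise
```

The other Greeks follow the same bump-and-reprice pattern against a different input (volatility, time, and so on), which is why the workload parallelizes so easily.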
Monte Carlo Simulations - Results
•  <results> = func<I,j,k……
•  Parallelize computation with mappers
•  Save results and run reducers
•  [ trade: 1, curveid: Orig, PV: 100, Delta: 200 ] {to OLAP}
   [ trade: 1, curveid: Sim1, PV: 100, Delta: 200 ] {big data}
   [ trade: 1, curveid: Sim2, PV: 99, Delta: 220 ] {big data}
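The mapper/reducer split above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' code: the pricing function, the Gaussian rate shocks and the record layout are hypothetical, and the built-in `map` stands in for a cluster of mappers.

```python
import random
from functools import reduce


def price(rate):
    """Hypothetical pricer: 2-year bond paying 5 then 105. Returns (PV, Delta)."""
    def pv(r):
        return 5.0 / (1.0 + r) + 105.0 / (1.0 + r) ** 2
    bump = 1e-4
    return pv(rate), (pv(rate + bump) - pv(rate - bump)) / (2.0 * bump)


def mapper(task):
    """One independent unit of work: price one trade on one curve."""
    trade_id, curve_id, rate = task
    pv, delta = price(rate)
    return {"trade": trade_id, "curveid": curve_id, "PV": pv, "Delta": delta}


def reducer(acc, rec):
    """Fold one result record into per-trade totals: (count, PV sum, worst PV)."""
    n, pv_sum, pv_min = acc.get(rec["trade"], (0, 0.0, float("inf")))
    acc[rec["trade"]] = (n + 1, pv_sum + rec["PV"], min(pv_min, rec["PV"]))
    return acc


rng = random.Random(42)  # fixed seed so the run is repeatable
base_rate = 0.05
tasks = [(1, "Orig", base_rate)] + [
    (1, f"Sim{i}", base_rate + rng.gauss(0.0, 0.01)) for i in range(1, 101)
]

records = list(map(mapper, tasks))        # "Parallelize computation with mappers"
totals = reduce(reducer, records, {})     # "Save results and run reducers"
count, pv_sum, worst_pv = totals[1]
mean_pv = pv_sum / count
```

Because each task is independent, swapping `map` for a process pool or a Hadoop job changes the runtime, not the logic.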
Compliance
   -  Dodd-Frank requires records to be kept for at least five years
   -  Fast disaster recovery requirements (tape backup is not acceptable)
   -  All Bloomberg and other chats must be saved in a quickly reportable form
   -  … Many more in Basel III and the Dodd-Frank Act
You need to
#  get chats for AshwaniRoy@bloomberg.net and ashwaniR@reuters.net
#  from 5 years of Bloomberg and Reuters logs at a global investment bank, about 1 TB (assume 1 MB/day/trader * 220 trading days * 1000 traders * 5 years)
#  for all EURUSD swaps only
… plus additional filters and aggregation requirements
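A rough sketch of the sizing and the filter described above. The log layout (`sender<TAB>message` per line) and the sample lines are assumptions made for illustration; real Bloomberg and Reuters exports will look different.

```python
MB = 10**6
TB = 10**12

# 1 MB/day/trader * 220 trading days * 1000 traders * 5 years
log_bytes = 1 * MB * 220 * 1000 * 5
log_tb = log_bytes / TB  # about 1.1 TB, matching the slide's "1 TB" estimate

WANTED = {"ashwaniroy@bloomberg.net", "ashwanir@reuters.net"}


def matches(line: str) -> bool:
    """Keep chat lines from the wanted users that mention EURUSD swaps."""
    sender, _, text = line.partition("\t")
    text = text.upper()
    return sender.lower() in WANTED and "EURUSD" in text and "SWAP" in text


# Hypothetical sample of the chat log.
sample = [
    "AshwaniRoy@bloomberg.net\tPaying fixed on EURUSD swap 5y",
    "someone@bloomberg.net\tEURUSD swap axe",
    "ashwaniR@reuters.net\tGBPUSD swap idea",
    "ashwaniR@reuters.net\teurusd basis swap, 10y",
]
hits = [line for line in sample if matches(line)]
```

A per-line predicate like `matches` is exactly the shape of work a mapper does, so the same function can scan the full log in parallel.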
Big Data Industry History: Google’s Papers
Google’s Big Data Papers: 2003 – 2006
GFS – Google File System
•  2003
•  Distributed file system
•  3 x copies
•  Commodity machines
•  Colossus (2012)
MapReduce
•  2004
•  Input → Map → Partition → Compare → Shuffle → Sort → Reduce → Output
BigTable
•  2006
•  Distributed key-value, column-family-based database
Hadoop Distributed File System (HDFS)
•  http://ecomcanada.files.wordpress.com/2012/11/hadoop-architecture.png
Google’s MapReduce Programming Model
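The programming model can be sketched in a single process as a word count. This is a toy illustration, not Hadoop's API: `map_fn`, `partition`, `reduce_fn` and the sample input are hypothetical names and data chosen for the example.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, n_reducers):
    """Partition: pick a reducer for the key with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def reduce_fn(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def map_reduce(lines, n_reducers=2):
    # Map + partition: route each emitted pair to a reducer bucket.
    buckets = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            buckets[partition(key, n_reducers)].append((key, value))

    output = {}
    for bucket in buckets.values():
        # Shuffle/sort: order pairs by key so equal keys sit next to each other.
        bucket.sort(key=itemgetter(0))
        for key, group in groupby(bucket, key=itemgetter(0)):
            word, total = reduce_fn(key, (value for _, value in group))
            output[word] = total
    return output


counts = map_reduce(["buy EURUSD swap", "sell EURUSD swap", "buy gas"])
```

On a real cluster each bucket lives on a different machine; the framework handles the partition, shuffle and sort steps, and the developer supplies only the map and reduce functions.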
Apache HBase: Column-Family Distributed K-V Store
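To make "column-family K-V store" concrete, here is a toy in-memory model of the data layout: rows sorted by key, a fixed set of column families, and timestamped cell versions. The class and method names are hypothetical and are not the HBase client API.

```python
import time
from collections import defaultdict


class ColumnFamilyStore:
    """Toy model: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed when the table is created
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time() if ts is None else ts
        cells = self.rows[row][f"{family}:{qualifier}"]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0])  # keep versions ordered, newest last

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        cells = self.rows.get(row, {}).get(f"{family}:{qualifier}")
        return cells[-1][1] if cells else None

    def scan(self, start, stop):
        """Yield row keys in sorted order within [start, stop)."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key


store = ColumnFamilyStore(families=["risk", "meta"])
store.put("trade#0001|Orig", "risk", "PV", 100, ts=1)
store.put("trade#0001|Sim1", "risk", "PV", 100, ts=1)
store.put("trade#0001|Sim2", "risk", "PV", 99, ts=1)
store.put("trade#0001|Sim2", "risk", "PV", 98, ts=2)  # newer version of the same cell
store.put("trade#0002|Orig", "risk", "PV", 50, ts=1)

latest = store.get("trade#0001|Sim2", "risk", "PV")
trade_1_rows = list(store.scan("trade#0001", "trade#0002"))
```

The row-key design (trade id first, curve id second) is the main modelling decision: because rows are sorted, all simulations for one trade can be read with a single range scan.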
Google’s Big Data Papers 2: 2010 - now
Percolator
•  2010
•  Incremental update/compute
•  Built on BigTable
•  Adds transactions, locks, notifications
•  SPFs: “Stream Processing Frameworks” + underlying database
Dremel
•  2010
•  Online analytics and visualization
•  SQL-like language for structured data
•  Each row is a JSON object, in protobuf format
•  Column based
•  Spanner (2012), BigQuery, F1
Pregel
•  2010
•  Scalable graph computing
•  Worker threads → nodes → parallel “superstep” → messages → nodes → Aggregator/Combiners (global statistics)
•  PageRank, shortest path, bipartite matching
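The Pregel superstep loop can be sketched with PageRank on a tiny graph. This is a single-process illustration under assumptions: the three-node graph is made up, every node has at least one outgoing edge, and the inbox sum plays the role of a combiner.

```python
def pagerank(graph, supersteps=50, damping=0.85):
    """Vertex-centric PageRank: in each superstep every node sends, then updates."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(supersteps):
        # Message phase: each node sends rank / out_degree along its out-edges.
        # Summing into the inbox acts like a combiner on the receiving side.
        inbox = {node: 0.0 for node in graph}
        for node, out_edges in graph.items():
            share = rank[node] / len(out_edges)
            for dst in out_edges:
                inbox[dst] += share
        # Compute phase: each node updates its own value from its messages only.
        rank = {node: (1.0 - damping) / n + damping * inbox[node] for node in graph}
    return rank


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
total_rank = sum(ranks.values())  # the kind of global statistic an aggregator reports
```

Each node only reads its own inbox, which is what lets a real Pregel-style system spread the nodes over many workers and synchronize once per superstep.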
Impala
Tez/Stinger
Microsoft
Trinity
Unstructured Data: Index/Search Engine
•  GitHub Code Search: 17 TB
Apache Lucene/Solr
•  Open source indexing and search engine
•  4,000+ enterprise users
   -  IBM, HP, Cisco
   -  Apple, LinkedIn
   -  Wikipedia
   -  CNET, Sky
   -  Twitter
What’s Next for Hadoop? Real-time!
Nathan Marz
Some more use cases
   -  Save money to save your jobs
   -  Save money so your firm can do more
   -  E-commerce is the norm…
   -  Market sentiment analysis cannot rely on “Bloomberg's sentiment analysis” alone
   -  .. Add some more
“Lambda Architecture” – Nathan Marz, BackType/Twitter
•  query = func (data, ...)
•  Real-time ticks, events…
•  Historical (all history data points)
•  Curated/cleansed curves…
•  Derived curves…
•  Back-testing models…
•  …
•  Technical analysis…
•  Alerts…
•  Join across data sources (e.g. correlation among weather / energy)
•  Curating/cleansing curves…
•  Deriving curves, building models…
•  Back-testing models…
•  Visualization of the above!
•  …
•  Excel / VBA
•  Java, C#/F#...
•  MATLAB
•  3rd party ETL Tools
•  R
•  …
Lambda Architecture: query = func (data, ...)
[Architecture diagram; recoverable labels:]
•  Batch Layer (Hadoop): All data (HDFS/HBase); Batch recompute; Precompute views (MapReduce); Batch views (HDFS/Impala): QFD 1, QFD 2, … QFD N
•  Speed Layer (Storm): New data stream; Process stream; Realtime increment; Realtime views (Apache HBase): QFD 1, QFD 2, … QFD N
•  Serving Layer: Merge; RDBMS/DW + Full-text Search + Graph Database
•  Outputs: Tableau/Spotfire; Excel/Apps; MDX/DW (T+1); COTS Reporting Tools; Alerts; Visualization; Ad-hoc Analysis/Writeback: Java/C#, R/Clojure, Hive/Pig, Talend/3rd party, ...
•  Other labels: Acquisition; Metadata / Classification / Curation; Automation / Aggregation / Centralization; Quality / Access / Manipulation; Access / Centralization / Manipulation; Visualization
Online resources and alternative stacks
•  An Introduction to Data Science.PDF – Free e-book on Data Science with R under Creative Commons Licenses
•  Berkeley Data Analytics Stack (Open Source: Mesos – cluster management, Spark/Streaming – cluster computing, Shark-SQL/DW)
•  Learning Statistics with R, Free Big Data Education: Advanced Data Science
•  DataStax Enterprise (Apache C*/Cassandra, Apache Hadoop, Apache Solr…)
•  An example “lambda architecture” for real-time analysis of hashtags using Trident, Hadoop and Splout SQL
•  Nathan Marz (BackType, acquired by Twitter) Big Data Lambda Architecture
•  Open source clustered Lucene: elasticsearch used by GitHub (17 TB code)
Distributed Computing System: CAP Theorem
https://github.com/thinkaurelius/titan/wiki/Storage-Backend-Overview
http://en.wikipedia.org/wiki/CAP_theorem
http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
Consistency
•  all nodes see the same data at the same time
Availability
•  a guarantee that every request receives a response about whether it was successful or failed
Partition tolerance
•  the system continues to operate despite arbitrary message loss or failure of part of the system
“Lambda Architecture”: Enterprise Data
Value
•  Quality of data
•  Ways to improve data quality
•  Discover hidden business insights
Variety
•  Data sources
•  Data formats (./semi-/non-structured…)
Velocity
•  Speed of change
•  Speed of reaction
Volume
•  Data size
•  Retention granular level…
“Lambda Architecture” – Nathan Marz, BackType/Twitter
•  Design Principles:
   -  Human fault-tolerance
   -  Immutability
   -  Pre-computation
•  Lambda Architecture:
   -  Batch Layer
   -  Serving Layer
   -  Speed Layer
•  Technology Stack
   -  Apache Hadoop/HBase/Cloudera Impala
   -  Twitter Storm
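The three layers and the "query = func (data, ...)" idea can be sketched in one small class. This is a toy, single-process illustration: the class, its method names and the position-keeping example are hypothetical, and in-memory counters stand in for Hadoop, HBase and Storm.

```python
from collections import Counter


class LambdaStore:
    """Toy Lambda Architecture: running position per instrument."""

    def __init__(self):
        self.master = []                 # immutable, append-only log of all events
        self.batch_view = Counter()      # precomputed from the full log
        self.realtime_view = Counter()   # covers events since the last batch run

    def ingest(self, instrument, qty):
        """New data goes to the master log and to the speed layer."""
        self.master.append((instrument, qty))
        self.realtime_view[instrument] += qty  # incremental update

    def run_batch(self):
        """Batch layer: recompute the view from all data, from scratch."""
        view = Counter()
        for instrument, qty in self.master:
            view[instrument] += qty
        self.batch_view = view
        self.realtime_view = Counter()   # those events are now in the batch view

    def query(self, instrument):
        """Serving layer: merge the batch view with the realtime view."""
        return self.batch_view[instrument] + self.realtime_view[instrument]


store = LambdaStore()
store.ingest("EURUSD", 10)
store.ingest("EURUSD", -3)
before_batch = store.query("EURUSD")   # answered from the realtime view only
store.run_batch()
store.ingest("EURUSD", 5)
after_batch = store.query("EURUSD")    # batch view plus the newest increment
```

Because the master log is never modified, a bug in the view logic can be fixed and the batch view recomputed from scratch, which is the "human fault-tolerance" principle listed above.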

Integrating SQL & NoSQL & NewSQL & Realtime Data Intelligence for the Financial Industry
