1© Cloudera, Inc. All rights reserved.
How Apache Spark and Apache
Hadoop are helping to keep banking
regulators happy
Agenda
• Existing Architecture for Analytics & Risk
• Ever-changing Regulatory Landscape
• Challenges with existing architectures
• Modern architecture for Financial Risk
• Demo of key capabilities
Typical Existing Analytical Architecture
Data Sources
ETL/Staging
EDW
Archive
Data
Marts
Canned
Reports
Dashboards/Analytic
Applications
Non-SQL
Workloads
Self-Service
BI/Ad Hoc
Regulatory Landscape
2012 2013 2014 2015 2016 2017 2018 2019
ICB Ring-fencing
ICB Loss
Absorbency
Leverage
Ratio -
Basel III
NSFR – Basel
III
MiFID II
T2S
LCR -
Basel III
ICB / Competition
Audit Policy
Cross Border
Debt Recovery
Financial
Transaction Tax
Market Abuse
Directive (MAD
II)
PRIP
Accounting
Directive
Review
AIFM Directive
EU Transparency
Directive
EU Reg on
Credit Rating
Agencies
CRDV
Internal
Governance
Guidelines
FATCA
PD
EMIR
SWAPS Push Out
– Dodd Frank
Securities Law
Directive (SLD)
Volcker Rule –
Dodd Frank
Short Selling
Close Out
Netting
Crisis
Management
Recovery &
Resolution
Effective dates yet to be confirmed
BCBS 239 FRTB
Existing Architectures under pressure
Limited Data – Incorporating new risk factors
Limited Data & Insight
• Adding new data sources
• Risk Factors
Latent Value
• How long to get new reports with new risk factors
Existing Architectures under pressure
Missed SLAs for VaR, ES & Stress scenarios
Overloaded Bottlenecks
• Ever-increasing ETL windows
Overloaded Bottlenecks
• Ever-increasing batch windows to extract data
Existing Architectures under pressure
Frustrated Quants on the “edge” nodes (not-only-sql)
Lack of Tooling
• Ad-hoc, on-demand complex risk modeling requirements
http://www.bis.org/publ/bcbs239.pdf
III - Accuracy &
Integrity
Strive for a single
authoritative source for
risk data. Aggregate on
an automated basis.
IV - Completeness
Capture and aggregate
all material risk data.
Data available by
business line, legal entity,
asset type, industry,
region.…
V - Timeliness
Generate aggregate
and up-to-date risk
data in a timely
manner.
VI - Adaptability
Meet a broad range of
on-demand, ad-hoc
risk management
reporting requests.
BCBS-239: Principles for Risk Data Aggregation
• Data, models and
processes live in silos
• Hard to get enterprise
wide view of risk
• Difficult to aggregate
• Lack of enterprise data
taxonomy
• Failed audits
• Aggregate / reported
risk data is infrequent
and stale
• Unable to handle
crisis situations
• Complex risk
modeling process
• Unable to handle
crisis situations
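The Completeness principle above asks for risk data that can be sliced by business line, legal entity, asset type, industry and region. A minimal sketch of that kind of multi-dimensional roll-up follows. The record layout, field names and figures are hypothetical and chosen only for illustration; on a real platform the same grouping would more likely be expressed in SQL or as a Spark job.

```python
from collections import defaultdict

# Hypothetical position records; field names and values are illustrative only.
positions = [
    {"business_line": "Rates", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 120.0},
    {"business_line": "Rates", "legal_entity": "US Inc", "region": "AMER", "exposure": 80.0},
    {"business_line": "Equities", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 50.0},
    {"business_line": "Equities", "legal_entity": "UK Ltd", "region": "APAC", "exposure": 30.0},
]


def aggregate(records, dimensions, measure="exposure"):
    """Sum `measure` over every distinct combination of `dimensions`."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)


# The same source records answer different ad-hoc questions.
by_line = aggregate(positions, ["business_line"])
by_entity_region = aggregate(positions, ["legal_entity", "region"])
```

Because every view is derived from one set of records, the totals reconcile across dimensions, which is the point of a single authoritative source.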
A modern risk platform calls for…
Scalability
More risk measures, more
scenarios. Fine-grained risk
data result in an order of
magnitude increase in
volume.
Speed
More frequent stress testing
and regulatory reporting.
High velocity scenario
development and
deployment.
Agility
Support for a variety of
languages. Pre-trade
decisions. “What-if”
scenarios.
Transparency
Verifiable data. Timely
response to audits. Data
quality and lineage. Data and
model governance.
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming &
Real Time
• Mkt Surveillance
• Best Execution
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
HDFS
High-throughput, scalable,
fault-tolerant, distributed
file system.
MapReduce
Distributed parallel
processing
frameworks.
Apache Impala
Massively Parallel
Processing (MPP) SQL
engine.
Apache Spark
In-memory distributed
processing framework.
Apache Spark
Distributed compute
framework. Can support
Python / C++, as well as
Java and Scala.
Data Science Workbench
Fully integrated data science
notebook application.
Cloudera Data
Science Workbench
Apache Kudu
Real-time streaming
architectures for true
Aggregated Risk on
Demand
Modern Platform for Analytics and Machine Learning
Data
Sources
EDW
Analytic
Database
Operational
Database
Data Science
& Engineering
Shared Data
Layer
Modern Data Platform
Fixed
Reports
Dashboards/
Analytic
Applications
Non-SQL
Workloads
Self-
Service
BI/Ad Hoc
Flexible
Reporting
MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, ….
BCBS 239 / FRTB “Illustrative” Architecture
Market Data Revaluation Calculation & Aggregation Reporting
Market Data Feeds
IPV
Independent Price
Valuation Function
MRF / NMRF
Modelable & Non-
Modelable Risk Factors
Calibration
Fixed Income
Front Office
Pricing Engines
Equity Mkts
Front Office
Pricing Engines
FX
Front Office
Pricing Engines
… Other Mkts
Front Office
Pricing Engines
Enterprise Data Hub
Static Data Market Data Configuration
P&L Vectors Sensitivities Events
Positions & Transaction Data
Scenarios
- Current
- Historic
- Stressed
- Projected
Risk
Metrics SA-related Risk
Components
Counter-Party
Credit Risk XVA
ES & Stressed ES P&L Attribution VaR
Regulatory
Applications
MiFID 2 Stress Testing GDPR
FRTB SA FRTB IMA EMIR
Regulatory
Reporting
Management
Reporting
Scenarios
Risk Sensitivities
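The architecture above stores P&L vectors in the data hub and derives VaR and ES from them. As a rough illustration, the sketch below computes both measures from a single P&L vector by historical simulation. The tail-size convention used here (the worst `round(n * (1 - confidence))` scenarios) is one common simplification and an assumption of this example; regulatory calculations such as FRTB IMA prescribe their own liquidity horizons and scaling, which are not shown.

```python
def var_and_es(pnl_vector, confidence=0.975):
    """Return (VaR, ES) as positive loss amounts from scenario P&L values.

    Assumed convention: the tail holds the worst round(n * (1 - confidence))
    scenarios (at least one); VaR is the smallest loss in that tail and
    ES is the average loss across the tail.
    """
    if not pnl_vector:
        raise ValueError("pnl_vector must not be empty")
    # Convert P&L to losses and order them from worst to best.
    losses = sorted((-p for p in pnl_vector), reverse=True)
    tail_size = max(1, round(len(losses) * (1.0 - confidence)))
    tail = losses[:tail_size]
    value_at_risk = tail[-1]
    expected_shortfall = sum(tail) / len(tail)
    return value_at_risk, expected_shortfall


# Ten hypothetical scenario P&L values for one portfolio.
pnl = [-10.0, -8.0, -5.0, -3.0, -1.0, 0.0, 2.0, 4.0, 6.0, 9.0]
var_80, es_80 = var_and_es(pnl, confidence=0.8)
```

ES is never smaller than VaR at the same confidence level, because it averages the losses at and beyond the VaR threshold.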
BCBS 239 – Timeliness (Real-time risk)
Simplifying Lambda architectures with Apache Kudu
Kafka Spark
Streaming
Kudu
Spark MLlib
Application
Data
Sources
Individual Session
Full Model/Learning
Genesis
Real-time
Risk with
Greeks
1. Event Occurs
2. Market Data
3. Stream Processing
4. Land in RDBMS
5. Batch Valuation
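The idea behind the Kafka, Spark Streaming and Kudu pipeline above is that each event updates a keyed row in place, so aggregated risk stays current without waiting for a batch run. The plain-Python sketch below mimics only that upsert-then-aggregate pattern: a list stands in for the Kafka topic and dictionaries stand in for Kudu tables. The event fields (`position_id`, `book`, `delta`) are hypothetical, and the sketch assumes a position never moves between books.

```python
from collections import defaultdict


def apply_events(events):
    """Upsert the latest delta per position and keep per-book totals current."""
    latest_delta = {}                # keyed rows, standing in for a Kudu table
    book_delta = defaultdict(float)  # running aggregate per trading book
    for ev in events:
        key = ev["position_id"]
        previous = latest_delta.get(key, 0.0)
        latest_delta[key] = ev["delta"]
        # Adjust the aggregate by the change only, not by a full recompute.
        book_delta[ev["book"]] += ev["delta"] - previous
    return latest_delta, dict(book_delta)


# Hypothetical event stream; the third event revises position P1.
events = [
    {"position_id": "P1", "book": "FX", "delta": 100.0},
    {"position_id": "P2", "book": "FX", "delta": -40.0},
    {"position_id": "P1", "book": "FX", "delta": 70.0},
    {"position_id": "P3", "book": "Rates", "delta": 25.0},
]
latest, per_book = apply_events(events)
```

Because a revision replaces the earlier value rather than appending to it, there is no separate batch layer to reconcile afterwards, which is the simplification over a Lambda architecture that the slide refers to.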
Metadata
Management
Ingest
Validation
Profiling
Developer Tools: IDEs, Notebooks, SCM Operations Tools: Scheduling, Workflow, Publishing
Data Management Exploration / Model Development Production / Model Deployment
Feature
Engineering
Model Training
& Testing
Visualization
Production
Feature
Generation
Production
Model Port
Production
Testing
Result
Validation
Serving
User: Data Engineer User: Quant Analyst Users: Data / Dev / Ops Engineer
Modern Platform for Analytics and Machine Learning
Supporting complete development lifecycle for risk
Risk Footprint with
Apache Spark and Hadoop
• 19 GSIB customers
• 9 banks with risk use cases in production
• 6000+ nodes deployed
• >5 years in production
Market Risk
aggregation platform
for a Global
Systemically
Important Bank
55x faster processing, 8x more data
capacity
300+ daily interactive users analyzing
current and historical data
Global Systemically
Important Bank
On-premise and cloud-based Hadoop clusters
according to workload.
Tested on AWS to 40,000
cores. Demonstrated
linear scaling of simulation
workloads.
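Simulation workloads can scale close to linearly because scenario partitions are independent of one another: each worker needs only its own seed and path count, and the partial results combine with a simple sum. The sketch below shows that partitioning pattern. The thread pool is merely a stand-in for cluster executors and does not itself demonstrate a speed-up; the function names, seeds and the standard-normal "P&L" draw are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_partition(seed, n_paths):
    """Simulate n_paths independent draws and return (sum, count)."""
    rng = random.Random(seed)  # private generator, so partitions do not interact
    total = 0.0
    for _ in range(n_paths):
        total += rng.gauss(0.0, 1.0)  # placeholder for pricing one scenario
    return total, n_paths


def run_simulation(n_paths, n_partitions, base_seed=42):
    """Split the paths across partitions and combine the partial results."""
    base, extra = divmod(n_paths, n_partitions)
    sizes = [base + (1 if i < extra else 0) for i in range(n_partitions)]
    seeds = [base_seed + i for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(simulate_partition, seeds, sizes))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, count


mean_pnl, paths_run = run_simulation(10_000, n_partitions=4)
```

Fixing one seed per partition keeps a run reproducible, which helps when results have to be explained to an auditor.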
Demo
Q&A

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

How Apache Spark and Apache Hadoop are being used to keep banking regulators happy

  • 1. © Cloudera, Inc. All rights reserved. How Apache Spark and Apache Hadoop are helping to keep the banking regulators happy
  • 2. Agenda • Existing Architecture for Analytics & Risk • Ever-changing Regulatory Landscape • Challenges with existing architectures • Modern architecture for Financial Risk • Demo of key capabilities
  • 3. Typical Existing Analytical Architecture: Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc
  • 4. Regulatory Landscape (timeline 2012 to 2019): ICB Ring-fencing; ICB Loss Absorbency; Leverage Ratio - Basel III; NSFR – Basel III; MiFID II; T2S; LCR - Basel III; ICB / Competition; Audit Policy; Cross Border Debt Recovery; Financial Transaction Tax; Market Abuse Directive (MAD II); PRIP; Accounting Directive Review; AIFM Directive; EU Transparency Directive; EU Reg on Credit Rating Agencies; CRDV; Internal Governance Guidelines; FATCA; PD; EMIR; SWAPS Push Out – Dodd Frank; Securities Law Directive (SLD); Volcker Rule – Dodd Frank; Short Selling; Close Out Netting; Crisis Management; Recovery & Resolution; BCBS 239; FRTB. Effective dates yet to be confirmed.
  • 5. Existing Architectures under pressure: Limited Data – Incorporating new risk factors. Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc. ! Limited Data & Insight • Adding new data source • Risk Factors ! Latent Value • How long to get new reports with new risk factors
  • 6. Existing Architectures under pressure: Missed SLAs for VaR, ES & Stress scenarios. Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc. ! Overloaded Bottlenecks • Ever-increasing ETL windows ! Overloaded Bottlenecks • Ever-increasing batch windows to extract data
  • 7. Existing Architectures under pressure: Frustrated Quants on the “edge” nodes (not-only-sql). Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc. ! Lack of Tooling • Ad-hoc, on-demand complex risk modeling requirements
  • 8. http://www.bis.org/publ/bcbs239.pdf
  • 9. BCBS-239: Principles for Risk Data Aggregation. III - Accuracy & Integrity: Strive for a single authoritative source for risk data. Aggregate on an automated basis. IV - Completeness: Capture and aggregate all material risk data. Data available by business line, legal entity, asset type, industry, region, … V - Timeliness: Generate aggregate and up-to-date risk data in a timely manner. VI - Adaptability: Meet a broad range of on-demand, ad-hoc risk management reporting requests. • Data, models and processes live in silos • Hard to get enterprise wide view of risk • Difficult to aggregate • Lack of enterprise data taxonomy • Failed audits • Aggregate / reported risk data is infrequent and stale • Unable to handle crisis situations • Complex risk modeling process • Unable to handle crisis situations
  • 10. A modern risk platform calls for… Scalability: More risk measures, more scenarios. Fine-grained risk data results in an order of magnitude increase in volume. Speed: More frequent stress testing and regulatory reporting. High velocity scenario development and deployment. Agility: More frequent stress testing and support for a variety of languages. Pre-trade decisions. “What-if” scenarios. Transparency: Verifiable data. Timely response to audits. Data quality and lineage. Data and model governance.
  • 11. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. Storage • Archival • Traceability. Batch • ETL • Data Validation • Reg Reporting. Interactive • Risk Aggregation • Stress Testing. HPC • Risk Modeling • Backtesting • Simulation. Streaming & Real Time • Mkt Surveillance • Best Execution. HDFS: High-throughput, scalable, fault-tolerant, distributed file system. MapReduce: Distributed parallel processing framework.
  • 12. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. Storage • Archival • Traceability. Batch • ETL • Data Validation • Reg Reporting. Interactive • Risk Aggregation • Stress Testing. HPC • Risk Modeling • Backtesting • Simulation. Streaming & Real Time • Mkt Surveillance • Best Execution. Apache Impala: Massively Parallel Processing (MPP) SQL engine. Apache Spark: In-memory distributed processing framework.
  • 13. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. Storage • Archival • Traceability. Batch • ETL • Data Validation • Reg Reporting. Interactive • Risk Aggregation • Stress Testing. HPC • Risk Modeling • Backtesting • Simulation. Streaming & Real Time • Mkt Surveillance • Best Execution. Apache Spark: Distributed compute framework. Can support Python / C++, as well as Java and Scala. Cloudera Data Science Workbench: Fully integrated data science notebook application.
  • 14. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. Storage • Archival • Traceability. Batch • ETL • Data Validation • Reg Reporting. Interactive • Risk Aggregation • Stress Testing. HPC • Risk Modeling • Backtesting • Simulation. Streaming & Real Time • Mkt Surveillance • Best Execution. Cloudera Data Science Workbench. Apache Kudu: Real-time streaming architectures for true Aggregated Risk of Demand.
  • 15. Modern Platform for Analytics and Machine Learning. Data Sources; EDW; Analytic Database; Operational Database; Data Science & Engineering; Shared Data Layer; Modern Data Platform; Fixed Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc. Flexible Reporting: MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, …
  • 16. BCBS 239 / FRTB “Illustrative” Architecture. Market Data; Revaluation; Calculation & Aggregation; Reporting. Market Data Feeds; IPV (Independent Price Valuation Function); MRF / NMRF (Modelable & Non-Modelable Risk Factors); Calibration; Fixed Income Front Office Pricing Engines; Equity Mkts Front Office Pricing Engines; FX Front Office Pricing Engines; … Other Mkts Front Office Pricing Engines; Enterprise Data Hub; Static Data; Market Data; Configuration; P&L Vectors; Sensitivities; Events; Positions & Transaction Data; Scenarios - Current - Historic - Stressed - Projected; Risk Metrics; SA-related Risk Components; Counter-Party Credit Risk; XVA; ES & Stressed ES; P&L Attribution; VaR; Regulatory Applications; MiFID 2; Stress Testing; GDPR; FRTB SA; FRTB IMA; EMIR; Regulatory Reporting; Management Reporting; Scenarios; Risk; Sensitivities
  • 17. BCBS 239 – Timeliness (Real-time risk): Simplifying Lambda architectures with Apache Kudu. Kafka; Spark Streaming; Kudu; Spark MLlib; Application; Data Sources; Individual Session; Full Model/Learning; Genesis. Real-time Risk with Greeks. 1 Event Occurs; 2 Market Data; 3 Stream Processing; 4 Land in RDBMS; 5 Batch Valuation
  • 18. Modern Platform for Analytics and Machine Learning: Supporting complete development lifecycle for risk. Metadata Management; Ingest; Validation; Profiling; Developer Tools: IDEs, Notebooks, SCM; Operations Tools: Scheduling, Workflow, Publishing; Data Management; Exploration / Model Development; Production / Model Deployment; Feature Engineering; Model Training & Testing; Visualization; Production Feature Generation; Production Model Port; Production Testing; Result Validation; Serving; User: Data Engineer; User: Quant Analyst; Users: Data / Dev / Ops Engineer
  • 19. Risk Footprint with Apache Spark and Hadoop • 19 GSIB customers • 9 banks with risk use cases in production • 6000+ nodes deployed • >5 years in production
  • 20. Market Risk aggregation platform for a Global Systemically Important Bank. 55x faster processing, 8x more data capacity. 300+ daily interactive users analyzing current and historical data.
  • 21. Global Systemically Important Bank. On-premise and cloud-based Hadoop clusters according to workload. Tested on AWS to 40,000 cores. Demonstrated linear scaling of simulation workloads.
  • 22. Demo
  • 23. Q&A
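
Slides 6 and 16 name VaR and ES as risk metrics derived from P&L vectors, and slide 9 asks for risk data that can be aggregated across business lines. The deck does not say how these are computed, so the following is only a minimal sketch, assuming a historical-simulation approach in plain Python. The function names (`aggregate_pnl`, `historical_var_es`), the default confidence level and the sample numbers are all hypothetical and are not taken from the presentation. It illustrates one reason fine-grained P&L vectors are kept: VaR is not additive across desks, so the vectors are summed scenario by scenario before the metric is taken.

```python
import math


def aggregate_pnl(vectors):
    """Sum several P&L vectors scenario by scenario.

    Each vector holds one P&L value per scenario; all vectors are
    assumed to be aligned on the same scenarios (hypothetical layout).
    """
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError("all P&L vectors must cover the same scenarios")
    return [sum(column) for column in zip(*vectors)]


def historical_var_es(pnl, confidence=0.975):
    """Return (VaR, ES) as positive loss figures from scenario P&Ls.

    VaR is the smallest loss inside the worst (1 - confidence) share of
    scenarios; ES is the average loss over that same tail.
    """
    if not pnl:
        raise ValueError("pnl must be non-empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Convert P&L to losses and order them from worst to best.
    losses = sorted((-x for x in pnl), reverse=True)
    # Number of tail scenarios; the small epsilon guards against
    # floating-point results such as 1.9999999999999996.
    tail_size = max(1, math.floor(len(losses) * (1.0 - confidence) + 1e-9))
    tail = losses[:tail_size]
    value_at_risk = tail[-1]
    expected_shortfall = sum(tail) / len(tail)
    return value_at_risk, expected_shortfall


# Hypothetical P&L vectors for two desks over the same ten scenarios.
fx_desk = [-5, 3, -1, 4, -10, 2, 0, 1, -2, 6]
rates_desk = [1, -2, 0, -1, 4, -3, 2, 0, 1, -2]

firm_pnl = aggregate_pnl([fx_desk, rates_desk])
firm_var, firm_es = historical_var_es(firm_pnl, confidence=0.8)
```

On a cluster the scenario-wise sum would typically be expressed as a grouped aggregation over a table of (desk, scenario, P&L) rows; the logic is the same as in this single-process sketch.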