Watch full webinar here: https://bit.ly/2KkJ08B
Financial institutions need to implement new strategies and services that will carry them securely toward their digital objectives, across their entire infrastructure.
- How to securely move legacy systems and data to new technologies such as Big Data and the Cloud?
- How to break down silos and ensure global, centralized, secure, and agile access to meaningful data?
- How to facilitate data sharing while applying strict and coherent governance and security rules?
- How to avoid downtime and guarantee the success of IT initiatives while optimizing costs and resources?
- How to produce and maintain efficient reports and financial aggregations for holdings and CxO-level managers?
We are pleased to invite you to this online session to discover how data virtualization can answer these questions and contribute to the digital transformation of financial institutions.
WHAT IS IT ABOUT?
This virtual event will be organized in two parts. First, we will conduct a conference focusing on the impact of digital transformation in the financial sector, the general concepts of Data Virtualization, and how it has supported the new business goals of financial companies in terms of IT modernization, risk management, governance, and security. Then, we will conduct a hands-on session with a guided live demo to help you discover the main features and benefits of the Denodo Platform for Data Virtualization.
2. Agenda
9:00am - ONLINE CONFERENCE
• Introductory keynote
• How can banking and financial institutions leverage Denodo Data Virtualization?
• Success stories in the financial sector
10:00am – ONLINE HANDS-ON SESSION
• Use cases & Successful implementations
• Performance and optimization
• Governance and security
• Live demo with Denodo
• Next steps
• Q&A
11:00am - CONCLUSIONS & END OF SESSION
4. ❖ 26 brands
❖ 172M turnover
❖ 600 client references
❖ 1,575 experts around the world
AAA BeNeLux
❖ 5 brands
❖ 20M turnover
❖ 60 client references
❖ 180 experts
The AAA ecosystem federates various consulting companies and areas of expertise across the world
5. Dynafin / Satisco Data Competence Center history
A long-lasting story of successful achievements
6. A changing World
The financial sector faces some major challenges, impacting all domains and levels of companies:
STRATEGY
COMMERCIAL
TECHNOLOGY
FINANCE
REGULATION
ORGANISATION
TALENT
PROCESSES
DATA MOBILITY
DATA SECURITY
DATA MANAGEMENT
DATA MONITORING
7. A multidisciplinary team to guide you in your business transformations
Partner in Financial services
❖ Impact analyses
❖ Data centric strategy
❖ Data modelling
❖ Data governance strategy
❖ Change management
❖ Project management
❖ Testing
Banking IT Integration services
❖ Architecture design
❖ Technical analyses
❖ Integration strategy review
❖ Integration (re)development
❖ IT Testing
❖ DevOps
Digital Influencer and data enabler
❖ Data Virtualization strategy
❖ Data Virtualization modelling
❖ Solution implementation follow-up
❖ Data governance strategy follow-up
❖ Data security strategies
❖ Change Management
Solution provider
❖ Denodo expertise
❖ Denodo Product Evolution
❖ Denodo Product Support
❖ Denodo Product Training
❖ Denodo User Meetings
❖ Denodo Use Cases Sharing
DATA CENTRIC OFFER CREATION
8. HOW CAN BANKING AND FINANCIAL INSTITUTIONS
LEVERAGE DENODO DATA VIRTUALIZATION?
WEBINAR 29 OCTOBER
9. Speakers
Aly Wane Diene
Senior Solution
Consultant
Alain Kunnen
Chairman &
Associate
Vincent Boucheron
Data Influencer &
Managing Partner
10. HOW CAN BANKING AND FINANCIAL INSTITUTIONS
LEVERAGE DENODO DATA VIRTUALIZATION?
WEBINAR 29 OCTOBER
A BIT OF CONTEXT….
12. DATA AND PERSONAS OF EXISTING ECOSYSTEMS
[Simplified picture: data consumers across the ecosystem include Sales, HR, Executive, Marketing, Data Science, AI/ML, and Apps/API]
90% OF DEMANDS REQUIRE NEAR REAL TIME
DATA SECURITY AND GOVERNANCE?
75% OF STORED DATA NEVER USED
16. DATA VIRTUALIZATION – HOW DOES IT WORK?
CONNECT, COMBINE & CONSUME
[Diagram: data sources are connected at the bottom; consumers such as Sales, HR, Executive, Marketing, Apps/API, Data Science, and AI/ML sit on top]
1. Connect to the sources
2. Combine & integrate into business data views
3. Consume
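To make "combine into business data views" concrete, here is a minimal sketch (illustrative schema and names, not from the webinar) of a virtual view joining two connected sources into one business view:

    -- Virtual view combining a CRM source and a core-banking source.
    -- Neither table is copied; the join is resolved at query time.
    CREATE VIEW bv_customer_position AS
    SELECT c.customer_id,
           c.full_name,          -- from the CRM system
           a.account_number,
           a.balance             -- from the core-banking system
    FROM crm_customer c
    JOIN core_account a ON a.customer_id = c.customer_id;

Consumers (reports, APIs, data science notebooks) then query bv_customer_position without knowing where the underlying data lives.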
44. SUCCESS STORIES IN THE FINANCIAL SECTOR
WEBINAR 29 OCTOBER
TIME AND MEANS….
45.
DATA VIRTUALIZATION: FACTS AND FIGURES
TIME AND MEANS
Denodo infrastructure
• Windows or Linux
• On-premises or cloud-based (AWS and Azure available)
Licence & infrastructure costs
• On-premises: depends on the client's infrastructure
• Cloud-hosted hardware + client's own license: depends on the client context
• Cloud-based: from 88 k€ to 188 k€ depending on context
Implementation time
• On-premises: 3 to 6 months, based on BNPP implementations
• Cloud-based: less than 6 months on average
47. TAKEAWAYS
• We have all the technical tools needed to succeed
• Do not forget data governance and change management
• DV federates data, so try to co-finance your projects with other CxOs
• Pick the right battles and the right data; dream big, deliver fast
• You are not alone: DIAMS & Denodo can help you
54.
Case Study: Unified View into Regulatory Risk
Business Need
• Need a controlled data environment to support tougher regulatory requirements.
• Information does not tie together across data silos.
• Need a smart data governance initiative to avoid the garbage-in-garbage-out problem.
Benefits
• Enables faster time-to-market and incremental information delivery.
• Helps CIT realize value from data: all data is accessed through a single provisioning point instead of legacy point-to-point integration.
• Minimizes data replication and proliferation by eliminating data redundancy.
Solution
[architecture diagram]
55.
Case Study: Logical Data Warehouse
Business Need
• Accelerate business operations in loans, deposits, and other departments through self-service reports and dashboards.
• Establish a central information delivery platform to easily add new data sources from acquired companies.
• The prior cloud-based data warehouse was too inflexible to accommodate new data sources.
Benefits
• Improved efficiency through self-service for business users in the loans, deposits, fraud, credit, and risk departments.
• Reporting turnaround time improved from 2-3 days to 2 hours.
• Business operations such as loan processing are handled in real time.
Solution
[architecture diagram]
57.
Performance and optimization
“What about performance?” is usually the first question we get about Data Virtualization.
Many factors affect performance:
• Data sources, network latency, complexity of the query and processing, consumer ingestion rates, etc.
The overall performance drivers:
• Minimize the data moved through the network
• Maximize ‘local’ data processing
‘Move the processing to the data’
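As a small illustration (hypothetical table and column names), a query written against the virtualization layer should ship its filter and aggregation to the source, so only a few result rows cross the network:

    -- Pushed down to the source database: only the grouped totals travel
    -- back over the network, not the millions of underlying sale rows.
    SELECT region, SUM(amount) AS total_sales
    FROM sales
    WHERE sale_date >= DATE '2019-01-01'
    GROUP BY region;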
58.
Query Optimization Pipeline
Parts of the optimization pipeline:
Query Parsing
• Retrieves execution capabilities and restrictions for the views involved in the query
Static Optimizer
• Query delegation
• SQL rewriting rules (removal of redundant filters, tree pruning, join reordering, transformation push-up, star-schema rewritings, etc.)
• Data movement query plans
Dynamic Optimizer
• Picks optimal JOIN methods and orders based on data distribution statistics, indexes, transfer rates, etc.
Execution
• Creates the calls to the underlying systems in their corresponding protocols and dialects (SQL, MDX, WS calls, etc.)
59.
Performance Optimization Techniques
Query Plans
SQL is a declarative language:
• Queries specify what users want, not how to get the data.
• There are potentially many ways of executing a query.
A query plan specifies a set of steps for executing a query or subquery.
The optimizer has the goal of selecting the best plan.
• First generate multiple query plans for each query.
• Estimate the cost of each plan.
• Select the plan with the minimum cost.
60.
Performance Optimization Techniques
Static vs. Dynamic Optimization
Static optimization:
• Based on SQL transformations.
• Rewrite the query in a more optimal way.
• Remove redundancies, inactive sub-trees, etc.
• Push-down delegation: optimize the query by pushing down sub-trees to the underlying data sources.
Dynamic optimization:
• Use statistics and indices to estimate the costs of alternative execution plans.
• Select join methods and join ordering.
61.
Performance Optimization Techniques
Query Delegation
Objective: push the processing to the data.
• Utilize the power and optimizations of the underlying data sources.
• Especially relational databases and data warehouses.
• Minimize expensive data movement.
Delegation mechanisms:
• Vendor-specific SQL dialect.
• Function delegation.
• Configurable by data source.
• Delegate SQL operations, e.g. Join, Union, Group By, Order By, etc. (see the sketch below).
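A hedged sketch of delegation (illustrative view and table names): if both virtual views wrap tables living in the same relational source, the whole statement can be handed to that source:

    -- Query received by the virtualization layer:
    SELECT c.country, COUNT(*) AS order_count
    FROM bv_orders o
    JOIN bv_customers c ON o.customer_id = c.customer_id
    GROUP BY c.country;

    -- If bv_orders and bv_customers both come from the same database,
    -- essentially the same SQL is delegated to it, and only the small
    -- grouped result set moves over the network.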
62.
Performance Optimization Techniques
Query Rewriting
Goal: rewrite the query in an optimal way before it is executed. Typical optimizations:
• Simplify partitioned unions.
• Remove redundant sub-views.
• Remove unused join branches due to projections.
• Transform outer joins into inner joins (see the example below).
• Static join reordering to maximize delegation.
• Reordering of operations.
• Full and partial aggregation push-down.
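For instance, the outer-to-inner join transformation (hypothetical schema): a LEFT JOIN whose NULL-extended rows would be discarded by a later filter can be rewritten as the cheaper, more delegable INNER JOIN:

    -- Before rewriting: the WHERE clause removes the NULL rows produced
    -- by the outer join anyway.
    SELECT c.name, o.total
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.total > 1000;

    -- After rewriting: semantically equivalent, simpler to optimize and delegate.
    SELECT c.name, o.total
    FROM customers c INNER JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.total > 1000;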
63.
Performance Optimization Techniques
Source Constraint Optimization
Denodo Platform optimization has to work across multiple diverse data source types:
• Not just relational databases.
• Not all data sources have the same capabilities.
Recognize and optimize for constraints in the underlying data sources:
• e.g. MySQL output can be ordered for a Merge join… but a delimited file cannot.
64.
Performance Optimization Techniques
Data Movement
Typically used when one dataset is significantly smaller than the other and aggregations are performed on the joined data:
1. Execute the query in DS1 and fetch its data.
2. Create a temporary table in DS2 and insert the data from step 1.
3. When step 2 is completed, execute the JOIN in DS2 and return the results to the DV layer.
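Conceptually, steps 2 and 3 amount to something like the following (illustrative SQL; the engine generates and runs this automatically):

    -- Step 2: materialize the small DS1 result set inside DS2.
    CREATE TEMPORARY TABLE tmp_small (customer_id INT, name VARCHAR(100));
    INSERT INTO tmp_small VALUES (1, 'Acme'), (2, 'Globex');  -- rows fetched from DS1

    -- Step 3: the join (and any aggregation) now runs entirely inside DS2.
    SELECT t.name, SUM(s.amount) AS total
    FROM big_sales s
    JOIN tmp_small t ON s.customer_id = t.customer_id
    GROUP BY t.name;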
65.
Caching
Real time vs. caching
Sometimes, real-time access & federation is not a good fit:
• Sources are slow (e.g. text files, cloud apps like Salesforce.com)
• A lot of data processing is needed (e.g. complex combinations, transformations, matching, cleansing, etc.)
• Access is limited, or the impact on the sources has to be mitigated
For these scenarios, Denodo can replicate just the relevant data in the cache.
66.
Caching
Overview
Based on an external relational database:
• Traditional: Oracle, SQL Server, DB2, MySQL
• MPP: Teradata, Netezza, Vertica
• Cloud-based: Amazon Redshift, Snowflake
• In-memory storage: Oracle TimesTen, SAP HANA
Works at the view level:
• Allows hybrid access (real-time / cached) within an execution tree
Cache control:
• Manually – user-initiated at any time
• Time-based – using the TTL or the Denodo Scheduler
• Event-based – e.g. using JMS messages triggered in the DB
67.
Caching
Caching options
Denodo offers two different types of cache:
• Partial:
  • Query-by-query cache
  • Useful for caching only the most commonly requested data
  • Better suited to the capabilities of non-relational sources, like web services or APIs with input parameters
• Full:
  • Similar to the concept of a materialized view (see the analogy below)
  • Incrementally updateable at row level to avoid unnecessary full refresh loads
  • Offers full push-down capabilities to the source, including GROUP BY and JOIN operations
  • Supports hybrid incremental queries for SaaS data sources (next slide)
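As an analogy only (standard SQL, not Denodo syntax): the full cache mode behaves much like a materialized view in a conventional RDBMS, kept fresh by scheduled or incremental refreshes:

    -- Materialize the expensive federated view once...
    CREATE MATERIALIZED VIEW customer_cache AS
    SELECT customer_id, full_name, balance
    FROM bv_customer_position;   -- the federated view sketched earlier

    -- ...then refresh it periodically (Denodo's TTL / Scheduler play this role).
    REFRESH MATERIALIZED VIEW customer_cache;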
68.
Caching
Incremental Queries
Merge cached data and fresh data to provide fully up-to-date results with minimum latency:
1. Salesforce ‘Leads’ data is cached in Denodo at 1:00 AM.
2. A query needing Leads data arrives at 11:00 AM.
3. Only the leads added or changed since 1:00 AM are retrieved through the WAN.
4. The response is up to date, but the query is much faster.
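Conceptually, the merged answer looks like this (illustrative SQL and names):

    -- Cached rows, excluding any that have changed since the 1:00 AM load...
    SELECT * FROM leads_cache
    WHERE lead_id NOT IN (SELECT lead_id FROM sf_leads
                          WHERE last_modified >= TIMESTAMP '2019-10-29 01:00:00')
    UNION ALL
    -- ...plus only the new/changed rows fetched from Salesforce over the WAN.
    SELECT * FROM sf_leads
    WHERE last_modified >= TIMESTAMP '2019-10-29 01:00:00';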
69.
MPP Query Acceleration
• Denodo 7.0 supports using an MPP cluster to accelerate queries
• Hive, Spark, Impala, Presto
• Operations that can be parallelized can be moved to the MPP cluster
• e.g. GROUP BY aggregations
• Data is copied to the cluster and the operation is delegated for processing
• Data is copied as Parquet files
• Results are returned to the Denodo Platform
• Does not require any special commands from the user
70.
Example Scenario
Sales by state over the last four years.
Scenario:
• Current data (last 12 months) in the EDW: Current Sales, 68 million rows
• Historical data offloaded to a Hadoop cluster for cheaper storage: Historical Sales, 220 million rows
• Customer master data in the RDBMS: Customer, 2 million rows
Very large data volumes: the sales tables have hundreds of millions of rows.
[Query tree: (Current Sales UNION Historical Sales) JOIN Customer, then GROUP BY State]
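Written out as a federated query (illustrative table names):

    -- Union current and historical sales, join with customer master data,
    -- and aggregate by state; the three inputs live in three systems.
    SELECT c.state, SUM(s.amount) AS total_sales
    FROM (
        SELECT customer_id, amount FROM edw_current_sales       -- EDW, 68M rows
        UNION ALL
        SELECT customer_id, amount FROM hadoop_historical_sales -- Hadoop, 220M rows
    ) s
    JOIN rdbms_customer c ON c.customer_id = s.customer_id      -- RDBMS, 2M rows
    GROUP BY c.state;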
71.
MPP Query Acceleration
[Execution plan: Current Sales (68M rows) and Historical Sales (220M rows) are combined, joined with Customer (2M rows), and grouped by State]
System | Execution Time | Optimization Techniques
Others | ~19 min | Simple federation
No MPP | 43 sec | Aggregation push-down
With MPP | 26 sec | Aggregation push-down + MPP integration (Impala, 4 nodes)
1. Partial aggregation push-down: maximizes source processing and dramatically reduces network traffic (a GROUP BY on customer ID at the source returns ~2M rows of sales by customer instead of the raw detail; see the sketch below).
2. Integration with the cost-based optimizer: based on data volume estimation and the cost of these particular operations, the CBO can decide to move all or part of the execution tree to the MPP.
3. On-demand data transfer: Denodo automatically generates and uploads Parquet files.
4. Integration with local data: the engine detects when data is cached or comes from a local table already in the MPP.
5. Fast parallel execution: support for Spark, Presto and Impala for fast analytical processing in inexpensive Hadoop-based solutions.
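A sketch of the partial aggregation push-down (hypothetical rewrite of the query above; partial_sales stands for the combined partial aggregates from both sales sources): each source first aggregates down to one row per customer, and only the finishing GROUP BY state runs after the join:

    -- Delegated to the EDW (and, analogously, to the Hadoop cluster):
    SELECT customer_id, SUM(amount) AS partial_total
    FROM edw_current_sales
    GROUP BY customer_id;                      -- ~2M rows instead of 68M

    -- Finishing step over the pre-aggregated rows, joined with Customer:
    SELECT c.state, SUM(p.partial_total) AS total_sales
    FROM partial_sales p
    JOIN rdbms_customer c ON c.customer_id = p.customer_id
    GROUP BY c.state;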
74.
Governance
• Governance is a very broad topic
• More than just data access and delivery
• Data Virtualization can play an important role in overall data governance
• But it is not the whole story by itself
• End-to-end governance, metadata management, data lineage, etc. require other tools
• Data Virtualization has “1 degree of visibility”
75.
Enterprise Governance
Data Lineage
• Find source of ‘truth’ – top down – shows where data comes from and/or how it is derived.
Source Refresh
• Detect changes in underlying data sources and propagate to the affected data services.
Impact Analysis
• Analyze impact of metadata changes in workflows where the modified view is used.
Catalog Search
• Have a complete understanding of each of the views and data services created in Denodo.
76.
Metadata Management
The Data Virtualization Platform collects lots of metadata, in three categories:
• Technical metadata: data source metadata
  • Gathered by introspection or configuration
• Operational metadata
  • How data changes as it flows through the Data Virtualization layer
  • Generated by the Data Virtualization Platform based on the ‘model’ built by developers
• Business metadata
  • Enriched metadata either imported or added by a ‘data steward’
  • e.g. view and field descriptions
77.
Metadata Introspection
• Denodo Platform gathers metadata from data sources:
  • Automatically or via configuration.
  • Maps native data types to ‘Denodo types’.
  • Inspects indexes in the sources.
  • Analyzes source query capabilities and abstracts them into a common model.
• Stores all metadata and configuration data in the metadata repository:
  • Uses the built-in Apache Derby database.
  • Small size – it only stores metadata; actual data is retrieved in real time from the sources or the cache.
78.
Data Lineage
• Graphical view for showing data lineage for any field in any virtual view.
• Trace the source of any field:
  • Includes any functions applied to the field contents.
• Trace the source of calculated fields:
  • View the calculations used to create new fields.
79.
‘Used By’ Tool
• Graphical view for showing where a view is used.
• “Big picture” view of usage.
• Useful tool for seeing the impact of changes on the whole system.
80.
Impact Analysis example: adding a new field
1. Views affected by the change
2. Web services affected by the change
3. Option to propagate the new field individually per view
4. Preview of the tree view of the affected views
81.
Metadata Integration
• Export all metadata – technical, operational, business
• APIs and stored procedures
• Integration with governance tools
  • IBM IGC, Collibra, Informatica Enterprise Information Catalog (EIC)
• The Data Virtualization Platform represents the “as implemented” data asset
[Diagram: the Denodo Governance Bridge authenticates against IBM Information Server, registers the Denodo asset type, and publishes assets & flows to the Information Governance Catalog (IGC) over REST]
82.
Data Quality & Integrity
• Data Virtualization can help with data quality
  • Apply data quality functions as data is requested (see the example below)
  • e.g. address lookup and validation routines
• But… serious data cleansing – e.g. matching and deduping – is not recommended
  • Use a DQ tool or MDM
• Data Virtualization forces you to think about the best source of accurate data
  • ‘Customer’ view – which are the best sources for customer data?
  • Manual process to decide and build the views
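For example (illustrative SQL), lightweight standardization and validation can live in the view definition, applied on the fly as data is requested:

    -- On-request standardization in a virtual view; heavy matching and
    -- deduplication still belong in a dedicated DQ or MDM tool.
    CREATE VIEW bv_clean_customer AS
    SELECT customer_id,
           UPPER(TRIM(full_name))  AS full_name,
           LOWER(TRIM(email))      AS email,
           CASE WHEN email LIKE '%@%.%' THEN 'valid'
                ELSE 'suspect' END AS email_check
    FROM crm_customer;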
83.
Data Access
• Making data available to the users who need it
  • Based on need, not on access to databases or applications
• Managing and auditing data access
  • Ensure (and prove) compliance with security policies
  • Regulatory, geographic, contractual, and organizational data access compliance
87.
Security
Unified security management through Data Virtualization:
• Data Virtualization offers an abstraction layer that decouples sources from consumer applications.
• A single point for accessing all the information, avoiding point-to-point connections to sources.
• As a single point of access, this is an ideal place to enforce security:
  • Access restrictions to sources are enforced here.
  • They can be defined in terms of the canonical model (e.g. access restrictions to “Bill”, to “Order”, and so on) with fine granularity.
89.
Secure Access
Data Virtualization secures the access from consumers to sources:
• Consumer to Denodo Platform (northbound):
  • Communications between consumer applications and the Data Virtualization layer can be secured, typically using SSL (data in motion).
• Denodo Platform to sources (southbound):
  • Communications between the Data Virtualization layer and the sources can be secured too.
  • The specific security protocol depends on the source: SSL, HTTPS, sFTP, … (data in motion).
  • Data can be both read and exported encrypted (data at rest).
90.
Denodo Platform Authentication – Northbound
• Client application -> Denodo Platform.
• Three options:
• Usernames and passwords defined within the Denodo Platform.
• Delegate the authentication to an external LDAP/AD server.
• Use Kerberos for Single Sign On.
91.
Denodo Platform Authentication – Southbound
• Denodo Platform -> data source.
• Three options (for each individual source):
  • Use a service account for the source.
    • The admins create a user account in the source.
    • The Denodo Platform always uses those credentials.
  • Use Kerberos authentication.
  • Use credentials pass-through.
    • Access the data source with the username/password combination or the Kerberos ticket that was used to authenticate with the Denodo Platform northbound.
92.
Denodo Platform Authorization
• Role-based authorization.
  • Users/roles can be defined in the Data Virtualization layer and assigned specific permissions.
• Fine-grained authorization, with several permission scopes:
  • Virtual database level (e.g. credit risk database, etc.).
  • View level (e.g. “Regional Risk Exposure”, etc.).
  • Row level (filter out rows that are not authorized).
  • Column level:
    ▪ Grant/block access.
    ▪ Data masking (hiding sensitive fields).
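A sketch of what row- and column-level rules amount to (illustrative SQL; in the Denodo Platform these are configured as permissions on views, not hand-written):

    -- Effective result for a user in role 'emea_analyst':
    SELECT account_id,
           region,
           'xxx-xx-' || SUBSTRING(ssn FROM 8 FOR 4) AS ssn  -- column-level masking
    FROM regional_risk_exposure
    WHERE region = 'EMEA';                                  -- row-level filter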
93.
Role-Based Data Privacy
• Control what data is visible based on the user's role
  • e.g. Admin sees everything, Analyst has PII masked
• Masking can be encryption, tokenization, partial masking, or redaction
  • Built-in and custom functions allow partial masking, tokenization, etc.
• More complex logic is also possible
  • e.g. HIPAA Safe Harbor zip code handling using in-memory look-up maps (see the sketch below)
  • e.g. anonymization of CC owners and transactions for pattern analysis
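The HIPAA Safe Harbor zip handling could be expressed roughly like this (illustrative SQL; the three prefixes shown are sample entries from the published list of restricted three-digit zips):

    -- Safe Harbor: keep only the first 3 digits of the ZIP code, and
    -- zero out prefixes covering populations too small to disclose.
    SELECT patient_id,
           CASE WHEN SUBSTRING(zip FROM 1 FOR 3) IN ('036', '059', '102')
                THEN '000'
                ELSE SUBSTRING(zip FROM 1 FOR 3)
           END AS zip3
    FROM patient_records;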
95.
Policy-Based Security
• Custom policies allow developers to provide their own access control rules.
• Developers can code their own custom access control policies, and the administrator can assign them to one (or several) users/roles in a view in Denodo (or to a whole database).
[Diagram: data consumers (users, apps) query Denodo; custom policies – optionally backed by an external policy server such as Axiomatics – either accept the query, possibly with filtering and masking, or reject it, depending on whether the conditions are satisfied]
96.
Policy-Based Security: Example
Dynamic authorization based on policies:
• Example: set limits on the number of queries executed by a certain user/role; determine whether a query can be executed depending on the time of day; or leverage the access policies in an external policy server.