SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
FREE TO SHARE
Sheetal Pratik - Saxo Bank
August 2020
Data Workbench
About Saxo
We are leading fintech and regtech
specialists, connecting traders, investors and
partners to more than 35,000 instruments –
across all asset classes – from a single
account.
What we do
We build digital platforms to facilitate multi-
asset market access and provide clients of all
sizes with professional-grade tools, industry-
leading prices and best-in-class service.
Data ForScale: Transforming DataAccess,DataGovernance and DataQuality 2
• A data driven organization need to have multi-level Data Governance. Most of the tools are designed to fix the
fact e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin to prevent the
“butterfly effect” in the downstream systems.
• The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, clearly emphasizes on how
data platform with a centralized architecture can lead to failures by being bottleneck at certain point and have
impact to stability. Also with ownership of data at the domain level, it becomes a failed attempt to manage the
data dictionary centrally or duplicate the effort of creating and maintaining such data assets.
• Considering this, it is imperative that the solution has to be more futuristic and a straight implementation of any
of the COTS products for Data Cataloguing might not be the right answer to Saxo’s Data Governance
implementation.
• The preferred strategy for tooling is to fix forward rather than attempting to fix the past by using some kind of
crawler and using ML to extract the metadata from various data sources.
DATA GOVERNANCE FOR A DIGITAL NATIVE
3
For DomainTeams
Who need visibility onthe availability,meaning, usage,ownershipand quality of data
The Data Workbench(Owner's pride Neighbor's envy)
Is a one-stopdata shop
That provides transparency of Saxo’s data ecosystem
Unlike our current state whichis becomingincreasingly complex as we grow
Our product will help Saxo to improve time to market andunlock new insights.
The Data Workbench is designedto be part of the new data architecture.It consists oftwomaincomponentsa Data Catalogue anda Data
QualitySolution.
1. The Data Catalogue captures andexposes metadata. This provides transparency into the meaningand ownershipof ourdata. The Data
Catalogue is built on DataHub a data catalogue open-sourcedby LinkedIn. LinkedInis very supportive and are workingclosely withus
helping withthe adoptionof the tool.
2. The Data Quality Solutionis built onthe opensource solution Great Expectations supportedbySuperConductive. Great Expectations is
a declarative,flexible,andextensible data quality solution. It allows teams to define dataquality rules and actively monitor the quality
of their data.
VISION
4
• Federated Data Governance model is an industry trend where the enterprise governance team facilitates the
monitoring and management of the quality of enterprise critical data, with assistance from the business unit.
• LinkedIn’s journey of its shift of approach from initial version of Data Governance solution called WhereHows to
DataHub , is a typical example of the paradigm shift from “a central metadata repository” solution to a more
decentralized architecture that puts domains before anything else to support the possibility of self-service data
platform.
• We realized that a practical way of implementation would be to stay lean and agile and iteratively work with data
domains while establishing the Data Governance framework and thus create a platform that is self-serviced, scalable
and more relevant to stakeholders.
• We had a discussion with LinkedIn to understand their journey, learnings and lessons learnt that motivated them to
evolve from WhereHows to Datahub. We acknowledged that, Saxo Bank is on a similar journey and we can fast
forward the implementation by adopting Datahub open sources that best relates to the ecosystem of Saxo Bank.
• The LinkedIn datahub team has been extremely responsive.
• Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have
built their specific solutions.
MOTIVATION FOR THE SOLUTION
5
PERSONAS: GOALS AND PAIN POINTSGoalPainPoints
Data Asset
Owner
Data
Steward
Data Governance
Committee Member
Data Scientist Data Consumer
(Reporting)
To find data, its owner of
data and anything else
that helps compliance
To solve business
problems based on
data
To get an overview
of whole data in
the org
To define and
document data
standards
To be responsible
and accountable
for data
Lack of clarity on
ownership of data
New role in Saxo
Bank, so yet to
own full
responsibility
Lack of clarity on
what to mandate
at what level
(federated or data
domain level)
Data missing/
incomplete
See who can
explain data
elements
Not sure if the
data can be
trusted for making
the right decisions
6
DATA MESH APPROACH - PRODUCT THINKING
7
DIRECT AGENT/
FLEET ENGAGEMENT
SPEED OF
DELIVERY
Domain
Polyglot
Output Data
Ports
Polyglot Data
Input Ports
Control
Ports
Logs,
metrics
Self-serve
description
Discoverable
Addressable
Self-describing
Trustworthy
Interoperable
7
TARGET METADATA MODEL
8
STREAM PROCESSING
DATA PLATFORM - HIGH LEVEL OVERVIEW
9
ConsumptionDomain
Data Source
Business
Capabilities
Business
Capabilities
Reports
Trading
Instruments
Prices
Customers/Parti
es
Product
RX
Data Workbench
BUSINESS EVENTS
STREAM PROCESSING
(Enrich/Transform/Aggregate)
DATA STORAGE& MANAGEMENT
DW
DL Storage
DATA PRODUCTS
Native Data Products Domain Data
Products
Aggregated Data
Products
Fit for Purpose Data
Products
Confluent Kafka Platform
Data Catalog:
DATAHUB
DATA QUALITY
Great
Expectations:
OTHER PROCESSING FRAMEWORKS
● SAXO features list
● SAXO initial evaluation params
● TW extended feature list
● Product documentation
● Software: Local installations
● Vendor questionnaire
● Includes initial & shortlisted list
● Data Catalog
● Data Quality
Evaluation Criteria Definition Shortlisted tools Evaluation Process
TOOLS EVALUATION PROCESS
10
11
DATA CATALOG TOOL EVALUATION
11
Prioritized Feature List
● Full Text search on dataset
name, attributes and tags
● Extensiblesearch model
Metadata Search
● Web-based UI to show
metadata, governance
attributes, tags and lineage
● Ability to edit and enrich
attributes
Metadata UI
● Push-based REST API
● Pull-based adapters for
Snowflake and CRM dynamics
● Extensibility
Metadata Ingestion
● Metadata entity for datasets,
its users and attributes
● Business glossary &
documentation
● Extensibility
Metadata Modelling
● Dataset lineagewith upstream
and downstream provenance
● Integration with data
processing/orchestration tools
Data Lineage
● Support for metadata
enrichment and tagging
● Ability to flaga dataset
Data Stewardship
● Shows related Quality Attributes in
UI
● Extensibleto integrate with any DQ
tool
Data Quality Integration
● Cloud-native(Scalable& High
Availability)
● Configurable
● Extensible
Architecture
● Data as a Product
● Distributed Domain Driven
Architecture
● Self-serviceplatform
Alignment with Data Mesh
● Authentication / LDAP
● Authorization / RBAC
Security ● Metadata Versioning
● Data Virtualization
● ML/AI capabilities
Deprioritized
● Export API
Metadata Export
● LicensingCost
● Customization / Development cost
Total Cost of Ownership
● Release cycle
● Community support
● Commercial Support
● Documentation
Support
11
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alatian
Data.World
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
Linkedin DataHub
Amundsen
Marquez
Commercial
Open Source
In House
12
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alatian
Data.World
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
Linkedin DataHub
Amundsen
Marquez
Commercial
Open Source
In House
13
DATA CATALOG TOOL EVALUATION
Deep-dive analysis of the capabilities of shortlisted tools purely as per the teams understaning in Saxo’s context.
Datahub Marquez Amundsen Collibra Zeenea
Metadata Search *
Metadata UI Editable
Metadata Ingestion *
Metadata Modelling *
Data Lineage *
Metadata Export *
Data Stewardship
Data Quality Integration
Architecture * Only supports
AWS
Security *
Alignment with Data Mesh *
Support
Total Cost of Ownership *
Completelysuitable
Partiallysuitable
No/Minimal suitability
Commercial
Open Source
• Push Based approach that supports Event Driven
architecture. The solutionbuilt onthe principle of
self-service andproducers know theirdata better
and they canprovide the rich metadata so that it
helps in discover the data-assets andencourages
consumption.
• DetailedEvaluationwascarriedout fromSaxo
perspective.
• Possibility of evolution
• Extensibility with the open sourceand evolve
the toolas per needs. Right fit from feature
perspectivein terms of Data Governance
maturity of the organization.
• Leverage from larger community needs and
also influence internal process when needed.
• Reputation of LinkedIn, their success in Kafka
and datahub scaling internally to LinkedIn
volume
• Promising Roadmap 14
**Disclaimer**:Based on Saxo evaluation criteriaand interpretationofproduct capabilities
DATASET ONBOARDING - RESPONSIBILITIES
Metadata
● Dataset/Data Product Metadata
○ Ownership Information
○ Reader Information
○ Topic Configuration Details
○ Dataset Structure (AVRO Schema)
○ Business Term mapping
○ Source Dataset Definition (Optional)
● Quality Rules
Data Engineering
● Domain Transformations
● (Kafka Stream)
PRODUCERS
Metadata
● Consumer Details
● Usage Details
● Target Dataset details
CONSUMERS
Engineering Capabilities
● Supporting New Domains
● Metadata Integration
● DQ Integration
Kafka PLATFORM-Lean Team
15
Thank You
Q&A?
16

Más contenido relacionado

La actualidad más candente

Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Anastasija Nikiforova
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
Databricks
 

La actualidad más candente (20)

Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
data warehouse vs data lake
data warehouse vs data lakedata warehouse vs data lake
data warehouse vs data lake
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Modern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform SystemModern Data Warehousing with the Microsoft Analytics Platform System
Modern Data Warehousing with the Microsoft Analytics Platform System
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
Data catalog
Data catalogData catalog
Data catalog
 

Similar a LinkedInSaxoBankDataWorkbench

WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
Jane Roberts
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Denodo
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
Julian Tong
 

Similar a LinkedInSaxoBankDataWorkbench (20)

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta Data
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow Presentation
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 

Último

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 

Último (20)

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 

LinkedInSaxoBankDataWorkbench

  • 1. FREE TO SHARE Sheetal Pratik - Saxo Bank August 2020 Data Workbench
  • 2. About Saxo We are leading fintech and regtech specialists, connecting traders, investors and partners to more than 35,000 instruments – across all asset classes – from a single account. What we do We build digital platforms to facilitate multi- asset market access and provide clients of all sizes with professional-grade tools, industry- leading prices and best-in-class service. Data ForScale: Transforming DataAccess,DataGovernance and DataQuality 2
  • 3. • A data driven organization need to have multi-level Data Governance. Most of the tools are designed to fix the fact e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin to prevent the “butterfly effect” in the downstream systems. • The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, clearly emphasizes on how data platform with a centralized architecture can lead to failures by being bottleneck at certain point and have impact to stability. Also with ownership of data at the domain level, it becomes a failed attempt to manage the data dictionary centrally or duplicate the effort of creating and maintaining such data assets. • Considering this, it is imperative that the solution has to be more futuristic and a straight implementation of any of the COTS products for Data Cataloguing might not be the right answer to Saxo’s Data Governance implementation. • The preferred strategy for tooling is to fix forward rather than attempting to fix the past by using some kind of crawler and using ML to extract the metadata from various data sources. DATA GOVERNANCE FOR A DIGITAL NATIVE 3
  • 4. For DomainTeams Who need visibility onthe availability,meaning, usage,ownershipand quality of data The Data Workbench(Owner's pride Neighbor's envy) Is a one-stopdata shop That provides transparency of Saxo’s data ecosystem Unlike our current state whichis becomingincreasingly complex as we grow Our product will help Saxo to improve time to market andunlock new insights. The Data Workbench is designedto be part of the new data architecture.It consists oftwomaincomponentsa Data Catalogue anda Data QualitySolution. 1. The Data Catalogue captures andexposes metadata. This provides transparency into the meaningand ownershipof ourdata. The Data Catalogue is built on DataHub a data catalogue open-sourcedby LinkedIn. LinkedInis very supportive and are workingclosely withus helping withthe adoptionof the tool. 2. The Data Quality Solutionis built onthe opensource solution Great Expectations supportedbySuperConductive. Great Expectations is a declarative,flexible,andextensible data quality solution. It allows teams to define dataquality rules and actively monitor the quality of their data. VISION 4
  • 5. • Federated Data Governance model is an industry trend where the enterprise governance team facilitates the monitoring and management of the quality of enterprise critical data, with assistance from the business unit. • LinkedIn’s journey of its shift of approach from initial version of Data Governance solution called WhereHows to DataHub , is a typical example of the paradigm shift from “a central metadata repository” solution to a more decentralized architecture that puts domains before anything else to support the possibility of self-service data platform. • We realized that a practical way of implementation would be to stay lean and agile and iteratively work with data domains while establishing the Data Governance framework and thus create a platform that is self-serviced, scalable and more relevant to stakeholders. • We had a discussion with LinkedIn to understand their journey, learnings and lessons learnt that motivated them to evolve from WhereHows to Datahub. We acknowledged that, Saxo Bank is on a similar journey and we can fast forward the implementation by adopting Datahub open sources that best relates to the ecosystem of Saxo Bank. • The LinkedIn datahub team has been extremely responsive. • Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their specific solutions. MOTIVATION FOR THE SOLUTION 5
  • 6. PERSONAS: GOALS AND PAIN POINTSGoalPainPoints Data Asset Owner Data Steward Data Governance Committee Member Data Scientist Data Consumer (Reporting) To find data, its owner of data and anything else that helps compliance To solve business problems based on data To get an overview of whole data in the org To define and document data standards To be responsible and accountable for data Lack of clarity on ownership of data New role in Saxo Bank, so yet to own full responsibility Lack of clarity on what to mandate at what level (federated or data domain level) Data missing/ incomplete See who can explain data elements Not sure if the data can be trusted for making the right decisions 6
  • 7. DATA MESH APPROACH - PRODUCT THINKING 7 DIRECT AGENT/ FLEET ENGAGEMENT SPEED OF DELIVERY Domain Polyglot Output Data Ports Polyglot Data Input Ports Control Ports Logs, metrics Self-serve description Discoverable Addressable Self-describing Trustworthy Interoperable 7
  • 9. STREAM PROCESSING DATA PLATFORM - HIGH LEVEL OVERVIEW 9 ConsumptionDomain Data Source Business Capabilities Business Capabilities Reports Trading Instruments Prices Customers/Parti es Product RX Data Workbench BUSINESS EVENTS STREAM PROCESSING (Enrich/Transform/Aggregate) DATA STORAGE& MANAGEMENT DW DL Storage DATA PRODUCTS Native Data Products Domain Data Products Aggregated Data Products Fit for Purpose Data Products Confluent Kafka Platform Data Catalog: DATAHUB DATA QUALITY Great Expectations: OTHER PROCESSING FRAMEWORKS
  • 10. ● SAXO features list ● SAXO initial evaluation params ● TW extended feature list ● Product documentation ● Software: Local installations ● Vendor questionnaire ● Includes initial & shortlisted list ● Data Catalog ● Data Quality Evaluation Criteria Definition Shortlisted tools Evaluation Process TOOLS EVALUATION PROCESS 10
  • 11. 11 DATA CATALOG TOOL EVALUATION 11 Prioritized Feature List ● Full Text search on dataset name, attributes and tags ● Extensiblesearch model Metadata Search ● Web-based UI to show metadata, governance attributes, tags and lineage ● Ability to edit and enrich attributes Metadata UI ● Push-based REST API ● Pull-based adapters for Snowflake and CRM dynamics ● Extensibility Metadata Ingestion ● Metadata entity for datasets, its users and attributes ● Business glossary & documentation ● Extensibility Metadata Modelling ● Dataset lineagewith upstream and downstream provenance ● Integration with data processing/orchestration tools Data Lineage ● Support for metadata enrichment and tagging ● Ability to flaga dataset Data Stewardship ● Shows related Quality Attributes in UI ● Extensibleto integrate with any DQ tool Data Quality Integration ● Cloud-native(Scalable& High Availability) ● Configurable ● Extensible Architecture ● Data as a Product ● Distributed Domain Driven Architecture ● Self-serviceplatform Alignment with Data Mesh ● Authentication / LDAP ● Authorization / RBAC Security ● Metadata Versioning ● Data Virtualization ● ML/AI capabilities Deprioritized ● Export API Metadata Export ● LicensingCost ● Customization / Development cost Total Cost of Ownership ● Release cycle ● Community support ● Commercial Support ● Documentation Support 11
  • 12. TOOLS LANDSCAPE Data Catalog Collibra Informatica EDC Alatian Data.World Azure Data Catalog - Prev2 Zeenea Apache Atlas Linkedin DataHub Amundsen Marquez Commercial Open Source In House 12
  • 13. TOOLS LANDSCAPE Data Catalog Collibra Informatica EDC Alatian Data.World Azure Data Catalog - Prev2 Zeenea Apache Atlas Linkedin DataHub Amundsen Marquez Commercial Open Source In House 13
  • 14. DATA CATALOG TOOL EVALUATION Deep-dive analysis of the capabilities of shortlisted tools purely as per the teams understaning in Saxo’s context. Datahub Marquez Amundsen Collibra Zeenea Metadata Search * Metadata UI Editable Metadata Ingestion * Metadata Modelling * Data Lineage * Metadata Export * Data Stewardship Data Quality Integration Architecture * Only supports AWS Security * Alignment with Data Mesh * Support Total Cost of Ownership * Completelysuitable Partiallysuitable No/Minimal suitability Commercial Open Source • Push Based approach that supports Event Driven architecture. The solutionbuilt onthe principle of self-service andproducers know theirdata better and they canprovide the rich metadata so that it helps in discover the data-assets andencourages consumption. • DetailedEvaluationwascarriedout fromSaxo perspective. • Possibility of evolution • Extensibility with the open sourceand evolve the toolas per needs. Right fit from feature perspectivein terms of Data Governance maturity of the organization. • Leverage from larger community needs and also influence internal process when needed. • Reputation of LinkedIn, their success in Kafka and datahub scaling internally to LinkedIn volume • Promising Roadmap 14 **Disclaimer**:Based on Saxo evaluation criteriaand interpretationofproduct capabilities
  • 15. DATASET ONBOARDING - RESPONSIBILITIES Metadata ● Dataset/Data Product Metadata ○ Ownership Information ○ Reader Information ○ Topic Configuration Details ○ Dataset Structure (AVRO Schema) ○ Business Term mapping ○ Source Dataset Definition (Optional) ● Quality Rules Data Engineering ● Domain Transformations ● (Kafka Stream) PRODUCERS Metadata ● Consumer Details ● Usage Details ● Target Dataset details CONSUMERS Engineering Capabilities ● Supporting New Domains ● Metadata Integration ● DQ Integration Kafka PLATFORM-Lean Team 15