Innovation and Reinvention Driving Transformation
2018 HPCC Systems® Community Day, October 9, 2018
David Wheelock, Mauricio Nunes, Lucas Sobrinho & Robert Berger
How HPCC Systems is Building the next generation Credit Bureau
• 208.6 million inhabitants: 5th most populous country
• GDP of 2.138 trillion USD: 9th highest in the world
• 3.28 million square miles: 5th largest country, bigger than the continental USA
• Notable cities: São Paulo, Brasília
The Brazil Bureau
TransUnion
Experian
IBM
The Technology
Unified Data Model
23 layouts
Data Pipeline
File arrives in the Landing Zone
File goes through several processing steps (ETL)
Registration data: name, e-mail, phone
Person: person_name, cpf
Phone: phone_number, phone_type
E-mail: email_address, email_type
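The split shown on this slide, where one registration record feeds the Person, Phone and E-mail tables, can be sketched as a configuration-driven projection. The speaker notes say the real mapping tool emits a configuration XML file that an ECL macro interprets; that XML schema is not shown, so the element names, the `constant` attribute and the sample record below are all hypothetical, and Python stands in for ECL. The sketch also assumes `cpf` arrives with the input record.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping configuration. The real tool's XML schema is not
# shown in the slides; this layout is an assumption for illustration only.
MAPPING_XML = """
<mapping source="registration">
  <table name="Person">
    <field target="person_name" source="name"/>
    <field target="cpf" source="cpf"/>
  </table>
  <table name="Phone">
    <field target="phone_number" source="phone"/>
    <field target="phone_type" constant="unknown"/>
  </table>
  <table name="Email">
    <field target="email_address" source="e-mail"/>
    <field target="email_type" constant="unknown"/>
  </table>
</mapping>
"""


def load_mapping(xml_text):
    """Parse the mapping XML into {table: [(target, source, constant), ...]}."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for table in root.findall("table"):
        mapping[table.get("name")] = [
            (f.get("target"), f.get("source"), f.get("constant"))
            for f in table.findall("field")
        ]
    return mapping


def project(record, mapping):
    """Build one output row per target table from a single input record."""
    tables = {}
    for table, fields in mapping.items():
        row = {}
        for target, source, constant in fields:
            # A field is either copied from the input or filled with a constant.
            row[target] = record.get(source) if source is not None else constant
        tables[table] = row
    return tables


# Made-up sample record, not real data.
record = {
    "name": "Maria Silva",
    "cpf": "00000000000",
    "e-mail": "maria@example.com",
    "phone": "+55 11 5555-0100",
}
tables = project(record, load_mapping(MAPPING_XML))
```

Keeping the mapping in data rather than code means a new input layout only needs a new configuration file, which is presumably the appeal of the drag-and-drop tool described in the notes.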
CIP
Data Pipeline
File arrives in the Landing Zone
Data is automatically profiled
File goes through several processing steps (ETL)
Auto Profiling

Fieldname  Rec #  % populated  Max length  Avg length
type       68432  90.8         1           1
name       68432  60           55          27

Fieldname  Cardinality  Length  Frequent terms  Patterns
type       1            1       1,2             9
name       43098        23,25   John, Maria     aaaa
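The profiling columns on this slide (record count, percent populated, maximum and average length, cardinality, frequent terms, patterns) can be computed with a few lines of code. The sketch below is a plain-Python stand-in, not SALT; the function names, the sample rows and the pattern encoding (letters become `a`, digits become `9`, inferred from the `9` and `aaaa` values on the slide) are assumptions.

```python
from collections import Counter


def char_pattern(value):
    """Replace each letter with 'a' and each digit with '9'; keep the rest."""
    return "".join(
        "a" if c.isalpha() else "9" if c.isdigit() else c for c in value
    )


def profile(records, field, top=2):
    """Return summary statistics for one field across all records."""
    values = [r.get(field) or "" for r in records]
    populated = [v for v in values if v.strip()]
    lengths = [len(v) for v in populated]
    patterns = Counter(char_pattern(v) for v in populated)
    return {
        "fieldname": field,
        "records": len(values),
        "pct_populated": round(100.0 * len(populated) / len(values), 1) if values else 0.0,
        "max_length": max(lengths, default=0),
        "avg_length": round(sum(lengths) / len(lengths), 1) if lengths else 0.0,
        "cardinality": len(set(populated)),
        "frequent_terms": [t for t, _ in Counter(populated).most_common(top)],
        "patterns": [p for p, _ in patterns.most_common(top)],
    }


# Made-up sample rows, not data from the project.
rows = [
    {"type": "1", "name": "John"},
    {"type": "2", "name": "Maria"},
    {"type": "1", "name": ""},
    {"type": "1", "name": "John"},
]
type_profile = profile(rows, "type")
name_profile = profile(rows, "name")
```

A low percent populated or an unexpected pattern in such a report is the kind of signal that flags a malformed submission before it reaches the rest of the pipeline.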
Data Pipeline
File arrives in the Landing Zone
Data is automatically profiled
File goes through several processing steps (ETL)
Enterprise Service Platform
OSS Enterprise Service Platform
• Fully based on open source HPCC Systems Platform
• End-to-end HTTPS support
• Web Services as HPCC Systems components
• Authentication, Authorization and Accounting
• Bridge between external clients and ROXIE queries
• Fully configurable via Configuration Manager
Enterprise Service Platform
Consumer requests information through an external application
Authentication
Transaction Logging
Enterprise Service Platform
Authorization
ROXIE query
Client Response
Attributes
account: id, amount, days_late, date_due, id_contract, is_active
Attribute engine

id  attr1  attr2  attr3
1   2      0.45   812
2   5      0.12   1509
3   0      0.87   401
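The slide does not define attr1, attr2 or attr3, so the following is only a sketch of the general idea of an attribute engine: account rows are grouped by id and reduced to one row of derived attributes per id. The three attributes computed here (`active_accounts`, `late_share`, `total_amount`) and the sample accounts are hypothetical, not the project's actual definitions or data.

```python
from collections import defaultdict
from datetime import date

# Made-up account rows using the field names listed on the slide.
accounts = [
    {"id": 1, "id_contract": "c-1", "amount": 500.0, "days_late": 0,
     "date_due": date(2018, 9, 1), "is_active": True},
    {"id": 1, "id_contract": "c-2", "amount": 250.0, "days_late": 45,
     "date_due": date(2018, 8, 1), "is_active": True},
    {"id": 2, "id_contract": "c-3", "amount": 1200.0, "days_late": 0,
     "date_due": date(2018, 9, 15), "is_active": False},
]


def build_attributes(rows):
    """Group account rows by id and derive one attribute row per id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id"]].append(row)

    result = []
    for key in sorted(grouped):
        group = grouped[key]
        late = sum(1 for r in group if r["days_late"] > 0)
        result.append({
            "id": key,
            # Hypothetical attributes; the real definitions are not in the slides.
            "active_accounts": sum(1 for r in group if r["is_active"]),
            "late_share": round(late / len(group), 2),
            "total_amount": sum(r["amount"] for r in group),
        })
    return result


attributes = build_attributes(accounts)
```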

Editor's notes

  1. [Wheelock] Introduction slide. America is the 3rd largest country; Brasil is the 5th largest (11% smaller, at 3 ¼ M square miles). Perceptions (insert humor here!): Documentaries: Turistas, Anaconda. (Anaconda is not completely fictitious: there are still occasional sightings in Brasil… of Jennifer Lopez.) Carnival: even criminals take vacation for it. Politics: corruption is not an issue; they have the best politicians money can buy. Reality: rich, vibrant history and culture. Giant, emerging global economy. FANTASTIC food. Political corruption: yeah, still an issue.
  2. [Wheelock] Purpose of project: again, a giant emerging economy that needs real credit scores to enable better lending rates. How we use HPCC in our solution. Special Brasil-only considerations. Using ESP to streamline and secure transactions.
  3. [Wheelock] A banking consortium in Brasil needed a credit bureau. There was massive data to combine, so it needed a technical partner to build it. Many companies competed for the project, including IBM, Experian and TransUnion. The process started in 2014 with the banking consortium; the first roadshow, at the end of 2014, showed how quickly we can turn data around. The second roadshow, at the end of 2016, showed all stakeholders our plan. First few employees hired. Early 2017: picked as vendor. Contract achieved in the middle of 2017. Office opened in September 2017.
  4. [Wheelock] Of the adult population in Brasil, 40% (60M people) were delinquent on payments in March 2018. Interest rates are among the highest in the world. Everyone gets the same rates to spread risk. We take credit bureaus for granted (and stress out about them), but they are necessary for lenders to control risk. Controlled risk enables good rates for good customers, improves the middle class, and improves the economy. Robert will describe how we incorporated HPCC into the project.
  5. [Robert] Mention that this is from one of the DCs, and that there are many more like it.
  6. [Robert] 23 different layouts: Relationship, Addresses, Loans, Personal information, Banking, Assets, Businesses, Payment Behavior, Credit Card, and more, in fixed and XML formats. The UDM, with 34 distinct logical tables, keeps the same logical information in one location. Not a relational DB, though somewhat relational in design. Superfiles for each “table”. Advantages this solution brings to the project.
  7. [Robert] Files arrive in the Landing Zone, dropped by another piece of the solution that gets them from our external-facing dropzones. ETL written in ECL cleans and standardizes the data; this is the focus of the next slides, as we dive deeper into a few parts of the ETL. Keys are built.
  8. [Robert] From multiple input files, fields are stored in the proper tables. A purpose-built tool allows drag-and-drop mapping of fields. The tool outputs a configuration XML file that is interpreted by an ECL macro to generate the projection.
  9. [Mauricio] Explain the Cadastro Positivo workflow: in order to receive the positive payment behavior data, we need to comply with a message exchange system in which we have to generate and send files, not only receive and process them. Consumer disputes are handled directly through this system. Highlight that HPCC Systems is not just used to process the data, but also as a “communication tool”.
  10. [Mauricio] Processed data follows two parallel flows for Positive data, with the generated file being sent back to the source. The data being ingested moves down the data pipeline to the next step, where it is automatically profiled.
  11. [Mauricio] Automatic SALT profiling runs in a CRON job. Regardless of the layout of the file received (as long as it is known), the file is automatically profiled, providing 2 outputs: a set of SALT reports (“Inverted Summary” and “All Profiles”), and CPF matching across every other file received. SALT InvSummary and AllProfiles give us a high-level view of the data populating each field; CPF matches allow us to link people between different file submissions from all institutions. Auto generation of SALT specification profile files.
  12. [Sobrinho] At the final stage of the pipeline, the data becomes ready to be queried by the customers. So now let the queries come straight at us, right? Wrong… We need to make sure the people and systems accessing our products are authenticated, we need to log every transaction made, and we need to account for all of them in order to bill the customer later. We need something in between our ROXIE queries and the clients that can give us that. Do any of you have an idea of what it is? Hint: it is middleware. Most of you have probably heard about it: the Enterprise Service Platform, or ESP for short. Do you know that page you access to test the queries you’ve published? Usually at port 8002? That is an ESP service, ws_ecl. This means that the ESP comes bundled with the open source HPCC Systems platform. When you clone the repository from GitHub, it comes as part of the project. And like all good open source code, you get to tweak it! That is what we did: the open source version didn’t already include everything we needed, but it was sitting in the right place, ripe for our C++ engineers to add the functionality.
  13. [Sobrinho] The Brazil Bureau’s Enterprise Service Platform (ESP) is based entirely on the code available in the open source HPCC Systems Platform. Web services are implemented using features already available in the open source codebase. ESP instances and web services are configurable as HPCC Systems components, similar to adding a new ROXIE or Thor instance to your configuration. This configuration can be done via Configuration Manager, a tool that lets you configure all aspects of the HPCC Systems cluster, including your custom ESP web services and components. Authentication of HTTP requests, authorization to control access to the data, and accounting are all implemented using existing classes and interfaces available in the HPCC Systems codebase, together with widely used open source libraries. The main goal of ESP is to create a bridge between external clients (e.g., a web portal) and the ROXIE queries, including any custom feature implemented to be called before or after communicating with ROXIE. One important aspect of this open source ESP solution is the use of ESDL and Dynamic ESDL, which reduces code complexity and makes it more flexible to access new ROXIE queries. Another important aspect is the extensive use of the HTTPS protocol, end to end, which increases the security of the data being transferred between systems in your data pipeline.
  14. [Sobrinho] In summary, this is how an external request from a client is processed using our open source ESP solution. Consumer requests data from an external application. Request is received by ESP. ESP authenticates the user making the request. ESP performs authorization routines to make sure the user is authorized to see the data being requested. The respective ROXIE query is called, requesting data the user wants. The transaction is logged for accounting and billing purposes. Client receives the data requested from the ESP response.
  15. [Wheelock] Talk about how we are generating attributes, and contrast this with the usual analytics way of coming up with and testing them. All of this data is used to deliver products, and most of these products are score related, both credit and fraud. To come up with these scores we have to build models, and these models use several attributes defined by a data specialist to calculate the scores. We rapidly created and tested attributes in a matter of minutes. For comparison, using regular analytics tools outside HPCC Systems, such as SAS and R, the same task took not hours or days, but weeks! This allowed us to quickly identify the attributes that correlated best given the data available, no matter how complex they were.
  16. [Wheelock] Closing slide with an overview of everything that was described and how we hope HPCC Systems will make Brazil a better place in the long run.
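
The mapping step in note 8 (a configuration XML file that drives the projection of input fields onto tables) can be illustrated with a small sketch. This is not the ECL macro used in the project, and the XML schema below is an assumption made up for illustration; only the field and table names are borrowed from the registration-data slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; the real tool's XML schema is not shown in the talk.
MAPPING_XML = """
<mapping>
  <field source="name"   table="Person" target="person_name"/>
  <field source="cpf"    table="Person" target="cpf"/>
  <field source="phone"  table="Phone"  target="phone_number"/>
  <field source="e-mail" table="Email"  target="email_address"/>
</mapping>
"""

def load_mapping(xml_text):
    """Return {table: [(source_field, target_field), ...]} from the mapping XML."""
    mapping = {}
    for field in ET.fromstring(xml_text).iter("field"):
        mapping.setdefault(field.get("table"), []).append(
            (field.get("source"), field.get("target"))
        )
    return mapping

def project(record, mapping):
    """Split one input record into one row per target table."""
    return {
        table: {target: record.get(source) for source, target in pairs}
        for table, pairs in mapping.items()
    }

mapping = load_mapping(MAPPING_XML)
row = {"name": "Maria", "cpf": "00000000000",
       "phone": "5550100", "e-mail": "maria@example.com"}
projected = project(row, mapping)
```

Keeping the mapping in data rather than in code means a new input layout only needs a new XML file, which is presumably the appeal of the drag-and-drop tool.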
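
The per-field statistics shown on the Auto Profiling slide (note 11) can be approximated in a few lines. The sketch below is plain Python, not SALT, and the function name and output keys are assumptions; it only shows what kind of numbers a profile report contains.

```python
from collections import Counter

def profile(records, fieldnames):
    """Compute per-field statistics similar to those on the Auto Profiling slide."""
    total = len(records)
    report = {}
    for name in fieldnames:
        values = [r.get(name) for r in records]
        # A field counts as populated when it is neither missing nor empty.
        populated = [v for v in values if v not in (None, "")]
        lengths = [len(str(v)) for v in populated]
        report[name] = {
            "records": total,
            "pct_populated": round(100.0 * len(populated) / total, 1) if total else 0.0,
            "max_length": max(lengths, default=0),
            "avg_length": round(sum(lengths) / len(lengths), 1) if lengths else 0.0,
            "cardinality": len(set(populated)),
            "frequent_terms": [term for term, _ in Counter(populated).most_common(2)],
        }
    return report

sample = [
    {"type": "1", "name": "John"},
    {"type": "1", "name": "Maria"},
    {"type": "2", "name": ""},
    {"type": "1", "name": "John"},
]
report = profile(sample, ["type", "name"])
```

A low population percentage or an unexpected cardinality in such a report is usually the first hint that a submitted file does not match its declared layout.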
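
The request flow in note 14 (authenticate, authorize, call the query, log, respond) can be sketched as a toy gateway. This is not the ESP codebase, which is written in C++; the class, its fields, and the plain-text password check are all hypothetical and exist only to make the order of the steps concrete.

```python
from dataclasses import dataclass, field

@dataclass
class Gateway:
    """Toy stand-in for the middleware; every name here is hypothetical."""
    users: dict          # user -> password (plain text only because this is a toy)
    permissions: dict    # user -> set of query names the user may call
    queries: dict        # query name -> callable standing in for a ROXIE query
    audit_log: list = field(default_factory=list)

    def handle(self, user, password, query_name, params):
        # Authenticate the caller.
        if self.users.get(user) != password:
            return {"status": 401, "error": "authentication failed"}
        # Authorize access to the requested data.
        if query_name not in self.permissions.get(user, set()):
            return {"status": 403, "error": "not authorized"}
        # Look up and call the backing query.
        query = self.queries.get(query_name)
        if query is None:
            return {"status": 404, "error": "unknown query"}
        result = query(**params)
        # Log the transaction for accounting and billing.
        self.audit_log.append({"user": user, "query": query_name})
        # Return the response to the client.
        return {"status": 200, "data": result}

gateway = Gateway(
    users={"portal": "s3cret"},
    permissions={"portal": {"score"}},
    queries={"score": lambda cpf: {"cpf": cpf, "score": 700}},
)
ok = gateway.handle("portal", "s3cret", "score", {"cpf": "00000000000"})
denied = gateway.handle("portal", "wrong", "score", {"cpf": "00000000000"})
```

Only successful calls reach the audit log in this sketch; a production system would likely also record rejected requests.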