As part of the 2018 HPCC Systems Community Day event:
According to a Brazilian bureau market research study in March 2018, a total of 61.7 million Brazilians, representing a staggering 40.5% of the country's population over 18 years old, are late on bill payments. This high delinquency causes interest rates charged in Brazil to be among the highest in the world, and those rates apply to everyone, regardless of credit history. A significant reason for this situation is that Brazil lacks a unified, national credit bureau that can provide the banks with an accurate indication of credit risk. Enter HPCC Systems. In this presentation, we will show you how HPCC Systems has been used to build a credit bureau from scratch in Brazil, making use of state-of-the-art ECL not only to automate file ingestion and profiling, but also to automatically generate ECL for the complete data pipeline processing. Additionally, we will show how we used HPCC Systems to easily create and test analytics attributes to deliver risk-related products, such as credit and fraud scores. These scores will enhance the ability of credit grantors in Brazil to leverage the data available, enabling them to better distinguish the good payers from the bad ones, and ultimately allowing for lower interest rates that increase access to credit and foster economic growth.
David Wheelock, Mauricio Nunes de Oliveira, Robert Berger & Lucas Sobrinho, LexisNexis Risk Solutions
2. How HPCC Systems Is Building the next generation Credit Bureau
• 208.6 million inhabitants (5th most populous country)
• GDP of 2.138 trillion USD (9th highest in the world)
• 3.28 million square miles (5th largest country; bigger than continental USA)
• Notable cities: São Paulo, Brasília
7. Unified Data Model
23 layouts
8. Data Pipeline
File arrives in the Landing Zone
File goes through several processing steps (ETL)
9. Registration data: name, e-mail, phone
Person: person_name, cpf
Phone: phone_number, phone_type
E-mail: email_address, email_type
10. CIP
11. Data Pipeline
File arrives in the Landing Zone
Data is automatically profiled
File goes through several processing steps (ETL)
12. Auto Profiling
Fieldname   Rec #   % populated   Max length   Avg length
type        68432   90.8          1            1
name        68432   60            55           27

Fieldname   Cardinality   Length   Frequent terms   Patterns
type        1             1        1,2              9
name        43098         23,25    John, Maria      aaaa
13. Data Pipeline
File arrives in the Landing Zone
Data is automatically profiled
File goes through several processing steps (ETL)
Enterprise Service Platform
14. Enterprise Service Platform
OSS Enterprise Service Platform
• Fully based on open source HPCC Systems Platform
• End-to-end HTTPS support
• Web Services as HPCC Systems components
• Authentication, Authorization and Accounting
• Bridge between external clients and ROXIE queries
• Fully configurable via Configuration Manager
15. Enterprise Service Platform
Consumer requests information through an external application
Authentication
Transaction Logging
Enterprise Service Platform
Authorization
ROXIE query
Client Response
17. How HPCC Systems Is Building the next generation Credit Bureau
Editor's notes
[Wheelock]
Introduction slide.
America 3rd largest. Brasil 5th largest (11% smaller – 3 ¼ M square miles)
Perceptions (insert humor here!):
Documentaries: Turistas, Anaconda. (Anaconda not completely fictitious – still occasionally have sightings in Brasil… of Jennifer Lopez)
Carnival: Even criminals take vacation for it.
Politics: Corruption not an issue – they have the best politicians money can buy.
Reality:
Rich, vibrant history and culture. Giant, emerging global economy. FANTASTIC food.
Political corruption: yeah, still an issue.
[Wheelock]
Purpose of project: Again, Giant emerging economy. Need to provide real credit score to enable better lending rates
How we use HPCC in our solution
Special Brasil-only considerations
Using ESP to streamline and secure transactions
[Wheelock]
Banking consortium in Brasil needed credit bureau. Massive data to combine -- needed technical partner to build.
Many companies competing for project including IBM, Experian, TransUnion
Process started in 2014 with banking consortium – first roadshow end of 2014 to show how quickly we can turn data around
Second roadshow, end of 2016, show all stakeholders our plan. First few employees hired.
Early 2017 – picked as vendor
Contract achieved middle of 2017
Office opened in Sept, 2017
[Wheelock]
Of the adult population in Brasil, 40% (60M people) were delinquent on payments in March, 2018.
Interest rates among highest in the world. Everyone gets the same rates to spread risk.
We take credit bureaus for granted (and stress out about them)
Necessary for lenders to control risk
Controlled risk enables good rates for good customers, improves middle class, improves economy.
Robert will describe how we incorporated HPCC into project
[Robert]
Mention this is from one of the DCs, and there are many more like this
[Robert]
23 different layouts: Relationship, Addresses, Loans, Personal information, Banking, Assets, Businesses, Payment Behavior, Credit Card, and more, in fixed and XML formats
Unified Data Model (UDM) with 34 distinct logical tables, keeping the same logical information in one location
Not a relational DB, but organized in a somewhat relational design. Superfiles for each “table”
Advantages this solution brings to the project
[Robert]
Files arrive in the Landing Zone, dropped there by another component of the solution, which retrieves them from our external-facing dropzones
ETL written in ECL cleans and standardizes the data; this is the focus of the next slides, as we dive deeper into a few parts of the ETL
Keys are built
[Robert]
Fields from multiple input files are stored in the appropriate tables
A tool we built allows drag-and-drop mapping of fields
The tool outputs a configuration XML file that is interpreted by an ECL macro to generate the projection
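The mapping step in these notes can be sketched as follows. This is a minimal Python illustration of the idea, not the ECL macro used in the project. The XML element and attribute names (`mapping`, `field`, `source`, `table`, `target`) and the sample values are hypothetical; only the table and field names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping configuration, standing in for the XML file that the
# drag-and-drop tool produces. Element and attribute names are assumptions.
MAPPING_XML = """
<mapping layout="registration">
  <field source="name" table="Person" target="person_name"/>
  <field source="e-mail" table="E-mail" target="email_address"/>
  <field source="phone" table="Phone" target="phone_number"/>
</mapping>
"""

def load_mapping(xml_text):
    """Return a list of (source_field, target_table, target_field) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [(f.get("source"), f.get("table"), f.get("target"))
            for f in root.findall("field")]

def project(record, mapping):
    """Split one input record into per-table rows according to the mapping."""
    rows = {}
    for source, table, target in mapping:
        if source in record:
            rows.setdefault(table, {})[target] = record[source]
    return rows

mapping = load_mapping(MAPPING_XML)
rows = project(
    {"name": "Maria", "e-mail": "maria@example.com", "phone": "000000000"},
    mapping,
)
```

Keeping the mapping in a configuration file means a new input layout only needs a new XML file, while the projection logic stays unchanged.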
[Mauricio]
Explain the Cadastro Positivo workflow: in order to receive the positive payment behavior data, we need to comply with a message exchange system in which we have to generate and send files, not only receive and process them
Consumer disputes are handled directly through this system
Highlight that HPCC Systems is not just used to process the data, but as a “communication tool”
[Mauricio]
Processed data follows two parallel flows for Positive data, with the generated file being sent back to the source
The data being ingested moves down the data pipeline to the next step, which is automatic profiling
[Mauricio]
Automatic SALT profiling running in a CRON job
Regardless of the layout of the file received (as long as known), it is automatically profiled, providing 2 outputs:
A set of SALT reports: “Inverted Summary” and “All Profiles” reports
CPF matching across every other file received
SALT InvSummary and AllProfiles give us a high-level view of the data populating each field; CPF matches allow us to link people across file submissions from all institutions
Auto generation of SALT specification profile files
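The kind of per-field statistics shown on the Auto Profiling slide can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SALT itself: the function names, the sample records, and the simplified pattern rule (letters become `a`, digits become `9`) are all hypothetical.

```python
from collections import Counter

def pattern(value):
    """Reduce a value to a character-class pattern (simplified assumption):
    letters become 'a', digits become '9', anything else is kept as-is."""
    return "".join(
        "a" if c.isalpha() else "9" if c.isdigit() else c for c in value
    )

def profile(records, field):
    """Compute simple profiling statistics for one field of a list of dicts."""
    values = [r.get(field, "") for r in records]
    populated = [v for v in values if v != ""]
    lengths = [len(v) for v in populated]
    return {
        "fieldname": field,
        "records": len(values),
        "pct_populated": round(100.0 * len(populated) / len(values), 1) if values else 0.0,
        "max_length": max(lengths, default=0),
        "avg_length": round(sum(lengths) / len(lengths), 1) if lengths else 0.0,
        "cardinality": len(set(populated)),
        "frequent_terms": [t for t, _ in Counter(populated).most_common(2)],
        "patterns": [p for p, _ in Counter(map(pattern, populated)).most_common(2)],
    }

# Hypothetical sample records, not data from the project.
sample = [
    {"type": "1", "name": "John"},
    {"type": "2", "name": "Maria"},
    {"type": "1", "name": ""},
    {"type": "1", "name": "John"},
]
name_profile = profile(sample, "name")
type_profile = profile(sample, "type")
```

Because the statistics depend only on the field name and the records, the same routine can be applied to any known layout, which is what makes unattended, scheduled profiling practical.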
[Sobrinho]
At the final stage of the pipeline the data becomes ready to be queried by the customers
So now let the queries come straight at us, right? Wrong… We need to make sure the people and systems accessing our products are authenticated, we need to log every transaction made, and we need to account for all of them in order to bill the customer later. We need something in between our ROXIE queries and the clients that can give us that. Do any of you have an idea of what it is? Hint: it is middleware. Most of you have probably heard about it: the Enterprise Service Platform, or ESP for short.
Do you guys know that page you access to test the queries you’ve published? At port 8002, usually? That is an ESP service, ws_ecl. This means that ESP comes bundled with the open source HPCC Systems platform. When you clone the repository from GitHub, it comes as part of the project. And like all good open source code, you get to tweak it!
That is what we did. The open source version did not already include everything we needed, but it was sitting in the right place, ripe for our C++ engineers to add the functionality to it.
[Sobrinho]
The Brazil Bureau’s Enterprise Service Platform (ESP) is fully based on the code available in the open source HPCC Systems Platform.
Web services are implemented using features already available in the open source codebase. ESP instances and web services are configurable as HPCC Systems components, similar to adding a new ROXIE or Thor instance to your configuration. This configuration can be done via Configuration Manager, a tool that lets you configure all aspects of the HPCC Systems cluster, including your custom ESP web services and components.
Authentication for HTTP requests, authorization to control access to the data, and accounting are all implemented using existing classes and interfaces available in the HPCC Systems codebase, as well as widely used open source libraries.
The main goal of ESP is to create a bridge between external clients (e.g. a web portal) and the ROXIE queries, including any custom feature implemented to be called before or after communicating with ROXIE. One important aspect of this open source ESP solution is the use of ESDL and Dynamic ESDL, which reduces code complexity and increases flexibility to access new ROXIE queries.
Another important aspect of the open source solution is the extensive use of the HTTPS protocol, end-to-end, increasing the security of the data being transferred between systems in your data pipeline.
[Sobrinho]
In summary, this is how an external request from a client is processed using our open source ESP solution.
Consumer requests data from an external application.
Request is received by ESP.
ESP authenticates the user making the request.
ESP performs authorization routines to make sure the user is authorized to see the data being requested.
The respective ROXIE query is called, requesting the data the user wants.
The transaction is logged for accounting and billing purposes.
Client receives the data requested from the ESP response.
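The request flow listed above can be sketched as a small middleware function. This is a minimal Python illustration only: the real ESP is C++ and none of the names below (`USERS`, `PERMISSIONS`, `roxie_query`, `handle_request`, `RequestRejected`) come from HPCC Systems. The credential store, permission table, and ROXIE call are hypothetical in-memory stand-ins.

```python
import time

# Hypothetical in-memory stand-ins for the credential store and the
# permission table. A real deployment would not keep plain-text passwords.
USERS = {"portal-user": "s3cret"}
PERMISSIONS = {"portal-user": {"credit_score"}}
TRANSACTION_LOG = []

class RequestRejected(Exception):
    """Raised when authentication or authorization fails."""

def roxie_query(name, params):
    """Stub standing in for the call to a published ROXIE query."""
    return {"query": name, "params": params, "result": "stub"}

def handle_request(user, password, query_name, params):
    # Authentication: is the caller who they claim to be?
    if user not in USERS or USERS[user] != password:
        raise RequestRejected("authentication failed")
    # Authorization: may this caller see the requested data?
    if query_name not in PERMISSIONS.get(user, set()):
        raise RequestRejected("not authorized for " + query_name)
    # Call the respective ROXIE query (stubbed here).
    response = roxie_query(query_name, params)
    # Log the transaction for accounting and billing purposes.
    TRANSACTION_LOG.append(
        {"user": user, "query": query_name, "timestamp": time.time()}
    )
    # Return the response to the client.
    return response

response = handle_request(
    "portal-user", "s3cret", "credit_score", {"cpf": "00000000000"}
)
```

In this sketch, rejected requests never reach the query and are not logged as billable transactions; whether failed attempts should also be recorded is a design choice the notes do not specify.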
[Wheelock]
Talk about how we are generating attributes, and contrast with usual analytics way of coming up and testing them.
All of this data is used to deliver products, and most of these products are score related, both credit and fraud. To come up with these scores we have to build models, and these models use several attributes defined by a data specialist in order to calculate the scores. We rapidly created and tested attributes in a matter of minutes. For comparison, using regular analytics tools outside HPCC Systems, such as SAS and R, the same task took not hours or days, but weeks to accomplish! This allowed us to quickly identify the attributes with the best correlation given the data available, no matter how complex they were.
[Wheelock]
Closing slide with an overview of everything that was described and how we hope HPCC Systems will make Brazil a better place in the long run.