Verizon: Finance Data Lake implementation as a Self Service Discovery Big Data Platform

Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or
distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
Sreenath Akinepalli
Sandeep Katuku
June 2017
© Verizon 2017 All Rights Reserved. Information contained herein is provided AS IS and subject
to change without notice. All trademarks used herein are property of their respective owners.

Agenda
• Why Data Lake?
• Finance Data Lake Value
• Use Cases
• Architecture Overview
• Data Ingestion at Scale
• Data Validation
• Security
• Self Service Discovery
• Takeaways

Why Data Lake?
• 70% of time is spent gathering data vs. 30% on analysis
• Data exists in multiple ERP systems and other silos
• Data Replication through point to point interfaces
• Lack of Normalization & Harmonization
• Data Latency

Finance Data Lake Value
Centralized Enterprise Data Repository & Self-Service Discovery Platform
• Simplifies access to raw ERP data
• Eliminate data replication – lower TCO
• Enable Data Share - reduce # of point to point integrations
• Rationalize & harmonize master data
• Centrally apply common business rules
• Single set of Reporting & Analytical tools
• Drive Data Archiving Strategy

Use Cases
• Accounts Payable (AP) Working Capital Analytics
• Historical DataMart
• Spend Analytics
• Labor Transformation
• Audit & Compliance
• Capital Reporting & Analytics

Architecture Overview
[Architecture diagram. Layer labels: Source Systems (ERP1, ERP2, ERP3, Other Data Sources), Ingestion, Store, Transformation, Semantic Model, Data Consumption. Other labels: ETL, Interfaces, Data Lake, Logical Model, Reporting & Analytical Tools, External Applications, Governance, Security.]

Data Ingestion - Design Success Factors
• Hadoop as Target
• Multiple Data Sources
• Transactional Source Systems (OLTP)
• ACID Limitations
• Different types of tables
• Ability to Scale to thousands of tables
• End to End Traceability
• Identifying the Tools

Data Ingestion at Scale
• Metadata Driven Design
• Dynamic Object Creation
• Supports File, Batch & Incremental Ingestion
• File & Batch Ingest Data directly to Hadoop
• Incremental data streams from Source to Staging
• Micro batch process moves data to Hadoop
• Data Merge using Lambda Architecture
[Diagram labels: Databases, Text Files, File Ingestion, Batch Ingestion, Incremental Ingestion, Landing, Staging, Lambda Architecture, FINANCE DATA LAKE]
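The "Metadata Driven Design" and "Dynamic Object Creation" points above could be sketched roughly as follows. This is only an illustration, not the presenters' implementation: the `TableMeta` structure, the `TYPE_MAP` type mapping, the `landing` schema name, and the ORC storage format are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical mapping from source (OLTP) column types to Hive types.
# Unknown source types fall back to STRING.
TYPE_MAP = {
    "NUMBER": "DECIMAL(38,10)",
    "VARCHAR2": "STRING",
    "DATE": "TIMESTAMP",
}


@dataclass
class TableMeta:
    """One row of a (hypothetical) ingestion metadata catalog."""
    source_system: str   # e.g. "ERP1"
    table_name: str      # source table name
    columns: list        # list of (column_name, source_type) tuples
    ingestion_mode: str  # "file", "batch", or "incremental"


def build_create_table(meta: TableMeta, schema: str = "landing") -> str:
    """Generate Hive-style DDL for the target table from metadata alone."""
    cols = ",\n  ".join(
        f"`{name.lower()}` {TYPE_MAP.get(src_type.upper(), 'STRING')}"
        for name, src_type in meta.columns
    )
    target = f"{schema}.{meta.source_system.lower()}_{meta.table_name.lower()}"
    return f"CREATE TABLE IF NOT EXISTS {target} (\n  {cols}\n) STORED AS ORC"


if __name__ == "__main__":
    invoices = TableMeta(
        source_system="ERP1",
        table_name="AP_INVOICES",
        columns=[("INVOICE_ID", "NUMBER"), ("VENDOR", "VARCHAR2")],
        ingestion_mode="incremental",
    )
    print(build_create_table(invoices))
```

Because the DDL is derived from catalog entries rather than written by hand, onboarding another table would mean adding a metadata row, which is one plausible way to reach the "thousands of tables" scale mentioned earlier.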

Data Ingestion - Data Patterns
Challenge                Solution
Handling Deletes         Prior snapshots as deletes
Handling Updates         Updates as deletes and inserts
Primary Key updates      Prior snapshots as deletes
Concurrent Operations    Using transaction id
Batch Operations         Configuration to capture truncates
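One way to read the patterns above is as rules for replaying change events onto a base snapshot. The sketch below is an in-memory illustration only; the event format (`op`, `txn_id`, `row`, `before`, `after`) is hypothetical and is not taken from the presentation or from any replication tool.

```python
def merge_changes(base, changes, key="id"):
    """Apply change events to a base snapshot and return the merged rows.

    Events are replayed in transaction-id order so that concurrent
    operations on the same row resolve deterministically.
    """
    current = {row[key]: row for row in base}
    for event in sorted(changes, key=lambda e: e["txn_id"]):
        op = event["op"]
        if op == "truncate":
            # Batch operation: the source table was emptied.
            current.clear()
        elif op == "delete":
            current.pop(event["row"][key], None)
        elif op == "insert":
            current[event["row"][key]] = event["row"]
        elif op == "update":
            # An update is treated as a delete of the prior image followed
            # by an insert of the new image; this also covers primary-key
            # updates, where the old and new keys differ.
            current.pop(event["before"][key], None)
            current[event["after"][key]] = event["after"]
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return sorted(current.values(), key=lambda r: r[key])


if __name__ == "__main__":
    base = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
    changes = [
        {"txn_id": 3, "op": "delete", "row": {"id": 2}},
        {"txn_id": 1, "op": "update",
         "before": {"id": 1, "amt": 10}, "after": {"id": 5, "amt": 15}},
        {"txn_id": 2, "op": "insert", "row": {"id": 3, "amt": 30}},
    ]
    print(merge_changes(base, changes))
```

In a Hadoop setting the same logic would more likely be expressed as a join between a base table and a delta table; the dictionary here only stands in for that merge.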

Data Ingestion - Enhancement
• Simplified Architecture
• Ingest Data directly to Hadoop
• Supports all ERP tables
• Dynamic DDL changes from Source to Target
[Diagram labels: ERP (SOURCE): ERP1, ERP2, ERP3; Attunity Replicate; Lambda Architecture; FINANCE DATA LAKE]

Data Validations
• 4000 Automated Validations
• Row & Column count Dashboards
• Source to Hadoop Comparisons
• Data Latency Dashboards
• Report Reconciliation
[Dashboard graphic labels: Row Count, Column Count, Aggregation]
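A source-to-Hadoop comparison of the kind listed above might look like the following sketch. It assumes both sides have already been fetched as lists of dictionaries; the function name `validate_table` and the result layout are hypothetical, and a real check would run the counts and aggregates as queries on each system instead of in memory.

```python
def validate_table(source_rows, target_rows, sum_column):
    """Compare row count, column count, and one aggregate between two tables."""

    def profile(rows):
        return {
            "row_count": len(rows),
            "column_count": len(rows[0]) if rows else 0,
            "aggregate": sum(row[sum_column] for row in rows),
        }

    src, tgt = profile(source_rows), profile(target_rows)
    return {
        check: {
            "source": src[check],
            "target": tgt[check],
            "match": src[check] == tgt[check],
        }
        for check in src
    }


if __name__ == "__main__":
    source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
    target = [{"id": 1, "amount": 100}]  # one row has not arrived yet
    for check, result in validate_table(source, target, "amount").items():
        print(check, result)
```

Integer amounts are used here to keep the equality test exact; with floating-point amounts a tolerance would be needed.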

Security
Access: (To Hadoop Cluster)
• Authentication
• Kerberos (Direct Access)
Perimeter Level Security:
• Network Security (firewalls)
• Apache KNOX (Gateway – BI Tools)
Data: (Protecting data in Cluster)
• Authorization
• Ranger
• Role Based
• Row/Column Level
• Encryption
• Data Masking
Audit/Administration
• Access Review
• Log Monitoring
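To make the role-based, row/column-level authorization and data masking ideas above concrete, here is a small conceptual sketch. It does not use the Ranger API; in practice such policies would be defined in Ranger itself. The `POLICIES` table, role names, and column names are all hypothetical.

```python
# Hypothetical policies: each role gets a row filter and a set of masked columns.
POLICIES = {
    "ap_analyst": {
        "row_filter": lambda row: row["ledger"] == "AP",
        "masked": {"bank_account"},
    },
    "auditor": {
        "row_filter": lambda row: True,
        "masked": set(),
    },
}


def mask(value):
    """Hide all but the last four characters; fully hide short values."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def read_rows(role, rows):
    """Return only the rows and column values the given role may see."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"role {role!r} has no access policy")
    visible = []
    for row in rows:
        if policy["row_filter"](row):
            visible.append(
                {
                    col: mask(val) if col in policy["masked"] else val
                    for col, val in row.items()
                }
            )
    return visible


if __name__ == "__main__":
    data = [
        {"ledger": "AP", "vendor": "Acme", "bank_account": "1234567890"},
        {"ledger": "GL", "vendor": "Beta", "bank_account": "99"},
    ]
    print(read_rows("ap_analyst", data))
```

Roles without a policy are denied outright, which mirrors a default-deny stance; whether the actual platform was configured that way is not stated in the slides.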

Self-Service Discovery
Discovering the True Potential of Big Data
[Process diagram on the Data Lake Platform. Stage labels: Explore, Transform, Model, Discover, Prepare, Share. Caption fragments: "relevant data"; "data to understand its potential"; "& enrich data to make it ready for analysis"; "powerful new insights"; "insights for enterprise leverage".]

Takeaways
• A data lake built on the Hadoop big data platform is the right choice for self-service discovery & analytics
• Adopt an Agile mindset in the implementation
• Evolve the architecture with the right tools for the right job

Thank You
