SlideShare a Scribd company logo
1 of 18
©2018 Impetus Technologies, Inc. All rights reserved.
You are prohibited from making a copy or modification of, or from redistributing,
rebroadcasting, or re-encoding of this content without the prior written consent of
Impetus Technologies.
This presentation may include images from other products and services. These
images are used for illustrative purposes only. Unless explicitly stated there is no
implied endorsement or sponsorship of these products by Impetus Technologies. All
copyrights and trademarks are property of their respective owners.
Migrating Analytics to the Cloud at Fannie Mae
Kevin Bates
VP, Enterprise Data Strategy Execution
Fannie Mae
Migrating Analytics to the Cloud
at Fannie Mae
Analytics Overview
Why migrate analytics to the Cloud?
Our Approach
Challenges and Lessons Learned
Analytics at Fannie Mae
Market Risk
Financials
New Ideas
Pricing
Measuring Progress
Analytics drive the business at Fannie Mae….
Why migrate analytics to the Cloud?
Opportunities to empower new (and old) capabilities!
Analytics and reporting tools will continue to
propagate
AI Libraries
Data Federation &
Virtualization
Data Integration & ETL
Analytics LOB
Applications
Self-serve Analytics &
Visualization
Data Catalogs &
Metadata
Data Science Platforms
Self-serve Data Prep
Data & Compute
Platforms
Traditional BI
Rather than fight the
changes and limit
choice we need a
platform that enables
choice and manages
the complexity
Opportunities to drive efficiency and sharing…
Active Analytic Catalog
3rd Party
Cloud
On-Prem
1. Connect to data
tables
2. Join, massage,
aggregate, or shape
the data
3. Create calculations,
derivations,
expressions,
aggregations
Data Science Tools
1. Connect to data
tables
2. Join, massage,
aggregate, or shape
the data
3. Create calculations,
derivations,
expressions,
aggregations
BI Tools
1. Connect to data
tables
2. Join, massage,
aggregate, or shape
the data
3. Create calculations,
derivations,
expressions,
aggregations
Line of Bus. Tools
∞
2. Join, massage,
aggregate, or shape
the data
3. Create calculations,
derivations,
expressions,
aggregations
4. Use tool-specific
functions: send campaign,
view model, etc.
4. Use tool-specific
functions: send campaign,
view model, etc.
4. Use tool-specific
functions: send campaign,
view model, etc.
4. Use tool-specific
functions: send campaign,
view model, etc.
New/Custom Application
1x
1x
1x
∞
Re-Use
Analytic Reuse
1. Connect to data
tables
Our Approach
Fannie Mae’s experience with Data Lakes
2014
Open
source
Hadoop
2015
Analytics
Cluster using
proprietary
Hadoop
distribution
2016
Data Lake using
proprietary
Hadoop
distribution
2017
Data Lake using
cloud native
technologies
2018
Driving Data
Lake adoption
Fannie Mae has been in
forefront in adopting to
cloud industry advancements
Approach #1: Take a Governance View
Enterprise Data Lake
BI Reports & Dashboards
Ad-hoc and what if Queries
Data as Service
Data Science Results
?
Business Transaction Data
3rd Party Data
Reference Data
Deal and Delivery
Documents
Structured/SemiStructuredUnstructured
Data life-cycle
Metadata
Data
Security
Data
Lineage
App
User
Enter-
prise
Data
Zones
Data
Usage
Data
Standards
Access Control
Platform
Utilization
Focus areas to automate or enable tools to manage data lake.
Data Certification
Compliance Requirements
Preparation &
Transformatio
n
What goes in?
Ingested
What’s done with it?
Processed
What goes out?
Consumed
Approach #2: Think about Personas
User Zone
Enterprise Data Lake (EDL)Data Scientist /
Analyst
Data
Discovery
Data reads or copy
from other zones
into User Zone
Data contained
(No outward movement)
Developer
Data ingestion from
external source
User data/results
(Local Governance)
Data
Discovery
(EDL)
Data ingestion from
external source.
(Provide catalog)
Provide NPI
Classification of
external data
Process and Insight Layer
Governance
(Extended Metadata)
Data Reads and movement
between zones
(Controls and Metadata)
Schema Design and
Data Catalog
External movement/
Disclosures
(Controls and Catalog)
No NPI
EDL RBAC
No NPI
External Data
Data
External
Data
* Not all personas shown
Enterprise Zone App Zone
Data Layers
InsightLanding Prep.
Data Layers
InsightLanding Prep.
Approach #3: How can we bring two worlds together?
Traditional BI
AI Libraries
Data Catalogs & Metadata
Data & Compute Platforms
Data Federation & Virtualization
Data Integration & ETL
Analytics LOB
Applications
Self-Serve Analytics &
Visualization
Data Science Platforms
Self-Serve Data Prep
Data collaboration platform (centralized service catalog, federated delivery, lineage maintained)
SDLC-drivenTechPlatformsBusinessAnalytics
Approach #4: It’s new and evolving, so leverage
partners who can think end-to-end
Worked with Impetus to establish new patterns for
analytics data provisioning
Use case involved retirement project and cloud transition
Implementation required full production context (real
production, real users)
Solution included:
• One time historical data migration (prem to cloud)
• Migration of existing base tables and snapshots
• New build for cloud-hosted dimensions, snapshots
• New build for ongoing data flows (end-to-end)
Establish data extraction and ingestion framework
Job orchestration
Data transformation and change capture
Establish audit framework (operations, controls)
Capture reusable utilities and build the library
Monitor and report performance for each step
2
1
3
4
5
6
Challenges & Lessons Learned
Challenges
Cloud-adoption and Data Lake development can
require manual processes, hand-coding, and reliance
on command-line tools
Keeping track of your data, its lineage, and making it
easy to find
Coupling of ingestion and processing drives
architecture decisions
Operationalizing processes for production and to
maintain SLAs
Ensuring data is in canonical forms with a shared
schema usable by others
Coding or filing tickets to perform new ingestion and
processing tasks
Multiple architectures and technologies used by
different teams on different clusters
Guaranteeing compliance in a system that is
designed for schema-on-read and raw data
Sharing infrastructure in a multi-tenant “self
service” environment
Business awareness buy-in
What we have learned
Review your development practices holistically
• You need new patterns for data movement
• Don’t lift and shift!
Think Governance First!
• Incorporation of new processes into data
governance strategy
• Focus on sustainable practices that fully
envision how the end-to-end together
Engage strategic partners where it makes sense
Keep engaging your business partners to ensure
alignment
As the center of gravity of data moves toward
the cloud, hybrid strategies will become
increasingly important
This is a migration that, for seasoned
companies, will take time
Don’t migrate to the Cloud for tech reasons—
engage your business!
Thank you.
www.fanniemae.com
www.impetus.com

More Related Content

What's hot

Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez DataWorks Summit
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsDataWorks Summit
 
Exploiting machine learning to keep Hadoop clusters healthy
Exploiting machine learning to keep Hadoop clusters healthyExploiting machine learning to keep Hadoop clusters healthy
Exploiting machine learning to keep Hadoop clusters healthyDataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Deep Dive - Usage of on premises data gateway for hybrid integration scenarios
Deep Dive - Usage of on premises data gateway for hybrid integration scenariosDeep Dive - Usage of on premises data gateway for hybrid integration scenarios
Deep Dive - Usage of on premises data gateway for hybrid integration scenariosSajith C P Nair
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNDataWorks Summit
 
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...DataWorks Summit
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 

What's hot (20)

Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test ResultsUncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
Uncovering an Apache Spark 2 Benchmark - Configuration, Tuning and Test Results
 
Exploiting machine learning to keep Hadoop clusters healthy
Exploiting machine learning to keep Hadoop clusters healthyExploiting machine learning to keep Hadoop clusters healthy
Exploiting machine learning to keep Hadoop clusters healthy
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Deep Dive - Usage of on premises data gateway for hybrid integration scenarios
Deep Dive - Usage of on premises data gateway for hybrid integration scenariosDeep Dive - Usage of on premises data gateway for hybrid integration scenarios
Deep Dive - Usage of on premises data gateway for hybrid integration scenarios
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
 
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
How to Achieve a Self-Service and Secure Multitenant Data Lake in a Large Com...
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Hadoop Platform at Yahoo
Hadoop Platform at YahooHadoop Platform at Yahoo
Hadoop Platform at Yahoo
 

Similar to Migrating Analytics to the Cloud at Fannie Mae

Achieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsAchieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsSense Corp
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti
 
Starting Your Modern DataOps Journey
Starting Your Modern DataOps JourneyStarting Your Modern DataOps Journey
Starting Your Modern DataOps JourneyCloverDX
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...Agile Testing Alliance
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...ModusOptimum
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsRyan Gross
 
GraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdfGraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdfNeo4j
 
DevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleDevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleatSistemas
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Automated EDW Assessment and Actionable Recommendations - Impetus Webinar
Automated EDW Assessment and Actionable Recommendations - Impetus WebinarAutomated EDW Assessment and Actionable Recommendations - Impetus Webinar
Automated EDW Assessment and Actionable Recommendations - Impetus WebinarImpetus Technologies
 
The 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeThe 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeDataWorks Summit
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 

Similar to Migrating Analytics to the Cloud at Fannie Mae (20)

Achieve New Heights with Modern Analytics
Achieve New Heights with Modern AnalyticsAchieve New Heights with Modern Analytics
Achieve New Heights with Modern Analytics
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
Starting Your Modern DataOps Journey
Starting Your Modern DataOps JourneyStarting Your Modern DataOps Journey
Starting Your Modern DataOps Journey
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
About CDAP
About CDAPAbout CDAP
About CDAP
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data ops
 
IBM Cloud pak for data brochure
IBM Cloud pak for data   brochureIBM Cloud pak for data   brochure
IBM Cloud pak for data brochure
 
GraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdfGraphSummit - Process Tempo - Build Graph Applications.pdf
GraphSummit - Process Tempo - Build Graph Applications.pdf
 
DevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-OracleDevOps Spain 2019. Olivier Perard-Oracle
DevOps Spain 2019. Olivier Perard-Oracle
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Automated EDW Assessment and Actionable Recommendations - Impetus Webinar
Automated EDW Assessment and Actionable Recommendations - Impetus WebinarAutomated EDW Assessment and Actionable Recommendations - Impetus Webinar
Automated EDW Assessment and Actionable Recommendations - Impetus Webinar
 
The 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeThe 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data Lake
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Migrating Analytics to the Cloud at Fannie Mae

  • 1. ©2018 Impetus Technologies, Inc. All rights reserved. You are prohibited from making a copy or modification of, or from redistributing, rebroadcasting, or re-encoding of this content without the prior written consent of Impetus Technologies. This presentation may include images from other products and services. These images are used for illustrative purposes only. Unless explicitly stated there is no implied endorsement or sponsorship of these products by Impetus Technologies. All copyrights and trademarks are property of their respective owners.
  • 2. Migrating Analytics to the Cloud at Fannie Mae Kevin Bates VP, Enterprise Data Strategy Execution Fannie Mae
  • 3. Migrating Analytics to the Cloud at Fannie Mae Analytics Overview Why migrate analytics to the Cloud? Our Approach Challenges and Lessons Learned
  • 4. Analytics at Fannie Mae Market Risk Financials New Ideas Pricing Measuring Progress Analytics drive the business at Fannie Mae….
  • 5. Why migrate analytics to the Cloud?
  • 6. Opportunities to empower new (and old) capabilities!
  • 7. Analytics and reporting tools will continue to propagate AI Libraries Data Federation & Virtualization Data Integration & ETL Analytics LOB Applications Self-serve Analytics & Visualization Data Catalogs & Metadata Data Science Platforms Self-serve Data Prep Data & Compute Platforms Traditional BI Rather than fight the changes and limit choice we need a platform that enables choice and manages the complexity
  • 8. Opportunities to drive efficiency and sharing… Active Analytic Catalog 3rd Party Cloud On-Prem 1. Connect to data tables 2. Join, massage, aggregate, or shape the data 3. Create calculations, derivations, expressions, aggregations Data Science Tools 1. Connect to data tables 2. Join, massage, aggregate, or shape the data 3. Create calculations, derivations, expressions, aggregations BI Tools 1. Connect to data tables 2. Join, massage, aggregate, or shape the data 3. Create calculations, derivations, expressions, aggregations Line of Bus. Tools ∞ 2. Join, massage, aggregate, or shape the data 3. Create calculations, derivations, expressions, aggregations 4. Use tool-specific functions: send campaign, view model, etc. 4. Use tool-specific functions: send campaign, view model, etc. 4. Use tool-specific functions: send campaign, view model, etc. 4. Use tool-specific functions: send campaign, view model, etc. New/Custom Application 1x 1x 1x ∞ Re-Use Analytic Reuse 1. Connect to data tables
  • 10. Fannie Mae’s experience with Data Lakes 2014 Open source Hadoop 2015 Analytics Cluster using proprietary Hadoop distribution 2016 Data Lake using proprietary Hadoop distribution 2017 Data Lake using cloud native technologies 2018 Driving Data Lake adoption Fannie Mae has been in forefront in adopting to cloud industry advancements
  • 11. Approach #1: Take a Governance View Enterprise Data Lake BI Reports & Dashboards Ad-hoc and what if Queries Data as Service Data Science Results ? Business Transaction Data 3rd Party Data Reference Data Deal and Delivery Documents Structured/SemiStructuredUnstructured Data life-cycle Metadata Data Security Data Lineage App User Enter- prise Data Zones Data Usage Data Standards Access Control Platform Utilization Focus areas to automate or enable tools to manage data lake. Data Certification Compliance Requirements Preparation & Transformatio n What goes in? Ingested What’s done with it? Processed What goes out? Consumed
  • 12. Approach #2: Think about Personas User Zone Enterprise Data Lake (EDL)Data Scientist / Analyst Data Discovery Data reads or copy from other zones into User Zone Data contained (No outward movement) Developer Data ingestion from external source User data/results (Local Governance) Data Discovery (EDL) Data ingestion from external source. (Provide catalog) Provide NPI Classification of external data Process and Insight Layer Governance (Extended Metadata) Data Reads and movement between zones (Controls and Metadata) Schema Design and Data Catalog External movement/ Disclosures (Controls and Catalog) No NPI EDL RBAC No NPI External Data Data External Data * Not all personas shown Enterprise Zone App Zone Data Layers InsightLanding Prep. Data Layers InsightLanding Prep.
  • 13. Approach #3: How can we bring two worlds together? Traditional BI AI Libraries Data Catalogs & Metadata Data & Compute Platforms Data Federation & Virtualization Data Integration & ETL Analytics LOB Applications Self-Serve Analytics & Visualization Data Science Platforms Self-Serve Data Prep Data collaboration platform (centralized service catalog, federated delivery, lineage maintained) SDLC-drivenTechPlatformsBusinessAnalytics
  • 14. Approach #4: It’s new and evolving, so leverage partners who can think end-to-end Worked with Impetus to establish new patterns for analytics data provisioning Use case involved retirement project and cloud transition Implementation required full production context (real production, real users) Solution included: • One time historical data migration (prem to cloud) • Migration of existing base tables and snapshots • New build for cloud-hosted dimensions, snapshots • New build for ongoing data flows (end-to-end) Establish data extraction and ingestion framework Job orchestration Data transformation and change capture Establish audit framework (operations, controls) Capture reusable utilities and build the library Monitor and report performance for each step 2 1 3 4 5 6
  • 16. Challenges Cloud-adoption and Data Lake development can require manual processes, hand-coding, and reliance on command-line tools Keeping track of your data, its lineage, and making it easy to find Coupling of ingestion and processing drives architecture decisions Operationalizing processes for production and to maintain SLAs Ensuring data is in canonical forms with a shared schema usable by others Coding or filing tickets to perform new ingestion and processing tasks Multiple architectures and technologies used by different teams on different clusters Guaranteeing compliance in a system that is designed for schema-on-read and raw data Sharing infrastructure in a multi-tenant “self service” environment Business awareness buy-in
  • 17. What we have learned Review your development practices holistically • You need new patterns for data movement • Don’t lift and shift! Think Governance First! • Incorporation of new processes into data governance strategy • Focus on sustainable practices that fully envision how the end-to-end together Engage strategic partners where it makes sense Keep engaging your business partners to ensure alignment As the center of gravity of data moves toward the cloud, hybrid strategies will become increasingly important This is a migration that, for seasoned companies, will take time Don’t migrate to the Cloud for tech reasons— engage your business!