Data Virtualization:
Breaking Down Data Silos
and Other Data Problems
• Richard Hanks (richard_hanks@byu.edu)
• Roger Tervort (roger_tervort@byu.edu)
BRIGHAM YOUNG UNIVERSITY (BYU)
Data Virtualization
“Business demand for self-service
access to real-time data from multiple
data sources and in varied formats
complicates data management.”
- Gartner “Leveraging Data Virtualization in Modern Data Architectures”, April 5, 2019
Data Virtualization
Distributed Data Management
Technology is based on the execution of distributed data management.
Flexibility
Consumed by applications, query/reporting tools, message-oriented middleware, or other data management infrastructure components.
Abstraction Layer
Layer of abstraction above the physical implementation of data, to simplify querying logic.
Multiple Data Sources
Used primarily for queries against multiple heterogeneous data sources, and federation of query results into virtual views.
Virtual Integrated Views
Data virtualization can be used to create virtualized and integrated views of data (in-memory, rather than executing data movement).
Gartner: Market Guide for Data Virtualization (16 Nov 2018);
Leveraging Data Virtualization in Modern Data Architectures (5 Apr 2019)
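The idea of a virtual integrated view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two attached in-memory SQLite databases stand in for separate physical sources, and the schema, table, and view names (`library`, `enrollment`, `checkouts_by_college`) are hypothetical. A real data virtualization tool would federate across different database engines, which SQLite does not do.

```python
import sqlite3

# Two independent in-memory SQLite databases stand in for separate physical
# sources (hypothetical names: "library" and "enrollment").
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS library")
conn.execute("ATTACH DATABASE ':memory:' AS enrollment")

conn.execute("CREATE TABLE library.checkouts (student_id TEXT, items INTEGER)")
conn.execute("CREATE TABLE enrollment.students (student_id TEXT, college TEXT)")
conn.executemany("INSERT INTO library.checkouts VALUES (?, ?)",
                 [("S1", 4), ("S2", 1), ("S3", 7)])
conn.executemany("INSERT INTO enrollment.students VALUES (?, ?)",
                 [("S1", "Engineering"), ("S2", "Humanities"), ("S3", "Engineering")])

# The view stores only the query, not a copy of the rows. A TEMP view is used
# because SQLite only lets temporary views span attached databases.
conn.execute("""
    CREATE TEMP VIEW checkouts_by_college AS
    SELECT s.college AS college, SUM(c.items) AS total_items
    FROM library.checkouts AS c
    JOIN enrollment.students AS s ON s.student_id = c.student_id
    GROUP BY s.college
""")

def query_view():
    """Run the consumer-facing query against the virtual view."""
    return conn.execute(
        "SELECT college, total_items FROM checkouts_by_college ORDER BY college"
    ).fetchall()

result_before = query_view()
# A change at the source is visible on the next query; no reload step is needed.
conn.execute("INSERT INTO library.checkouts VALUES ('S2', 5)")
result_after = query_view()
```

Consumers query only `checkouts_by_college`; the join logic and the location of each table stay hidden behind the view, which is the abstraction-layer point made above.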
[Diagram labels. Data: Data A, Data B, Data C, Data D. Systems and interfaces: Excel, CSV, Oracle, MySQL, ETL, API. Manual Processes and activities: Finding, Extracting, Cleaning, Formatting, Joining, Standardizing, Pivoting, Wrangling, Aggregating, Calculations, Field Types, Automating, Storing. Reports: Report A, Report B, Report A (version 2).]
If we have the data… Why is it so hard to develop
the report that I want??
Poll: How prevalent are data silos at your school?
1 = We have nearly all data centralized
3 = We have some central data
5 = Most of our data is in data silos
Academic Freedom Culture?
Data Silos?
Resulting Problems:
• Lack of Centralized Data
• Disparate Systems
• Data Replication
• Broken Data Pipelines
• Overuse of ETL – just to move data
• Data Security (Authentication /
Authorization)
Selection of DV Tools
www.dremio.com
www.denodo.com
Benefits from Data Virtualization
OIT Managed Data
Enrollment Services
Department of Continuing Education
Center for Teaching and Learning
Library
Marriott School of Management
Single Point of Entry / Authorization
Tableau
Business Objects
Excel
PowerBI
Python
R
SQL
Other
Benefits from Data Virtualization
Single Point of Entry / Authorization
Tableau
Business Objects
Excel
PowerBI
Python
R
SQL
Other
SIS
Identity
Registration
Student
Dimension
Excel/CSV Data
Virtual Views (Curated)
Database
Physical Layer
Other
Virtual Layer
- Collibra (DSA)
- Searchable (Dremio Catalog)
01 Reduction in ETL Development
02 Reduction in Data Replication
03 Flexible Data Pipeline for Data Science and Ad Hoc
04 Quicker DSA Approval and Delivery
05 Reduction in Large Tableau Data Refreshes
06 Breakdown of Data Silos / Departments Still Have Their Data
07 Row / Column / Masking Data Security
08 Addition of CSV, JSON, and some XLSX files
09 Combining Data Sources (Oracle, MS SQL, MySQL, AWS, Mongo)
10 Curated Data Sets (General and Surgical)
11 Acceleration of Queries (Caching of Data)
12 Pre-Aggregation Queries (Cube type OLAP)
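Item 07 (row, column, and masking security) can be illustrated with a small Python sketch. It is not how Dremio or any other product implements security; the `Policy` class, the `mask` rule, and the sample columns are all hypothetical and exist only to show the three kinds of restriction applied at a single point of entry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Hypothetical access policy for one consumer of a data set."""
    allowed_columns: tuple       # column-level security
    masked_columns: tuple        # columns returned in masked form
    allowed_colleges: frozenset  # row-level security

def mask(value) -> str:
    """Hide all but the last two characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]

def apply_policy(rows, policy):
    """Return only the rows and columns the policy allows, masking as required."""
    secured = []
    for row in rows:
        if row["college"] not in policy.allowed_colleges:
            continue  # row filter: this consumer may not see the row at all
        out = {}
        for col in policy.allowed_columns:
            out[col] = mask(row[col]) if col in policy.masked_columns else row[col]
        secured.append(out)
    return secured

rows = [
    {"student_id": "123456789", "college": "Engineering", "gpa": 3.7},
    {"student_id": "987654321", "college": "Humanities", "gpa": 3.2},
]
policy = Policy(allowed_columns=("student_id", "college"),
                masked_columns=("student_id",),
                allowed_colleges=frozenset({"Engineering"}))
secured_rows = apply_policy(rows, policy)
```

Here the consumer sees one row (row security), never sees `gpa` (column security), and sees the identifier only as `*******89` (masking).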
Case Studies – Real Life Examples
Need: Library and Enrollment Services (ES) each need data the other group has. The Library stores library and patron usage data in MySQL, MongoDB, and Oracle. ES has student demographic data in Oracle (currently centralized and managed by IT). Both will need Data Sharing Agreements (DSAs) and will need the data updated frequently.
Old Way: Extract MongoDB data to a flat file. Build ETL to combine data from MySQL, Oracle (Library), and Oracle (ES). Data is joined on common business keys. Estimated time to delivery: 3-4+ weeks (not including DSA).
New Way: Leave data in its place. Use Data Virtualization to create Virtual Data Sources in SQL that query all sources and combine and join the data. Change authorization for the new Virtual Data Sets. Estimated time to delivery: 2 days to 1 week (not including DSA).
Need: General Studies needs an analysis of the order of courses taken to meet the Language of Learning requirement. Course data is available, but sequencing and analysis will be done in SAS. Output will be a CSV file, but it will need to be enriched with demographic data of students who took specific classes.
Old Way: Extract course data into CSV. Analyze data in SAS. Export SAS result file to CSV. Load CSV into Oracle. Enrich SAS result data with other Oracle data. Use Tableau to deliver a dashboard of results.
New Way: Leave data in its place. Use Dremio to feed data into SAS via ODBC. Output results are stored to a NAS drive as CSV. Demographic data is added to the Virtual Data Set. Tableau points to the Dremio data set. Dremio becomes a Data Science Sandbox.
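The enrichment step in this case study, joining an analysis result file to demographic data without first loading the file into a database, might look roughly like the sketch below. The column names, course codes, and `class_standing` values are invented for illustration, and an in-memory string and a dict stand in for the CSV on the NAS drive and the demographic source.

```python
import csv
import io

# Stand-in for the CSV written by the analysis tool (hypothetical columns).
analysis_csv = io.StringIO(
    "student_id,course_sequence\n"
    "S1,WRTG150>ENGL201\n"
    "S2,ENGL201>WRTG150\n"
)
# Stand-in for demographic rows that would remain in the source database.
demographics = {
    "S1": {"class_standing": "Sophomore"},
    "S2": {"class_standing": "Freshman"},
}

def enrich(csv_file, lookup):
    """Join each CSV row to the lookup on student_id at read time."""
    for row in csv.DictReader(csv_file):
        extra = lookup.get(row["student_id"], {})
        yield {**row, **extra}

enriched = list(enrich(analysis_csv, demographics))
```

Because the join happens when the data is read, replacing the CSV with a newer analysis run changes the enriched result without any reload into Oracle.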
Need: A large campus department wanted to do a turnover analysis on their administrative and student employees. They need 5+ years of data. No standard analysis process exists. Data is in PS Oracle and will be combined with the department's internal job descriptions and classifications. The HR Department is concerned about additional data in the same tables that would be “coming along for the ride.”
Old Way: Use ETL to create a custom table or custom extract into department databases. The department will perform its analysis in MATLAB. But how to update?
New Way: Leave data in its place. Use SQL in Dremio to query all sources and combine and join data. MATLAB uses ODBC to query the Virtual Data Set for analysis. BONUS: The DSA was based on the Virtual Data Set rather than on multiple underlying Oracle tables. No data came along for the ride. Time to delivery: 2 weeks (including DSA).
Need: Our Security Operations Center recently expanded their coverage to include additional Church-related academic institutions. One service they were offering was Threat and Federated Intelligence. Providing that same service to all campuses, with multiple heterogeneous systems, will be a challenge.
Old Way: Use Rundeck, ETL, Python, and other tools to build automation from each system, and from those to S3. Somehow combine the enriching data in Oracle with the data in S3 (another ETL?).
New Way: Develop an event-driven microservices architecture. Microservices pull data from systems and save it to JSON files in AWS S3. Dremio makes each JSON file appear like a table, which is joined with other tables in Dremio to enrich the data. Data can feed reporting or other ad hoc analysis.
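The pattern of treating JSON files as table rows and joining them to reference data can be sketched as follows. This is an assumption-laden illustration, not the architecture from the slides: a temporary local directory stands in for the S3 bucket, a dict stands in for the Oracle table, and the field names (`host`, `alert`, `campus`) and file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def json_files_as_rows(folder):
    """Yield one flat dict per JSON file, as if the folder were a table."""
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as handle:
            yield json.load(handle)

def enrich_events(events, assets_by_host):
    """Left-join events to an asset lookup on the 'host' key."""
    for event in events:
        asset = assets_by_host.get(event["host"], {"campus": None})
        yield {**event, **asset}

# A temporary directory stands in for the S3 bucket (assumption for the sketch).
with tempfile.TemporaryDirectory() as bucket:
    Path(bucket, "event-001.json").write_text(
        json.dumps({"host": "web01", "alert": "port-scan"}))
    Path(bucket, "event-002.json").write_text(
        json.dumps({"host": "db07", "alert": "failed-login"}))
    assets = {"web01": {"campus": "Campus A"}}  # stand-in for the Oracle table
    enriched_events = list(enrich_events(json_files_as_rows(bucket), assets))
```

Each microservice only has to drop a new file; the file shows up as another row the next time the "table" is read, and events with no matching asset are kept with an empty `campus` rather than dropped.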
Questions
richard_hanks@byu.edu roger_tervort@byu.edu

HEDW-2020-Using-Data-Virtualization-to-Break-Down-Data-Silos.pptx


Editor's notes

  1. Single point of entry: authentication / authorization of data (row, column, masking)
  2. Flexible tool (ODBC and direct connections)
  3. Leave data at the source (less data replication)
  4. Easier access to data across campus (via DSA with no data coming along for the ride)
  5. Breakdown of existing silos
  6. Use flat files as data sources (CSV, Excel, JSON)
  7. Virtual Data Warehouse (Enterprise View)
     Dimensions: Student, Faculty, Admin, Date, OU Structure, HR Structure, Colleges, Courses
     Measures: GPA, Enrollments, Counts, Averages, Hours
  8. Curated data: data sets that make sense (some individualized). THIS is where we really help a lot of the Have Nots.
  9. Work with Data Stewards to make “pre-approved” data sets
  10. Ability to search data (auto cataloging and tagging)
  11. Reflections: data and aggregation query acceleration
  12. Rapid prototyping: data proof of concept (avoid extensive ETL)
  13. Queries across multiple database platforms, on-prem and cloud