An introduction to data virtualization in business intelligence

A brief description of what Data Virtualisation is and how it can be used to support business intelligence applications and development. Originally presented at the ETIS Conference in Riga, Latvia, in October 2013.


1. An Introduction to Data Virtualization in Business Intelligence
David M Walker
Data Management & Warehousing
http://datamgmt.com
18 October 2013
2. What Is Data Virtualization?
•  Wikipedia: “Data virtualization is [..] an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.”
•  Or, more simply: a solution that sits in front of multiple data sources and allows them to be treated as a single SQL database.
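To make the "single SQL database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Two attached in-memory databases stand in for two separate source systems (the schema names `crm` and `billing` and all rows are hypothetical). A TEMP view then joins across them, so the client issues one SQL query without knowing where each table lives. A real DV platform does this across heterogeneous, networked systems. SQLite's ATTACH only federates SQLite databases, so treat this as an illustration of the concept, not of any product.

```python
import sqlite3

# Hypothetical setup: two in-memory databases stand in for two source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoice (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customer VALUES (?, ?)",
                 [(1, "Joe Bloggs"), (2, "Jane Smith")])
conn.executemany("INSERT INTO billing.invoice VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# The 'virtual' layer: a TEMP view joining across both sources.
# No data is copied; the join runs against the sources at query time.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, SUM(i.amount) AS revenue
    FROM crm.customer AS c
    JOIN billing.invoice AS i ON i.customer_id = c.id
    GROUP BY c.name
""")

# The client sees one database and one table-like object.
rows = conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
print(rows)  # [('Jane Smith', 75.0), ('Joe Bloggs', 150.0)]
```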
3. Basic Model
Consumers:
•  End users access the data via reporting tools
•  ETL treats the DV platform as a source
•  Data publishing: batch/RESTful, message based, SOA/publication
Data Virtualization Platform:
•  Defines a ‘model’ of the source systems (similar in concept to a BO Universe)
•  Models can generally be layered on top of other models
Data sources:
•  Traditional databases
   –  IBM (DB2 & Netezza)
   –  Microsoft (SQL Server)
   –  Oracle (Oracle & MySQL)
   –  Postgres
   –  Sybase (ASE & IQ)
   –  Etc.
•  NoSQL / NewSQL
   –  Apache Hadoop
   –  Cassandra
   –  Mongo
   –  Neo4J
   –  Etc.
•  Other formats
   –  Microsoft Office
   –  Messaging
   –  Flat files
   –  XML
   –  Web
   –  Cloud
   –  Application APIs
   –  Etc.
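Layered models can be sketched as views built on views. In this illustrative SQLite example, the table and column names are hypothetical. The first view maps a raw source table to friendlier names and units. The business-level view is defined only against that first layer, so a change in the source needs fixing in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Raw table as a source system might expose it (hypothetical names).
conn.execute("CREATE TABLE src_orders (ord_no TEXT, cust TEXT, amt_cents INTEGER, status TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?, ?)", [
    ("A1", "C1", 1250, "SHIPPED"),
    ("A2", "C1", 500, "CANCELLED"),
    ("A3", "C2", 2000, "SHIPPED"),
])

# Layer 1: source model, renaming columns and converting cents to currency units.
conn.execute("""
    CREATE VIEW orders AS
    SELECT ord_no AS order_id, cust AS customer_id,
           amt_cents / 100.0 AS amount, status
    FROM src_orders
""")

# Layer 2: business model, built on layer 1 rather than on the raw table.
conn.execute("""
    CREATE VIEW customer_sales AS
    SELECT customer_id, SUM(amount) AS total_sales
    FROM orders
    WHERE status <> 'CANCELLED'
    GROUP BY customer_id
""")

sales = conn.execute("SELECT * FROM customer_sales ORDER BY customer_id").fetchall()
print(sales)  # [('C1', 12.5), ('C2', 20.0)]
```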
4. Advanced Features: Role Based Access Control & Data Masking
The Data Virtualization Platform manages sensitive information based on a user's role (role-based authentication).

Underlying data:
First Name | Last Name | DoB         | Salary
Joe        | Bloggs    | 30-Jan-1983 | €60,100
Jane       | Smith     | 17-Jun-1978 | €75,400

What User 1 sees:
First Name | Last Name | DoB         | Salary
Joe        | Bloggs    | 30-Jan-1983 | NULL
Jane       | Smith     | 17-Jun-1978 | NULL

What User 2 sees:
First Name | Last Name | Age
Joe        | Bloggs    | 30
Jane       | Smith     | 35
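The masking rules on this slide can be sketched in plain Python. The role names `user_1`, `user_2` and `hr_admin` and the `query_employees` function are hypothetical. The rows and the masked results come from the slide, with ages computed as of the presentation date. Real platforms usually express such rules declaratively in their security layer rather than in application code.

```python
from __future__ import annotations
from datetime import date

# Rows as held in the underlying system (values from the slide).
EMPLOYEES = [
    {"first_name": "Joe", "last_name": "Bloggs", "dob": date(1983, 1, 30), "salary": 60100},
    {"first_name": "Jane", "last_name": "Smith", "dob": date(1978, 6, 17), "salary": 75400},
]

def age_on(dob: date, as_of: date) -> int:
    """Completed years between dob and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def query_employees(role: str, as_of: date) -> list[dict]:
    """Return the view of EMPLOYEES that a (hypothetical) role is allowed to see."""
    rows = []
    for row in EMPLOYEES:
        if role == "hr_admin":      # full access
            rows.append(dict(row))
        elif role == "user_1":      # DoB visible, salary masked as NULL
            rows.append({**row, "salary": None})
        elif role == "user_2":      # age instead of DoB, salary column hidden
            rows.append({"first_name": row["first_name"],
                         "last_name": row["last_name"],
                         "age": age_on(row["dob"], as_of)})
        else:
            raise PermissionError(f"no access rules for role {role!r}")
    return rows

for r in query_employees("user_2", as_of=date(2013, 10, 18)):
    print(r)
# {'first_name': 'Joe', 'last_name': 'Bloggs', 'age': 30}
# {'first_name': 'Jane', 'last_name': 'Smith', 'age': 35}
```

The underlying rows are never modified. Each role gets a derived copy, which mirrors the slide's point that one source serves several audiences.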
5. Advanced Features: Caching
•  The user sees performance as if all the data were local
•  Local database table with good connectivity: accessed by the Data Virtualization Platform directly
•  Remote database table with poor connectivity: accessed through a cached copy held on the platform
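Here is a minimal sketch of the caching idea, assuming a simple time-to-live (TTL) policy. Results for a remote table are served from a local copy until that copy is older than `ttl` seconds. `TTLCache` and `fetch_remote` are hypothetical names. The injectable clock exists only so the behaviour can be shown without waiting. Actual platforms offer richer policies, such as scheduled or incremental refresh.

```python
import time
from typing import Callable, Hashable

class TTLCache:
    """Serve repeated reads from a local copy until it is older than `ttl` seconds."""

    def __init__(self, fetch: Callable[[Hashable], object], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # slow call to the remote system
        self._ttl = ttl
        self._clock = clock
        self._store: dict = {}   # key -> (fetched_at, value)

    def get(self, key: Hashable):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                    # fresh enough: no remote round trip
        value = self._fetch(key)             # miss or stale: go to the remote table
        self._store[key] = (now, value)
        return value

calls = []

def fetch_remote(table):
    """Hypothetical stand-in for a query over a poor connection."""
    calls.append(table)
    return [("row", table)]

fake_now = [0.0]
cache = TTLCache(fetch_remote, ttl=60, clock=lambda: fake_now[0])
cache.get("orders")   # miss: fetched from the remote table
cache.get("orders")   # hit: served from the cached copy
fake_now[0] = 61.0
cache.get("orders")   # stale: fetched again
print(len(calls))     # 2
```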
6. Advanced Features: Creating a Canonical Data Model
•  The user sees the system as a single canonical data model (CDM), not as multiple sources
•  The Data Virtualization Platform maps data to conform to a canonical model
•  Sources: Finance System, CRM System, Billing System, Website, Other Systems
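Mapping sources to a canonical model amounts to one translation function per source, each producing the same target type. The `Customer` entity, the source field names and the records below are illustrative assumptions. They do not describe any real CRM, billing or web system.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical customer entity."""
    customer_id: str
    full_name: str
    email: str

# One mapping per source: native record -> canonical Customer (field names are illustrative).
MAPPINGS = {
    "crm":     lambda r: Customer(r["CustID"], r["Name"], r["Email"].lower()),
    "billing": lambda r: Customer(r["account_ref"], f'{r["first"]} {r["last"]}', r["mail"].lower()),
    "website": lambda r: Customer(r["user"]["id"], r["user"]["display_name"], r["user"]["email"].lower()),
}

def canonical_customers(sources: dict[str, list[dict]]) -> list[Customer]:
    """Present records from every source as one list of canonical Customers."""
    out = []
    for name, rows in sources.items():
        to_canonical = MAPPINGS[name]
        out.extend(to_canonical(r) for r in rows)
    return out

sources = {
    "crm": [{"CustID": "C1", "Name": "Joe Bloggs", "Email": "Joe@Example.com"}],
    "billing": [{"account_ref": "C2", "first": "Jane", "last": "Smith", "mail": "jane@example.com"}],
    "website": [{"user": {"id": "C3", "display_name": "Ann Lee", "email": "ANN@example.com"}}],
}
for c in canonical_customers(sources):
    print(c)
```

Keeping the mappings in one table makes it cheap to add a source: write one more function, and consumers of `Customer` are unaffected.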
7. But It’s Not a Silver Bullet
•  Can be slow
   –  Depending on how much data has to be fetched from remote systems to the DV platform; platforms try to be smart to reduce this
•  Can impact performance on underlying systems
   –  Lots of BI users querying resource-sensitive OLTP systems is not a good idea
•  Requires resources
   –  Another set of servers, technologies, etc. to manage, but this cost is often offset by the reduction in complexity elsewhere
•  Not a replacement: it is an additional tool
   –  You will still need ETL and messaging
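One common way platforms "try to be smart" is predicate and aggregate pushdown: the filter and aggregation are sent to the source, so only the answer crosses the network. The slide does not name a specific technique; this is one well-known example. In the sketch, an in-memory SQLite table stands in for the remote system, and all names are hypothetical. It compares the rows transferred with and without pushdown. Note the trade-off with the second point above: pushdown moves work onto the source, which matters if that source is a sensitive OLTP system.

```python
import sqlite3

# Hypothetical remote source with 10,000 rows; every tenth row is 'EU'.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
remote.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("EU" if i % 10 == 0 else "US", i) for i in range(10_000)])

def total_without_pushdown(region):
    """Pull every row to the DV layer, then filter and aggregate locally."""
    rows = remote.execute("SELECT region, amount FROM sales").fetchall()
    return sum(a for r, a in rows if r == region), len(rows)

def total_with_pushdown(region):
    """Send the filter and aggregate to the source; a single row comes back."""
    rows = remote.execute("SELECT SUM(amount) FROM sales WHERE region = ?",
                          (region,)).fetchall()
    return rows[0][0], len(rows)

print(total_without_pushdown("EU"))  # (4995000, 10000)
print(total_with_pushdown("EU"))     # (4995000, 1)
```

Both functions return the same total. The second element of each tuple is the number of rows transferred: 10,000 without pushdown and 1 with it.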
8. BI Use Case: Agile Data Mart Design
•  Access data warehouse data quickly and easily
•  Design the data mart you think you want
•  Test it with real data and your actual reporting tool
•  Also possible with data warehouse design
Diagram: Data Warehouse → Data Virtualization Platform → candidate design A or B
9. BI Use Case: Virtual Data Marts
•  Big tin appliance with lots of horsepower?
•  Don’t want to duplicate data in the appliance and consume disk space for a data mart, but want the star schema for ease of use?
Diagram: Data Warehouse, exposed through the Data Virtualization Platform
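A virtual star schema can be sketched as views over the warehouse table. The dimension and fact exist only as definitions, so the data is stored once. Table names, columns and rows here are hypothetical, and SQLite stands in for the appliance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table, stored once on the appliance.
conn.execute("""CREATE TABLE dw_sales (
    sale_id INTEGER, sale_date TEXT, product_code TEXT,
    product_name TEXT, category TEXT, amount REAL)""")
conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "2013-10-01", "P1", "Widget", "Hardware", 10.0),
    (2, "2013-10-01", "P2", "Gadget", "Hardware", 20.0),
    (3, "2013-10-02", "P1", "Widget", "Hardware", 15.0),
])

# Virtual star schema: dimension and fact are views, so no extra disk is consumed.
conn.execute("""CREATE VIEW dim_product AS
    SELECT DISTINCT product_code, product_name, category FROM dw_sales""")
conn.execute("""CREATE VIEW fact_sales AS
    SELECT sale_id, sale_date, product_code, amount FROM dw_sales""")

# Reporting tools query the star schema as if it were physical.
report = conn.execute("""
    SELECT d.product_name, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS d USING (product_code)
    GROUP BY d.product_name ORDER BY d.product_name""").fetchall()
print(report)  # [('Gadget', 20.0), ('Widget', 25.0)]
```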
10. BI Use Case: Data Mart Extensions
•  Existing (physical) data mart
•  New data source that needs to be incorporated quickly
•  Create a virtual copy of the existing data mart and the new data source
•  Integrate them into an updated data mart design
Diagram: Data Mart and New Data Source → Data Virtualization Platform
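Here is a minimal sketch of extending a physical mart virtually. The new source is attached alongside the existing mart, and a view combines them. `new_src`, `mart_customer` and `loyalty` are hypothetical names. A LEFT JOIN is chosen so existing mart rows survive even when the new source has no matching record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS new_src")  # hypothetical new data source

# Existing physical data mart table.
conn.execute("CREATE TABLE mart_customer (customer_id TEXT, name TEXT, region TEXT)")
conn.executemany("INSERT INTO mart_customer VALUES (?, ?, ?)",
                 [("C1", "Joe Bloggs", "EU"), ("C2", "Jane Smith", "US")])

# New source that must be incorporated quickly.
conn.execute("CREATE TABLE new_src.loyalty (customer_id TEXT, tier TEXT)")
conn.execute("INSERT INTO new_src.loyalty VALUES ('C1', 'Gold')")

# Virtual extension of the mart; LEFT JOIN keeps customers with no loyalty record.
conn.execute("""
    CREATE TEMP VIEW customer_ext AS
    SELECT m.customer_id, m.name, m.region, l.tier
    FROM main.mart_customer AS m
    LEFT JOIN new_src.loyalty AS l ON l.customer_id = m.customer_id
""")
extended = conn.execute("SELECT * FROM customer_ext ORDER BY customer_id").fetchall()
print(extended)  # [('C1', 'Joe Bloggs', 'EU', 'Gold'), ('C2', 'Jane Smith', 'US', None)]
```

Once the extended design is agreed, it can be built physically into the mart through the normal ETL process.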
11. BI Use Case: Agile Set-Based ELT Design
•  If your normal ETL style is a series of set-based SQL queries built on top of each other, you can quickly prototype the ETL before moving it into your normal ETL engine to persist and execute (normally for performance)
Diagram: Source, Source, Source → Data Virtualization Platform
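Prototyping set-based ELT as a chain of views, then persisting the final step once the logic is settled, might look like this in SQLite. All table and view names are hypothetical. `CREATE TABLE ... AS SELECT` stands in for handing the logic over to the real ETL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_txn (txn_id INTEGER, account TEXT, amount REAL, txn_date TEXT)")
conn.executemany("INSERT INTO src_txn VALUES (?, ?, ?, ?)", [
    (1, "A", 100.0, "2013-10-01"),
    (2, "A", -40.0, "2013-10-02"),
    (3, "B", 250.0, "2013-10-02"),
    (4, "B", None, "2013-10-03"),   # bad row, removed by the cleansing step
])

# Step 1 cleanses; step 2 aggregates on top of step 1. Each step is a view,
# so the prototype can be edited and re-run instantly.
conn.execute("CREATE VIEW stg_txn AS SELECT * FROM src_txn WHERE amount IS NOT NULL")
conn.execute("""CREATE VIEW agg_balance AS
    SELECT account, SUM(amount) AS balance, COUNT(*) AS txn_count
    FROM stg_txn GROUP BY account""")

# When the logic is right, persist the final step (normally done in the ETL engine).
conn.execute("CREATE TABLE balance AS SELECT * FROM agg_balance")
balances = conn.execute("SELECT * FROM balance ORDER BY account").fetchall()
print(balances)  # [('A', 60.0, 2), ('B', 250.0, 1)]
```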
12. BI Use Case: Big Data Integration
•  The DV platform connects to Big Data sources
•  Data sources are mapped into the DV platform
•  Users access them via standard tools (SQL, RESTful interfaces, etc.)
Diagram: SQL-based tools → SQL interface → Data Virtualization Platform → MapReduce, etc. interface
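Mapping a non-relational source into SQL can be sketched by flattening nested documents into fixed columns. The JSON documents, field names and column names below are hypothetical stand-ins for a NoSQL or Hadoop source. For simplicity, the mapped rows are loaded into an in-memory SQLite table. A DV platform would typically translate the SQL into calls against the source at query time instead of copying the data.

```python
import json
import sqlite3

# Hypothetical documents as they might come back from a NoSQL / Hadoop source.
RAW = """[
  {"_id": "e1", "user": {"id": "U1"}, "event": "click", "ts": "2013-10-18T09:00:00"},
  {"_id": "e2", "user": {"id": "U1"}, "event": "view",  "ts": "2013-10-18T09:01:00"},
  {"_id": "e3", "user": {"id": "U2"}, "event": "click"}
]"""

# The mapping: nested fields -> flat relational columns (missing fields become NULL).
COLUMNS = ("event_id", "user_id", "event_type", "event_ts")

def map_document(doc: dict) -> tuple:
    return (doc["_id"], doc.get("user", {}).get("id"), doc.get("event"), doc.get("ts"))

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE events ({', '.join(COLUMNS)})")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)",
                 (map_document(d) for d in json.loads(RAW)))

# Users now work with plain SQL.
clicks = conn.execute("""SELECT user_id, COUNT(*) FROM events
                         WHERE event_type = 'click'
                         GROUP BY user_id ORDER BY user_id""").fetchall()
print(clicks)  # [('U1', 1), ('U2', 1)]
```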
13. BI Use Case: Source System Analysis
•  Apply your data quality and data profiling tools to all your data sources
•  Look for relationships across systems
•  Remove limitations of accessibility by enabling caching, so that you are not hitting the source system but still have fresh data
Diagram: Data Quality & Profiling Tools → Data Virtualization Platform → Source, Source, Source
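A profiling tool might run two basic checks through the DV layer. The first is per-column null and distinct counts. The second is key overlap between systems, as a hint of a relationship. The records and column names are hypothetical. Real profiling tools go much further, covering value patterns, ranges and dependencies.

```python
from __future__ import annotations

# Hypothetical extracts from two source systems, read through the DV layer.
crm = [
    {"cust_id": "C1", "email": "joe@example.com", "country": "LV"},
    {"cust_id": "C2", "email": None, "country": "LV"},
    {"cust_id": "C3", "email": "ann@example.com", "country": None},
]
billing = [
    {"account": "C1", "balance": 10.0},
    {"account": "C3", "balance": 0.0},
    {"account": "X9", "balance": 5.0},
]

def profile(rows: list[dict]) -> dict:
    """Per-column null count and number of distinct non-null values."""
    if not rows:
        return {}
    stats = {}
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {"nulls": len(values) - len(non_null), "distinct": len(set(non_null))}
    return stats

def key_overlap(left: list[dict], lcol: str, right: list[dict], rcol: str) -> float:
    """Share of distinct left keys that also appear on the right.

    A high value hints at a join relationship between the two systems.
    """
    lkeys = {r[lcol] for r in left if r[lcol] is not None}
    rkeys = {r[rcol] for r in right if r[rcol] is not None}
    return len(lkeys & rkeys) / len(lkeys) if lkeys else 0.0

print(profile(crm))
print(key_overlap(crm, "cust_id", billing, "account"))
```

Two of the three CRM keys appear in billing, so the overlap is about 0.67. That is enough to suggest `cust_id` and `account` are related, and the stray `X9` account is worth a data quality check.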
14. BI Use Case: Data Masking
•  If you are currently building two versions of a data mart, one with sensitive data in it and one without
•  Instead, build one and use Role Based Access Control (RBAC) to restrict what an individual can see
Diagram: Physical Data Mart AND Data Virtualization Platform
15. BI Use Cases
•  Some examples
   –  The usefulness of each example depends on the organization
•  Generally an enabler for more agility
   –  Quicker prototyping and integration
•  Will not solve all your problems
   –  And has a cost associated with it (license & hardware)
16. Vendors: What the Analysts Say
•  Forrester Wave: Data Virtualization, Q1 2012
   –  Informatica
   –  IBM
   –  Denodo
      •  EU (Spanish) origins
   –  Composite
      •  Now part of Cisco
      •  Was OEM’d by Informatica
   –  Microsoft
   –  SAP
   –  And others
•  Gartner
   –  No Magic Quadrant; instead includes Data Virtualization in Data Integration
17. Vendors: Product Positioning
Stand-alone
•  Players
   –  Cisco (Composite)
   –  Denodo
•  Selection
   –  Popular where IBM/Informatica are not already embedded
Integrated
•  Players
   –  IBM
   –  Informatica
•  Selection
   –  Popular with organisations that already have the vendor's ETL tool
18. An Introduction to Data Virtualization in Business Intelligence
David M Walker
Data Management & Warehousing
http://datamgmt.com
THANK YOU (Latvian: PALDIES)
