Physical Database Design for
MPP and Columnar Databases
Geoffrey Clark
Principal at Lucidata, Inc.
September 2013
Copyright Lucidata, 2013
Conceptual, Logical, Physical
• Conceptual links to Business Strategy.
– This is now becoming more quantitative
• Logical maps to the Business Semantics.
– Con-way example
• Physical maps to your Data Stores
– These will be more varied and heterogeneous in
the future, due to specialization.
HBR Business Strategy
The New Dynamics of Competition, Michael D. Ryall, Harvard Business Review, June 2013
Michael Porter’s Five Forces
has dominated strategic
and competitive analysis
since 1979. This analysis
has largely been conceptual
in nature.
Quantitative analysis on
structured data in context is
changing the nature of
business culture, and
improving business
decisions.
This drives the demand for
data modeling and
management.
Design and Evolution
• Hierarchies
– 14th Century Europe and the Financial Revolution
– Aggregations & Allocations
• Cards, Tapes – physical analog media
• Computer Science
– Moore’s Law
• Processor Speed Improvements
• Memory Improvements
• Media Improvements – Punch Cards, Tape, Disk, Memory
• Design for Context & the Future
– Character encoding - Internationalization
– Calendars – Gregorian, Fiscal, Lunar, ... Y2K?
• Files and Fields
– Separation of Data and Metadata
– Modern versions -> XML, JSON
• Joins!
– Data Sets – Super types, Sub types
– Associations describe Networks!
Technology’s Improvement Pace
... and Demand Forecast
Separation of Church and State
• Operational uses
– Capture the data, hand-entered <- validation
– A Data Flow, such as Order to Cash cycle
– Con-way example of PRO(-gressive) numbers
• Analytical uses
– Desire for reports, but reporting queries crash the
Operational cycle and create a cash-flow problem.
– Banished from OLTP, go make an ODS
The Star Schema
The purpose of business computers is to sort data. A graphical
representation of sorted data is called a ‘Star Schema’.
– Michael Silves, Principal at Datamorphosis
• The right design at the right time, becomes default doctrine for DW
– Early RDBMS (Relational Data Base Management Systems)
• Low memory, slow disks, slow CPU
• Big Demand, with questions that spanned the datasets
• Performance issues over large datasets
– Interview Business people to get questions
• Pre-process the data, based on business questions
– Separation into Dimensions and Facts/Metrics
• Link to Business Semantics
• OLAP (On-Line Analytical Processing)
• Educate Users on Aggregation and Allocation
• Conformed Dimensions across Departments to give an Enterprise-wide view of the data.
• But as technology changes, problems emerge
– Ad-hoc questions require redesign & rework
– Business hierarchies in which one concept is both a fact & a dimension, e.g. Shipment
– Fact tables become difficult to distribute for MPP ... e.g. Teradata prefers a normalized DW
• Example – transportation networks
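As a rough illustration of the separation into Dimensions and Facts/Metrics described on this slide, the sketch below builds a tiny star schema in an in-memory SQLite database. The table and column names (dim_customer, fact_shipment, revenue) and the sample rows are invented for illustration and do not come from the deck.

```python
import sqlite3


def build_star(conn):
    """Create one dimension and one fact table (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            region       TEXT NOT NULL
        );
        CREATE TABLE fact_shipment (
            shipment_id  INTEGER PRIMARY KEY,
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            revenue      REAL NOT NULL
        );
    """)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "West"), (2, "East")])
    conn.executemany("INSERT INTO fact_shipment VALUES (?, ?, ?)",
                     [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0)])


def revenue_by_region(conn):
    """Aggregate the fact metric along a dimension attribute."""
    rows = conn.execute("""
        SELECT d.region, SUM(f.revenue)
        FROM fact_shipment AS f
        JOIN dim_customer AS d USING (customer_key)
        GROUP BY d.region
        ORDER BY d.region
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star(conn)
totals = revenue_by_region(conn)
```

A question the design anticipated (revenue by region) is a single join and GROUP BY. A question that needs a new grain or a new dimension would require changing the tables, which is the rework the slide warns about.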
Example – Multi-Modal Freight
• Shipments are agreements between a Carrier and a
Shipper to move goods between two places.
• Shipments can be split into “ProFreight” (which is
assigned a cost via activity-based costing).
• Shipments/ProFreight are composed of Freight
handling units.
• Freight can be “re-tendered” to another carrier, in
which case it is linked to both the original and the new
Shipment.
• Freight moves between places on one or many “VFCs”
or Containers.
• Containers are moved between places on Trips.
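The relationships listed above can be sketched with a few Python dataclasses. This is only a minimal model under assumptions: the class and field names, the identifiers, and the cities are hypothetical, and ProFreight and Trips are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Shipment:
    shipment_id: str
    carrier: str
    shipper: str
    origin: str
    destination: str


@dataclass
class Freight:
    freight_id: str
    # Every Shipment this handling unit has travelled under, oldest first.
    shipment_ids: list = field(default_factory=list)
    # Containers ("VFCs" in the deck) that carried it, in order.
    container_ids: list = field(default_factory=list)


def retender(freight, new_shipment):
    """Link the freight to the new Shipment while keeping the original link."""
    freight.shipment_ids.append(new_shipment.shipment_id)


original = Shipment("S1", "CarrierA", "ShipperX", "Portland", "Chicago")
handoff = Shipment("S2", "CarrierB", "ShipperX", "Denver", "Chicago")
pallet = Freight("F1", shipment_ids=[original.shipment_id])
retender(pallet, handoff)
pallet.container_ids += ["C7", "C9"]
```

Because one Freight unit can point at several Shipments and several Containers, the data forms a network of many-to-many links rather than a single hierarchy.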
Kimball on Transportation, 3NF
Kimball on Transportation, Star
Table Level DW diagram
Dim Modeling Dogma
• “Our carefully normalized data model cannot
be translated into a star schema...”
– Dimensional modeling is necessary in order to
generate correct queries
– Any (normalized) data model can be transformed
into a dimensional model...
– ... and there exists an algorithm to do it
Dim Modeling Example
Star option considered
Bridge table
(remember, we tried this)
We tried this with hesmith. When selecting a main
hierarchy, it has too much of a downside, and
you don’t have a weight factor …
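To make the weight-factor concern concrete, here is a small sketch of a bridge table between Freight and Shipment. The identifiers, costs, and the 50/50 weights are invented for illustration; the deck does not give any numbers.

```python
# Fact rows: freight_id -> cost (hypothetical values).
freight_cost = {"F1": 100.0, "F2": 40.0}

# Bridge rows: (freight_id, shipment_id, weight).
# F1 was re-tendered, so it is linked to two Shipments.
bridge = [
    ("F1", "S1", 0.5),
    ("F1", "S2", 0.5),
    ("F2", "S2", 1.0),
]


def cost_by_shipment(bridge_rows, costs, use_weights):
    """Roll freight cost up to Shipment through the bridge table."""
    totals = {}
    for freight_id, shipment_id, weight in bridge_rows:
        factor = weight if use_weights else 1.0
        totals[shipment_id] = totals.get(shipment_id, 0.0) + costs[freight_id] * factor
    return totals


unweighted = cost_by_shipment(bridge, freight_cost, use_weights=False)
weighted = cost_by_shipment(bridge, freight_cost, use_weights=True)
```

Without weights, the cost of F1 is counted once per linked Shipment, so the grand total comes to 240 instead of the true 140. With weights that sum to 1 per freight unit the totals reconcile, but only if the business can supply a defensible weight.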
Multi-fact option considered
Oracle’s Algorithmic approach
Basic DW diagram
Build Dimensional Model in BI
Freight moves through Networks
Information Factory & MPP
• Normalized Base
– Integrate data once
• Source -> Normalized -> Denormalized -> OK
• Source -> Denormalized? -> Un-normalized -> ?
– Detect problems and fix them once!
• Does not preclude Data Marts
• Massive Parallel Processing
– Data distribution
• Optimizations – Broadcast, Co-location, Re-distribution
• Scalability, the quest for 1:1
• Normalized data - reduced IO, better match for
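The data distribution and co-location ideas on this slide can be sketched with a toy model. Everything here is an assumption made for illustration: the three-node cluster, the modulo distribution function, and the sample tables. Real MPP engines use their own hashing and planning.

```python
NODES = 3


def node_for(key, nodes=NODES):
    """Toy distribution function: integer key modulo the node count."""
    return key % nodes


def distribute(rows, key_index, nodes=NODES):
    """Place each row on the node chosen by its distribution key."""
    slices = [[] for _ in range(nodes)]
    for row in rows:
        slices[node_for(row[key_index], nodes)].append(row)
    return slices


def colocated_join(left_slices, right_slices, left_key, right_key):
    """Join node by node; no row leaves the node it is stored on."""
    out = []
    for left, right in zip(left_slices, right_slices):
        lookup = {}
        for rrow in right:
            lookup.setdefault(rrow[right_key], []).append(rrow)
        for lrow in left:
            for rrow in lookup.get(lrow[left_key], []):
                out.append(lrow + rrow)
    return out


shipments = [(1, "Portland"), (2, "Denver"), (3, "Chicago"), (4, "Boise")]
freight = [(101, 1), (102, 1), (103, 3), (104, 4)]  # (freight_id, shipment_id)

# Both tables are distributed on shipment_id, so matching rows share a node.
shipment_slices = distribute(shipments, key_index=0)
freight_slices = distribute(freight, key_index=1)
joined = colocated_join(shipment_slices, freight_slices, left_key=0, right_key=1)
```

When two tables are not distributed on the join key, an engine typically has to either copy the smaller table to every node (broadcast) or re-hash one side on the join key (re-distribution). Both move data across the interconnect, which is why the choice of distribution key matters in physical design.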
Bob Conway’s Rapid Methodology
Core Model with many Roles
Transaction
Tables
Reference Tables
Power of Conformed Dimensions
Example Data Model & Hierarchy
Data Flow and Usage
Cubes and In-memory BI
• Multi-Dimensional OLAP (MOLAP)
– Drag-and-Drop OLAP environment, analysts
become capable of self-service.
– Dealt with Ragged Hierarchies, common in
Financial data such as General Ledger (GL)
– Limited by memory size
– Pressure for more dimensionality floods cube size,
build times from relational sources exceed load
windows ...
• Relational OLAP (ROLAP)
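A back-of-the-envelope calculation shows why added dimensionality inflates MOLAP cube size. The dimension cardinalities below are made up for illustration, and the product is only an upper bound on cells, since real cubes are usually sparse.

```python
from math import prod


def cube_cells(cardinalities):
    """Upper bound on cells in a fully materialized cube."""
    return prod(cardinalities)


base = cube_cells([365, 1000, 50])        # day x customer x terminal
wider = cube_cells([365, 1000, 50, 20])   # add one 20-member dimension
```

Adding a single 20-member dimension multiplies the potential cell count by 20, from about 18 million to 365 million in this made-up case. Build time and memory pressure tend to grow along with it.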
But a network this size choked it
Columnar vs Row-wise
• Physically store data by Column vs Row
– Rather like Fifth Normal Form.
– If Semantically Organized, then Rapid Response to
user’s ad-hoc aggregation requests.
– Prefers batch loading, always loads once per
column, even if loading one row.
• Continues to Appear and Operate as a normal
Row-wise cousin.
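The difference between the two physical layouts can be sketched in a few lines. This is a toy in-memory model with hypothetical column names and values, not how any particular columnar engine stores data.

```python
# Row-wise layout: one record per row.
rows = [
    {"shipment_id": 1, "origin": "Portland", "weight_lb": 1200},
    {"shipment_id": 2, "origin": "Denver", "weight_lb": 800},
    {"shipment_id": 3, "origin": "Portland", "weight_lb": 500},
]


def to_columns(row_store):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in row_store:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_row(column_store, row):
    """Loading one row still appends to every column."""
    for name, value in row.items():
        column_store[name].append(value)


columns = to_columns(rows)
total_weight = sum(columns["weight_lb"])  # the aggregate touches one column only
append_row(columns, {"shipment_id": 4, "origin": "Boise", "weight_lb": 300})
```

The aggregate reads a single list and never looks at shipment_id or origin, which is the source of the rapid response to ad-hoc aggregation. The single-row load touches every column, which is why batch loading is preferred.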
Columnar IO example
Compression becomes
much more effective
Reading a Column is
like reading a Row
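One reason compression works better on a column is that neighbouring values share a type and often repeat. The slide does not name a compression technique; run-length encoding is used below only as a familiar example, and the sample column is invented.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


origin_column = ["Denver", "Denver", "Denver", "Portland", "Portland", "Boise"]
runs = rle_encode(origin_column)
```

Six stored values shrink to three runs here. In a row-wise layout the same values would be interleaved with other fields, so there would be no long runs to collapse.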
Design Pattern for Log Data
Data Stewards for
Master Data
Data Stewards for
Metadata
Architects
integrate data
and metadata
Architects
organize data for
analysis with
physical in mind
Architects identify levels for
analysis, and distribution
Columnar
MPP
Importance of Reference Data
Infobright’s Database Landscape 2011
Analytic Database Comparison
• Actian ParAccel
• IBM Netezza
• HP Vertica
• Greenplum
• Teradata
• Sybase IQ
Gartner’s Magic Quadrant
Hadoop (Cloudera & Hortonworks)
“Although it’s true that Hadoop can be valuable as an analytic silo, most
organizations will prefer to get the most business value out of Hadoop by
integrating it with—or into—their BI, DW, DI, and analytics technology
stacks.” – Philip Russom TDWI
http://tdwi.org/webcasts/2013/04/integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx
Hadoop for Analytics?
Analytics performs
best on Structured
Data, for good
reasons.
Maintain MPP strengths in
the solution through
Architecture.
Message from Hortonworks (Hadoop)
“Although it’s true that Hadoop can be valuable as an analytic silo, most
organizations will prefer to get the most business value out of Hadoop by
integrating it with—or into—their BI, DW, DI, and analytics technology
stacks.” – Philip Russom TDWI
http://tdwi.org/webcasts/2013/04/integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx
Hadoop as ETL
Data Flow Reference Architecture
Message from Neo4J NoSQL
Message from MongoDB (NoSQL)
http://www.slideshare.net/fullscreen/mongodb/schema-design-by-example/1
Message from Couchbase (NoSQL)
http://www.couchbase.com/why-nosql/nosql-database

Data modelingzone geoffrey-clark-v2

  • 1. Physical Database Design for MPP and Columnar Databases Geoffrey Clark Principal at Lucidata, Inc. September 2013 copywrite, Lucidata, 2013
  • 2. Conceptual, Logical, Physical • Conceptual links to Business Strategy. – This is now becoming more quantitative • Logical maps to the Business Semantics. – Con-way example • Physical maps to your Data Stores – These will be more varied and heterogeneous in the future, due to specialization. copywrite, Lucidata, 2013
  • 3. HBR Business Strategy The New Dynamics of Competition, Michael D. Ryall, Harvard Business Review, June 2013 Michael Porter’s Five Forces has dominated strategic and competitive analysis since 1979. This analysis has largely been conceptual in nature. Quantitative analysis on structured data in context is changing the nature of business culture, and improving business decisions. This drives the demand for data modeling and management. copywrite, Lucidata, 2013
  • 4. Design and Evolution • Hierarchies – 14th Century Europe and the Financial Revolution – Aggregations & Allocations • Cards, Tapes – physical analog media • Computer Science – Moore’s Law • Processor Speed Improvements • Memory Improvements • Media Improvements – Punch Cards, Tape, Disk, Memory • Design for Context & the Future – Character encoding - Internationalization – Calendars – Gregorian, Fiscal, Lunar, ... Y2K? • Files and Fields – Separation of Data and Metadata – Modern versions -> XML, JSON • Joins! – Data Sets – Super types, Sub types – Associations describe Networks! copywrite, Lucidata, 2013
  • 6. ... and Demand Forecast copywrite, Lucidata, 2013
  • 7. Separation of Church and State • Operational uses – Capture the data, hand-entered <- validation – A Data Flow, such as Order to Cash cycle – Con-way example of PRO(-gressive) numbers • Analytical uses – Desire for reports, Reporting crashes the Operational cycle, Cash flow problem. – Banished from OLTP, go make an ODS copywrite, Lucidata, 2013
  • 8. The Star Schema The purpose of business computers is to sort data. A graphical representation of sorted data is called a ‘Star Schema’. – Michael Silves, Principal at Datamorphosis • The right design at the right time, becomes default doctrine for DW – Early RDBMS (Relational Data Base Management Systems) • Low memory, slow disks, slow CPU • Big Demand, with questions that spanned the datasets • Performance issues over large datasets – Interview Business people to get questions • Pre-process the data, based on business questions – Separation into Dimensions and Facts/Metrics • Link to Business Semantics • OLAP (On-Line Analytical Processing) • Educate Users on Aggregation and Allocation • Conformed Dimensions across Departments to give an Enterprise-wide view of the data. • But as technology changes, problems emerge – Ad-hoc questions require redesign & rework – With business hierarchies when one concept is both a fact & dimension, e.g. Shipment – Fact tables become difficult to distribute for MPP ... e.g. Teradata prefers a normalized DW • Example – transportation networks copywrite, Lucidata, 2013
  • 9. Example – Multi-Modal Freight • Shipments are agreements between a Carrier and a Shipper to move goods between two places. • Shipments can be split into “ProFreight” (which is assigned a cost via activity-based costing). • Shipments/ProFreight are composed of Freight handling units. • Freight can be “re-tendered” to another carrier, in which case is is linked to the original and the new Shipment. • Freight moves between places on one or many “VFCs” or Containers. • Containers are moved between places on Trips. copywrite, Lucidata, 2013
  • 10. Kimball on Transportation, 3NF
  • 11. Kimball on Transportation, Star
  • 12. Table Level DW diagram
  • 13. Dim Modeling Dogma • “Our carefully normalized data model cannot be translated into a star schema...” – Dimensional modeling is necessary in order to generate correct queries – Any (normalized) data model can be transformed into a dimensional model... – ... and there exists an algorithm to do it
  • 16. Bridge table (remember, we tried this) We tried this with hesmith. When selecting a main hierarchy has too much of a downside, and you don’t have a weight factor …
  • 19. Basic DW diagram
  • 20. Build Dimensional Model in BI
  • 21. Freight moves through Networks
  • 22. Information Factory & MPP • Normalized Base – Integrate data once • Source -> Normalized -> Denormalized -> OK • Source -> Denormalized? -> Un-normalized -> ? – Detect problems and fix them once! • Does not preclude Data Marts • Massively Parallel Processing – Data distribution • Optimizations – Broadcast, Co-location, Re-distribution • Scalability, the quest for 1:1 • Normalized data - reduced IO, better match for
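Data distribution and co-location can be illustrated with a small sketch. It assumes a hypothetical 4-node cluster and hash distribution on a shared key (`shipment_id`); the function names and sample rows are invented for the example, and real MPP engines choose and apply distribution keys in their own ways.

```python
import zlib
from collections import defaultdict

NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Map a distribution key to a node with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % NODES


def distribute(rows, key_field):
    """Split rows into per-node slices by hashing the chosen key."""
    slices = defaultdict(list)
    for row in rows:
        slices[node_for(row[key_field])].append(row)
    return slices


# Made-up sample tables, both distributed on shipment_id.
shipments = [{"shipment_id": f"S{i}", "carrier": "C1"} for i in range(8)]
freight = [{"freight_id": f"F{i}", "shipment_id": f"S{i % 8}"} for i in range(20)]

shipment_slices = distribute(shipments, "shipment_id")
freight_slices = distribute(freight, "shipment_id")


def local_join(node):
    """Join freight to shipments using only the rows stored on one node."""
    by_id = {s["shipment_id"]: s for s in shipment_slices.get(node, [])}
    return [
        (f["freight_id"], by_id[f["shipment_id"]]["carrier"])
        for f in freight_slices.get(node, [])
        if f["shipment_id"] in by_id
    ]


# Every freight row finds its shipment locally, so no rows move between nodes.
joined = [pair for node in range(NODES) for pair in local_join(node)]
print(len(joined), "of", len(freight), "freight rows joined without re-distribution")
```

When the two tables are distributed on different keys, the engine has to broadcast the smaller table or re-distribute one side before joining, which is the cost the listed optimizations aim to avoid.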
  • 23. Bob Conway’s Rapid Methodology
  • 24. Core Model with many Roles Transaction Tables Reference Tables
  • 25. Power of Conformed Dimensions
  • 26. Example Data Model & Hierarchy
  • 27. Data Flow and Usage
  • 28. Cubes and In-memory BI • Multi-Dimensional OLAP (MOLAP) – Drag-and-Drop OLAP environment, analysts become capable of self-service. – Dealt with Ragged Hierarchies, common in Financial data such as General Ledger (GL) – Limited by memory size – Pressure for more dimensionality inflates cube size; build times from relational sources exceed load windows ... • Relational OLAP (ROLAP)
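The effect of added dimensionality on cube size can be shown with simple arithmetic. The sketch below assumes a fully dense cube, where the number of potential cells is the product of the dimension cardinalities; the dimension names and cardinalities are made up, and real MOLAP engines store sparse cubes, so actual sizes are smaller but follow the same multiplicative trend.

```python
from math import prod


def cube_cells(cardinalities):
    """Potential cell count of a dense cube: product of dimension sizes."""
    return prod(cardinalities.values())


# Hypothetical dimension cardinalities.
base = {"date": 365, "customer": 1000, "product": 200}
with_terminal = {**base, "terminal": 300}

base_cells = cube_cells(base)
extended_cells = cube_cells(with_terminal)

print(f"3 dimensions: {base_cells:,} potential cells")
print(f"4 dimensions: {extended_cells:,} potential cells")
print(f"growth factor: {extended_cells // base_cells}x")
```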
  • 29. But a network this size choked it
  • 30. Columnar vs Row-wise • Physically store data by Column vs Row – Rather like Fifth Normal Form. – If Semantically Organized, then Rapid Response to user’s ad-hoc aggregation requests. – Prefers batch loading: every load writes once per column, even when loading a single row. • Continues to appear and operate like its normal Row-wise cousin.
  • 31. Columnar IO example • Compression becomes much more effective • Reading a Column is like reading a Row
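A small sketch of why column-wise storage compresses well and reads less for aggregations. The sample rows are made up, and run-length encoding is used here only as one illustrative compression scheme; it is an assumption, not necessarily what any particular columnar product does.

```python
from itertools import groupby

# Made-up row-wise data: (ship_date, state, revenue), sorted by date and state.
rows = [
    ("2013-09-01", "OR", 120.0),
    ("2013-09-01", "OR", 80.0),
    ("2013-09-01", "WA", 60.0),
    ("2013-09-02", "WA", 40.0),
    ("2013-09-02", "WA", 90.0),
]

# Column-wise layout: one list per column instead of one tuple per row.
date_col, state_col, revenue_col = (list(col) for col in zip(*rows))


def rle(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Repeated values sit next to each other within a column, so runs are long.
date_rle = rle(date_col)
state_rle = rle(state_col)

# An aggregate over one column touches only that column's values.
total_revenue = sum(revenue_col)

print(date_rle)
print(state_rle)
print(total_revenue)
```

The same layout explains the loading behaviour: appending a single row means touching every column's storage once.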
  • 32. Design Pattern for Log Data • Data Stewards for Master Data • Data Stewards for Metadata • Architects integrate data and metadata • Architects organize data for analysis with physical in mind • Architects identify levels for analysis, and distribution • Columnar MPP
  • 33. Importance of Reference Data
  • 34. Infobright’s Database Landscape 2011
  • 37. Hadoop (Cloudera & Hortonworks) “Although it’s true that Hadoop can be valuable as an analytic silo, most organizations will prefer to get the most business value out of Hadoop by integrating it with—or into—their BI, DW, DI, and analytics technology stacks.” – Philip Russom TDWI http://tdwi.org/webcasts/2013/04/integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx
  • 38. Hadoop for Analytics? Analytics performs best on Structured Data, for good reasons. Maintain MPP strengths in the solution through Architecture.
  • 39. Message from Hortonworks (Hadoop) “Although it’s true that Hadoop can be valuable as an analytic silo, most organizations will prefer to get the most business value out of Hadoop by integrating it with—or into—their BI, DW, DI, and analytics technology stacks.” – Philip Russom TDWI http://tdwi.org/webcasts/2013/04/integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx
  • 40. Hadoop as ETL
  • 41. Data Flow Reference Architecture
  • 42. Message from Neo4J NoSQL
  • 43. Message from MongoDB (NoSQL) http://www.slideshare.net/fullscreen/mongodb/schema-design-by-example/1
  • 44. Message from Couchbase (NoSQL) http://www.couchbase.com/why-nosql/nosql-database

Editor's notes

  1. Jeff Kibler @ Infobright