12th
July 2017 BICOD'2017@London.United Kingdom 1
Taming Size and Cardinality of OLAP Data
Cubes over Big Data
Alfredo Cuzzocrea University of Trieste & ICAR
Rim Moussa LaTICE Lab. & University of Carthage
Achref Labidi LaTICE Lab. & University of Carthage
The 31st British International Conference on Databases @ London, United Kingdom
12th of July, 2017
Outline
Data Warehouse Systems
DWS Architectures
OLAP cube
DSS Benchmarks
TPC-H*d: a Multi-dimensional Database Benchmark
TPC-H*d
AutoMDB
Application Scenarios of TPC-H*d
Benchmarking Data Servers
Benchmarking Multidimensional DB Schemas
Benchmarking Parallel OLAP Servers
Conclusion & Research Agenda
Data Warehouse Architectures: Lazy Data Integration
Query-driven Architecture
[Figure: query-driven architecture with a MEDIATOR, WRAPPERs, and data sources (including a relational data source)]
Data Warehouse Architectures: Eager Data Integration
Warehouse System Architecture
[Figure: warehouse system architecture in which the integration workflows of the integration system connect data sources (including a relational data source) to the Data Warehouse]
Data Warehousing
--OLAP Cube
Facts: the objects that represent the subject of the desired analyses.
»Examples: sales records, weather records, cab trips, …
»The fact table contains 3 types of attributes: measured attributes, foreign keys to dimension tables, degenerate dimensions
Dimension(s):
»Levels are the individual values that make up dimensions
»Examples
»Date dimension (Trimester, month, day)
»Time dimension (hour, min, sec)
»Geography dimension (Country, city, postal code)
Measure(s):
»Examples: revenue, lost revenue, sold quantities, expenses, …
»Use aggregate functions: min, max, count, distinct-count, sum, average, …
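The relationship between facts, dimension levels and measures can be illustrated with a minimal sketch. The fact rows, the `roll_up` helper and the level names below are hypothetical and are not taken from the slides; the measure is a revenue summed per member of a geography level.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, month, revenue).
# country/city are levels of a geography dimension, revenue is the measure.
FACTS = [
    ("UK", "London", "2017-07", 120.0),
    ("UK", "London", "2017-07", 80.0),
    ("UK", "Leeds", "2017-07", 50.0),
    ("IT", "Trieste", "2017-06", 70.0),
]


def roll_up(facts, level):
    """Sum the revenue measure per member of one geography level."""
    index = {"country": 0, "city": 1}[level]
    totals = defaultdict(float)
    for row in facts:
        totals[row[index]] += row[3]
    return dict(totals)


by_city = roll_up(FACTS, "city")        # finer level
by_country = roll_up(FACTS, "country")  # coarser level (roll-up)
```

Moving from `by_city` to `by_country` is a roll-up along the geography hierarchy; the reverse direction is a drill-down.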
OLAP Operations
Q10 of TPC-H benchmark
[Figure: cube for Q10 with labels customer (nation, details), order date (year, quarter), and line return flag]
OLAP Operations
On-Line Analytical Processing (Q10 of TPC-H benchmark)
Query Languages
Structured Query Language (SQL)
»Relational and static schema
»Data Definition, Data Manipulation, and Data Control Languages
»Analytic functions (window functions: OVER (PARTITION BY …))
»CUBE, ROLLUP and GROUPING SETS operators
MultiDimensional eXpressions (MDX)
»Invented by Microsoft in 1997
»For querying and manipulating the multidimensional data stored in OLAP cubes
»Static schema
Data flow programming languages
»Google Sawzall, Apache Pig Latin, IBM InfoSphere Streams
»Dynamic schema
»After data is loaded, multiple operators are applied to that data before the final output is stored.
[Figure: Load Data → Apply Schema → Apply Filter → Group Data → Apply Aggregate Function → Sort Data → Store Output]
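The data flow stages named on this slide (load, apply schema, filter, group, aggregate, sort, store) can be sketched in plain Python. This is only an illustration of the stage order, not a Pig Latin script; the pipe-delimited sample data, the field names and `run_pipeline` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical pipe-delimited raw data, loosely in the style of .tbl files.
RAW = "1|A|10\n2|B|5\n3|A|7\n4|C|1\n"


def run_pipeline(raw):
    # Load data: no schema yet, every field is a string.
    rows = csv.reader(io.StringIO(raw), delimiter="|")
    # Apply schema: name and type the fields after loading (dynamic schema).
    records = [{"id": int(r[0]), "brand": r[1], "qty": int(r[2])} for r in rows]
    # Apply filter.
    kept = [r for r in records if r["brand"] != "C"]
    # Group data and apply the aggregate function (sum of qty per brand).
    totals = defaultdict(int)
    for r in kept:
        totals[r["brand"]] += r["qty"]
    # Sort data by aggregated quantity, largest first.
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Store output (here: into a string instead of a file).
    out = io.StringIO()
    csv.writer(out, delimiter="|", lineterminator="\n").writerows(ordered)
    return out.getvalue()


result = run_pipeline(RAW)
```

The schema is attached only after the data is loaded, which is the "dynamic schema" contrast with SQL and MDX drawn above.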
Query Languages
SQL – Q16 of TPC-H benchmark
Query Languages
MDX – Q16 of TPC-H Benchmark
WITH
  SET [Brands] AS 'Except({[Part Brand].Members}, {[Part Brand].[Brand#45 ]})'
  SET [Types] AS 'Filter({[Part Type].Members}, (NOT ([Part Type].CurrentMember.Name MATCHES "(?i)MEDIUM POLISHED.*")))'
  SET [Sizes] AS 'Filter({[Part Size].Members}, ([Part Size].CurrentMember IN {[Part Size].[3], [Part Size].[9], [Part Size].[14], [Part Size].[19], [Part Size].[23], [Part Size].[36], [Part Size].[45], [Part Size].[49]}))'
SELECT [Measures].[Supplier Count] ON COLUMNS,
  nonemptyCrossjoin(nonemptyCrossjoin([Brands], [Types]), [Sizes]) ON ROWS
FROM [Cube16]
Query Languages
Data Flow –Pig Latin script for Q16 of TPC-H benchmark
Decision Support Systems Benchmarks
Non-TPC Benchmarks
Real datasets
»Open data or proprietary data
»fixed size
»Devise a workload or trace the proprietary workload
APB-1: no scale factor
TPC Benchmarks
The Transaction Processing Performance Council (TPC) was founded in 1988 to define benchmarks
In 2009, TPC-TC was set up as an international Technology Conference series on Performance Evaluation and Benchmarking
Examples of benchmarks relevant for benchmarking decision support
systems: TPC-H, TPC-DS and TPC-DI
Common characteristics of TPC benchmarks
»Synthetic data
»Scale factor allowing generation of different data volumes, from 1GB to 1PB
Decision Support Systems Benchmarks
TPC-H Benchmark Schema (1/2)
TPC-H Benchmark
22 ad-hoc SQL statements (star queries, nested queries, …) + refresh functions
Decision Support Systems Benchmarks
TPC-H Benchmark (2/2)
TPC-H Benchmark: 2 Metrics
»QphH@Size is the number of queries processed per hour that the system under test (SUT) can handle for a fixed load
»$/QphH@Size is the price/performance ratio, where the price is the cost of ownership of the SUT (hardware, software, maintenance).
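As a rough illustration of how the two metrics relate: to my understanding of the TPC-H specification, the composite QphH@Size is the geometric mean of the power and throughput results, and the price/performance metric divides the total system price by it. The sketch below assumes those definitions; the function names and the input figures are hypothetical and do not come from any published result.

```python
import math


def qphh(power_at_size, throughput_at_size):
    """Composite metric: geometric mean of the power and throughput results."""
    return math.sqrt(power_at_size * throughput_at_size)


def price_per_qphh(total_system_price, qphh_value):
    """Price/performance: total cost of the SUT divided by QphH@Size."""
    return total_system_price / qphh_value


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
composite = qphh(400.0, 100.0)
ratio = price_per_qphh(50_000.0, composite)
```

A lower $/QphH@Size is better; a higher QphH@Size is better.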
Variants of TPC-H Benchmarks
TPC-H*d Benchmark [Cuzzocrea and Moussa, 2013]
»Turning TPC-H benchmark into a Multi-dimensional benchmark
»Few schema changes
»Same TPC-H workload
»2 MDX workloads: a query workload and a cube-then-query workload
SSB: Star Schema Benchmark [O’Neil et al., 2012]
»Turning TPC-H benchmark into star-schema
»Workload composed of 12 queries
TPC-H translated into Pig Latin (Apache Hadoop Ecosystem) [Moussa, 2012]
»22 Pig Latin scripts which load and process TPC-H raw data files (.tbl files)
Decision Support Systems Benchmarks
TPC-DS Benchmark (1/2)
TPC-DS Benchmark: 7 data marts
Decision Support Systems Benchmarks
TPC-DS Benchmark (2/2)
TPC-DS Benchmark Workload
Hundreds of queries (99 query templates)
OLAP, windowing functions, mining, and reporting queries
ACID and Concurrent data maintenance (not ACID in TPC-DS 2.x)
TPC-DS Benchmark Metrics
2 main Metrics
»QphDS@Size is the number of queries processed per hour, that the
system under test can handle for a fixed load.
»Data Maintenance and Load Time are calculated
»$/QphDS@Size represents the ratio of cost to performance, where the cost is the 3-year cost of ownership of the SUT (hardware, software, maintenance)
TPC-DS implementations
TPC-DS v2.0
»Extension for non-relational systems such as Hadoop/Spark big data
systems
Outline
Introduction
Part I: Data Warehouses
Part II: Multi-dimensional Database Design
TPC-H*d
AutoMDB
Part III: Application Scenarios
Conclusion
MDB Design Problem
Given,
A relational warehouse schema
A workload, i.e. a set of OLAP business queries,
W = {Q1, Q2, …, Qn}
where Qi is a parameterized query
How to design the multi-dimensional DB schema?
How to define cubes?
Will there be a single cube or multiple cubes?
Are there any rules for merging cubes?
Which optimizations are suitable for performance tuning?
Derived data calculus & refresh? (materialized views, derived attributes, indexes, …)
Data partitioning & parallel cube building?
Idea
Map each business query to an OLAP cube
>> Obtain a multi-dimensional DB schema
Recommend & Test Optimizations
>> Derived Data
>> Data partitioning
>> Cube Merging
TPC-H*d
Q8: From SQL statement to OLAP cube
TPC-H*d
TPC-H*d OLAP Cube C8
Market Share for each supplier nation within a region of customers,
for each year and each part type
TPC-H*d
TPC-H*d OLAP Query Q8
Market share of RUSSIAN suppliers within the AMERICA region,
over the years 1995 and 1996, for part type ECO. ANODIZED STEEL
AutoMDB
Open source software implemented in Java
Parses MDB schema (.xml) files using the SAX library
Performs comparisons of OLAP cubes' characteristics.
»For each pair of OLAP cubes,
»show whether they have the same fact table or not
»compute the number of shared | different | coalescable dimensions
»Dimensions are coalescable if they are extracted from the same dimension table and their hierarchies are coalescable
»compute the number of shared | different measures
»Run merge of OLAP cubes using different similarity functions
»Simple distance function: whether or not the cubes have the same fact table
»K-means clustering
»Distance function is computed with weights assigned to cube characteristics
»Propose Virtual Cubes
»Auto-generate a new MDB Schema (.xml)
»Create MDB Schema from TPC-DS SQL Workload
»On-going
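The pairwise cube comparison described on this slide can be sketched as follows. AutoMDB itself is written in Java and parses XML schemas; the Python below is only an illustration of the idea, and the `Cube` class, the `compare` and `group_by_fact_table` helpers, and the three sample cubes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    name: str
    fact_table: str
    dimensions: frozenset
    measures: frozenset


def compare(a, b):
    """Characteristics of a pair of cubes: fact table, dimensions, measures."""
    return {
        "same_fact_table": a.fact_table == b.fact_table,
        "shared_dimensions": len(a.dimensions & b.dimensions),
        "different_dimensions": len(a.dimensions ^ b.dimensions),
        "shared_measures": len(a.measures & b.measures),
        "different_measures": len(a.measures ^ b.measures),
    }


def group_by_fact_table(cubes):
    """Simplest merge criterion: group cubes that share a fact table."""
    groups = {}
    for cube in cubes:
        groups.setdefault(cube.fact_table, []).append(cube.name)
    return groups


# Hypothetical cubes; they do not reproduce the TPC-H*d definitions.
A = Cube("A", "lineitem", frozenset({"customer", "date"}), frozenset({"revenue"}))
B = Cube("B", "lineitem", frozenset({"customer", "supplier"}), frozenset({"revenue"}))
C = Cube("C", "partsupp", frozenset({"part"}), frozenset({"supplier_count"}))

report = compare(A, B)
groups = group_by_fact_table([A, B, C])
```

Cubes that land in the same group are candidates for merging; a weighted distance over the counts in `report` would be the input to a clustering step such as K-means.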
AutoMDB
Load OLAP Cubes defined in xml file
AutoMDB
Compare OLAP Cubes – same fact table or not
AutoMDB
Compare Cubes – Group cubes which have the same fact table
AutoMDB
Compare Cubes –Auto-generate a new MDB schema
Outline
Introduction
Part I: Data warehousing
Part II: Multidimensional DB Design
Part III: Application Scenarios
Benchmarking Data Servers
Benchmarking Multidimensional DB Schemas
Benchmarking Parallel OLAP servers
Conclusion and Research agenda
Benchmarking Data Servers
--Column-oriented storage systems vs row-oriented storage systems
Columnar Storage Systems
»High IO performance: less data moving from hard drives to memory
»Efficient Memory Management: load only required data into memory
»Reduced Storage: columns with low cardinality are compressed
»Efficient Schema Modification: adding new columns does not induce a file storage re-organization
Types
»Binary Association Tables
»Each column is stored in a separate (surrogate key, value) table
»RDBMS: MonetDB
»Family of columns
»Design techniques are based on measuring the affinity between
attributes through the count of their co-occurrence in the query
workload and clustering attributes
»Vertical partitioning for DB design
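The affinity measure mentioned for column families (counting how often attributes co-occur in the query workload) can be sketched as below. The column names are TPC-H lineitem columns, but the three attribute sets standing in for queries are hypothetical, as is the `affinity` helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical workload: the set of attributes each query touches.
WORKLOAD = [
    {"l_quantity", "l_extendedprice", "l_discount"},
    {"l_extendedprice", "l_discount", "l_shipdate"},
    {"l_quantity", "l_shipdate"},
]


def affinity(workload):
    """Count, for every attribute pair, the queries in which both appear."""
    counts = Counter()
    for attrs in workload:
        for pair in combinations(sorted(attrs), 2):
            counts[pair] += 1
    return counts


pairs = affinity(WORKLOAD)
strongest = pairs.most_common(1)[0]
```

Attribute pairs with a high count are candidates for the same column family; clustering the attributes on these counts yields a vertical partitioning.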
Benchmarking Data Servers
--Column-storage systems vs row-based storage systems
Cube   MySQL                   MonetDB
C1     2,778 sec               30 sec
C10    Java heap space Error   758 sec
C11    2,558 sec               2,536 sec
C3     Mondrian Error: Size of cross join exceeded limit
Benchmarking Middleware for Parallel Cube Processing
--OLAP & High Performance Computing
Systems which scale-out through Data Fragmentation and Load Balancing
achieve
»Parallel IO
»Parallel Processing
Technologies
»Parallel Cube processing OLAP servers
»Distributed Relational Data Warehouses + Mid-tier for parallel cube
processing
»Hadoop Systems
»SQL-on-Hadoop Systems
»e.g. Hive, Spark SQL, Drill, Impala, IBM BigInsights, …
Benchmarking Middleware for Parallel Cube Processing
--OLAP* framework Key Considerations for Data Fragmentation
Reduce the Size of Each Cube to be Built at Each Node
»Partitioning of high-cardinality dimensions
Simplify Post-Processing of OLAP Cubes
»Cubes whose dimension members are disjoint need only simple post-processing (a union all operation), while merging all dimensions' hierarchies is costly
Enhance Data Maintenance
»DW refresh processing
»Distributed Maintenance Transaction processing
Controlled Replication
»Replication has refresh and storage cost
»Replication optimizes join operations through dimension table
replication
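The fragmentation idea can be sketched in a few lines: partition the fact rows on the members of one high-cardinality dimension, build a partial cube per node, and combine the partial cubes with a plain union because their members are disjoint. This is a sketch under simplifying assumptions (one dimension, one summed measure, round-robin assignment of members); the data and function names are hypothetical and are not the OLAP* implementation.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer, revenue).
FACTS = [("c1", 10), ("c2", 5), ("c3", 7), ("c1", 3), ("c4", 1)]


def partition(facts, nodes):
    """Assign every dimension member to exactly one node (round-robin)."""
    members = sorted({key for key, _ in facts})
    owner = {member: i % nodes for i, member in enumerate(members)}
    fragments = [[] for _ in range(nodes)]
    for key, value in facts:
        fragments[owner[key]].append((key, value))
    return fragments


def build_cube(fragment):
    """Aggregate one fragment; on a cluster each node would run this in parallel."""
    cube = defaultdict(int)
    for key, value in fragment:
        cube[key] += value
    return dict(cube)


def union_all(cubes):
    """Post-processing: members are disjoint, so a plain union is enough."""
    merged = {}
    for cube in cubes:
        assert merged.keys().isdisjoint(cube)
        merged.update(cube)
    return merged


partial_cubes = [build_cube(f) for f in partition(FACTS, nodes=2)]
merged = union_all(partial_cubes)
```

Because no member is split across nodes, no re-aggregation is needed after the union, which is the "simple post-processing" case described above.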
Benchmarking Middleware for Parallel Cube Processing
--Performance Measurements with MySQL as DB backend
Cube   MySQL                   4 MySQL DB instances
C1     2,778 sec               862 sec
C10    Java heap space Error   13,774 sec
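For C1, the two timings above give the speedup and parallel efficiency of the 4-instance setup directly. A small worked calculation, using the usual definitions (speedup = single-instance time / parallel time, efficiency = speedup / number of instances); the function names are illustrative.

```python
def speedup(single_seconds, parallel_seconds):
    """How many times faster the parallel run is."""
    return single_seconds / parallel_seconds


def efficiency(single_seconds, parallel_seconds, instances):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup(single_seconds, parallel_seconds) / instances


c1_speedup = round(speedup(2778, 862), 2)
c1_efficiency = round(efficiency(2778, 862, 4), 2)
```

No such ratio can be computed for C10, since the single-instance run ended in an error rather than a timing.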
Benchmarking MDB Schemas
MDB Design
»Simple approach: map each query to its required cube(s)
»Sophisticated approach
»Analyze OLAP workload
»Find out shared facts, dimensions and measures
»Define new cubes based on cubes clustering
»Re-write the workload
Benchmarking MDB Schemas
--TPC-H*d Example
»Same fact table
»2 shared dimension tables but different hierarchies
»1 different dimension
»Same measure
Benchmarking MDB Schemas
--TPC-H*d Example
Cube    Initial schema   Virtual Cubes
C_5_7   -                3,457 sec
C5      3,200 sec        0.7 sec
C7      617 sec          0.2 sec
Conclusion and Future Work
Performance Leaks
Mondrian cannot build an OLAP cube having more than
2,147,483,647 cells
OLAP cube 20 has 200,052,100,026 cells
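The limit quoted above equals 2^31 − 1, the largest signed 32-bit integer. A minimal sketch of a pre-check, assuming the cell count of a cube can be estimated as the product of its dimension cardinalities (an assumption for illustration; the slides do not state how the count is derived). The cardinalities passed to `fits` are hypothetical.

```python
import math

INT32_MAX = 2**31 - 1  # 2,147,483,647, the cell limit quoted on the slide


def cell_count(cardinalities):
    """Estimated number of cells: product of the dimension cardinalities."""
    return math.prod(cardinalities)


def fits(cardinalities, limit=INT32_MAX):
    """True if the estimated cell count stays within the limit."""
    return cell_count(cardinalities) <= limit


small_ok = fits([1_000, 1_000, 1_000])    # 10^9 cells
large_ok = fits([10_000, 1_000, 1_000])   # 10^10 cells

# Cell count reported on the slide for OLAP cube 20.
CUBE_20_CELLS = 200_052_100_026
times_over = CUBE_20_CELLS // INT32_MAX
```

Cube 20 exceeds the limit by roughly two orders of magnitude, which is why reducing dimension cardinality per node (through partitioning) matters.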
Experiments
TPC-H with SF=10
RDBMS: MonetDB and MySQL
Tuning: materialized views and derived attributes
Experiments were run on Suno nodes (@Sophia, Grid5000 HPC platform)
Each node has 32GB of RAM
Mondrian requires more RAM
XML descriptions of the TPC-H and TPC-DS cubes allow us to sketch,
recommend and assess
vertical partitioning techniques for DB design (Family of columns)
materialized views
indexes
Future Work
Intelligent Recommenders for the selection of Indexes and Materialized Views
Indexes and physical structures can significantly accelerate performance
The XML description of each cube allows us to recommend indexes and materialized views
Recommenders for performance tuning
»AutoAdmin research project at Microsoft, which explores techniques
to make databases self-tuning [Agrawal et al., 2000]
»Alerter Approach [Hose et al., 2008]: support the aggregate
configuration of an OLAP server by (1) continuously monitoring
information about the workload and the benefit of aggregation tables
and (2) alerting the DBA if changes to the current configuration would
be beneficial
»Semi-Automatic Index Tuning: keeping DBAs in the loop [Schnaitter and
Polyzotis, 2012]: online workload analysis with decisions delegated to
the DBA. The solution takes index interactions into account
Research in Data Warehouse Modeling?
DOLAP Workshop 2006
IBM White paper 2015
References (1/3)
M. Fricke, The Knowledge Pyramid: A Critique of the DIKW Hierarchy. Journal of Information Science. 2009.
E.F. Codd, S.B. Codd and C.T. Salley, Providing OLAP to User Analysts: an IT mandate, 1993.
J. Widom, Integrating Heterogeneous databases: eager or lazy? ACM Computing Surveys (CSUR) Vol.4, 1996
Y.R. Cho, Data Warehouse and OLAP Operations www.ecs.baylor.edu/faculty/cho/4352
TPC homepage http://www.tpc.org/
M. Poess, T. Rabl and B. Caufield: TPC-DI: The First Industry Benchmark for Data Integration. PVLDB 7(13): 1367-1378 (2014) http://www.vldb.org/pvldb/vol7/p1367-poess.pdf
X. Li, J. Han, H. Gonzalez: High-Dimensional OLAP: A Minimal Cubing Approach. VLDB 2004.
C. Imhoff, N. Galemmo, J. G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. 2003.
R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker. The Data Warehouse Lifecycle Toolkit. 2nd Edition.
R. Kimball, M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd Edition.
H. G. Molia. Data Warehousing Overview: Issues, Terminology, Products. www.cs.uh.edu/~ceick/6340/dw-olap.ppt (slides)
References (2/3)
Modeling Multidimensional Databases (non exhaustive list)
M. Gyssens and L. V.S. Lakshmanan. A Foundation for Multi-Dimensional Databases.
VLDB’1997.
R. Agrawal, A. Gupta and S. Sarawagi. Modeling Multidimensional Databases.
ICDE’1997.
J. Gray, A. Bosworth, A. Layman and H. Pirahesh. Data Cube: A Relational Aggregation
Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. ICDE’1996.
P. Vassiliadis. Modeling Multidimensional Databases, Cubes and Cube Operations.
SSDBM’1998.
L. Cabibbo and R. Torlone. A Logical Approach to Multidimensional Databases.
EDBT’1998.
D. Cheung, B. Zhou, B. Kao, H. Lu, T. Lam and H. Ting. Requirement-based data cube
schema design. CIKM’1999.
T. Niemi, J. Nummenmaa and P. Thanisch. Constructing OLAP cubes based on Queries.
DOLAP’2001.
O. Teste. Towards Conceptual Multidimensional Design in Decision Support Systems.
DEXA’2010.
A. Cuzzocrea and R. Moussa. Multidimensional Database Design
via Schema Transformation: Turning TPC-H into the TPC-
H*d Multidimensional Benchmark. COMAD’2013.
References (3/3)
M. Fowler, Schemaless data structures. 2013 http://martinfowler.com/articles/schemaless/
N. Marz and J. Warren, Big Data: Principles and best practices of scalable realtime data
systems, 1st Edition
S. Agrawal, S. Chaudhuri and V. Narasayya Automated Selection of Materialized Views and
Indexes for SQL Databases. VLDB’2000
http://www.research.microsoft.com/dmx/AutoAdmin
K. Hose, D. Klan, M. Marx and K. Sattler. When is it Time to Rethink the Aggregate
Configuration of Your OLAP Server?. VLDB’2008
Karl Schnaitter and Neoklis Polyzotis. Semi-Automatic Index Tuning: Keeping DBAs in the
Loop. VLDB’2012
P. Zhao, X. Li, D. Xin and J. Han.
Graph cube: on warehousing and OLAP multidimensional networks. SIGMOD’2011
L. D. Lins, J. T. Klosowski and C. E. Scheidegger:
Nanocubes for Real-Time Exploration of Spatiotemporal Datasets. IEEE Trans. Vis. Comput.
Graph. 2013 https://github.com/laurolins/nanocube
Thank you for your Attention
Q & A
Taming Size and Cardinality of OLAP Data Cubes over Big
Data
Alfredo Cuzzocrea, Rim Moussa and Achref Labidi
12th of July, 2017
Decision Support Systems Benchmarks
TPC-DI Benchmark (1/3)
[Poess et al. 2014]
For benchmarking Data Integration technologies
Synthetic Data of a Fictitious Retail Brokerage Firm
»Internal Trading system data, Internal Human resources data, Internal
CRM System and External data
»Different data scales
»Data extracted from different sources:
»Structured (csv)
»Semi-structured data (xml)
»Multi record (nested data)
»Change Data Capture (CDC)
18 Complex Data Integration Tasks
Load large volumes of historical data
Load incremental updates
Execute complex transformations
Check and ensure consistency of data
TPC-H*d
Truly OLAP variant of TPC-H benchmark
TPC-H SQL workload translated into MDX (MultiDimensional
eXpressions)
The workload is composed of 23 MDX statements for OLAP
cubes and 23 MDX statements for OLAP business queries.
Each business question of TPC-H benchmark is mapped to an OLAP
cube

Final_CloudEventFrankfurt2017 (1).pdf
 
The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...The Proliferation of New Database Technologies and Implications for Data Scie...
The Proliferation of New Database Technologies and Implications for Data Scie...
 
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
 Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB
 
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...Designing Fast Data Architecture for Big Data  using Logical Data Warehouse a...
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a...
 

Último

Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 

Último (20)

Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

BICOD-2017

  • 1. 12th July 2017, BICOD'2017 @ London, United Kingdom. Taming Size and Cardinality of OLAP Data Cubes over Big Data. Alfredo Cuzzocrea (University of Trieste & ICAR), Rim Moussa (LaTICE Lab. & University of Carthage), Achref Labidi (LaTICE Lab. & University of Carthage). The 31st British International Conference on Databases, London, United Kingdom, 12th of July, 2017.
  • 2. Outline
      » Data Warehouse Systems: DWS Architectures, OLAP cube, DSS Benchmarks
      » TPC-H*d, a Multi-dimensional Database Benchmark: TPC-H*d, AutoMDB
      » Application Scenarios of TPC-H*d: Benchmarking Data Servers, Benchmarking Multidimensional DB Schemas, Benchmarking Parallel OLAP Servers
      » Conclusion & Research Agenda
  • 3. Data Warehouse Architectures: Lazy Data Integration. Query-driven architecture. [Diagram: relational data sources, each behind a WRAPPER, accessed through a MEDIATOR]
  • 4. Data Warehouse Architectures: Eager Data Integration. Warehouse system architecture. [Diagram: relational data sources, integration workflows of the integration system, data warehouse]
  • 5. Data Warehousing: OLAP Cube
      Facts: the objects that represent the subject of the desired analyses.
        » Examples: sales records, weather records, cab trips, …
        » The fact table contains 3 types of attributes: measured attributes, foreign keys to dimension tables, and degenerate dimensions
      Dimension(s):
        » Levels are individual values that make up dimensions
        » Examples: Date dimension (trimester, month, day); Time dimension (hour, min, sec); Geography dimension (country, city, postal code)
      Measure(s):
        » Examples: revenue, lost revenue, sold quantities, expenses, …
        » Use aggregate functions: min, max, count, distinct-count, sum, average, …
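The relationship between facts, dimension levels and measures can be illustrated with a small sketch. It is not taken from the slides: the fact rows, the `LEVELS` mapping and the `aggregate` helper are hypothetical, and the only measure is a summed revenue. Dropping the month level from the grouping key is a roll-up along the date dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, trimester, month, country, revenue).
# The first four fields are dimension levels, the last one is the measure.
FACTS = [
    (2017, 1, 1, "UK", 100.0),
    (2017, 1, 2, "UK", 50.0),
    (2017, 2, 4, "UK", 70.0),
    (2017, 1, 1, "FR", 30.0),
]

# Position of each dimension level inside a fact row (assumed layout).
LEVELS = {"year": 0, "trimester": 1, "month": 2, "country": 3}


def aggregate(facts, levels):
    """Sum the revenue measure, grouped by the given dimension levels."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[LEVELS[name]] for name in levels)
        totals[key] += row[4]
    return dict(totals)


# Finest granularity, then a roll-up from month to trimester.
by_month = aggregate(FACTS, ["year", "trimester", "month", "country"])
by_trimester = aggregate(FACTS, ["year", "trimester", "country"])
```

Other aggregate functions from the slide (min, max, count) would replace the running sum with the corresponding accumulator.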
  • 6. OLAP Operations: Q10 of TPC-H benchmark. [Figure labels: customer, nation, details, order date, year, quarter, line return flag]
  • 7. OLAP Operations: On-Line Analytical Processing (Q10 of TPC-H benchmark)
  • 8. Query Languages
      Structured Query Language (SQL)
        » Relational and static schema
        » Data Definition, Data Manipulation, and Data Control Language
        » Analytic functions (window functions over partition by …)
        » Cube, roll-up and grouping sets operators
      MultiDimensional eXpressions (MDX)
        » Invented by Microsoft in 1997
        » For querying and manipulating the multidimensional data stored in OLAP cubes
        » Static schema
      Data flow programming languages
        » Google Sawzall, Apache Pig Latin, IBM InfoSphere Streams
        » Dynamic schema
        » After data is loaded, multiple operators are applied to that data before the final output is stored
        » Pipeline: Load Data, Apply Schema, Apply Filter, Group Data, Apply Aggregate Function, Sort Data, Store Output
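The data flow pipeline named on this slide (load, apply schema, filter, group, aggregate, sort, store) can be sketched in plain Python. This is only an illustration of the operator sequence, not Pig Latin and not the authors' scripts; the pipe-delimited input, the field names and the `run_pipeline` function are assumptions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical pipe-delimited input, loosely in the style of TPC-H .tbl files.
RAW = "1|UK|10.0\n2|FR|5.0\n3|UK|7.5\n4|FR|-1.0\n"


def run_pipeline(raw_text):
    # Load Data: read untyped fields.
    rows = csv.reader(io.StringIO(raw_text), delimiter="|")
    # Apply Schema: name and type the fields only after loading (dynamic schema).
    records = [{"id": int(r[0]), "country": r[1], "amount": float(r[2])} for r in rows]
    # Apply Filter: keep positive amounts only.
    kept = [rec for rec in records if rec["amount"] > 0]
    # Group Data: collect amounts per country.
    groups = defaultdict(list)
    for rec in kept:
        groups[rec["country"]].append(rec["amount"])
    # Apply Aggregate Function: sum each group.
    totals = [(country, sum(amounts)) for country, amounts in groups.items()]
    # Sort Data: largest total first.
    totals.sort(key=lambda pair: pair[1], reverse=True)
    # Store Output: serialize back to pipe-delimited text.
    out = io.StringIO()
    csv.writer(out, delimiter="|", lineterminator="\n").writerows(totals)
    return out.getvalue()


result = run_pipeline(RAW)
```

Each stage consumes the output of the previous one, which is the property the slide attributes to data flow languages.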
  • 9. Query Languages: SQL, Q16 of TPC-H benchmark
  • 10. Query Languages: MDX, Q16 of TPC-H benchmark
      WITH
        SET [Brands] AS 'Except({[Part Brand].Members}, {[Part Brand].[Brand#45 ]})'
        SET [Types] AS 'Filter({[Part Type].Members}, (NOT ([Part Type].CurrentMember.Name MATCHES "(?i)MEDIUM POLISHED.*")))'
        SET [Sizes] AS 'Filter({[Part Size].Members}, ([Part Size].CurrentMember IN {[Part Size].[3], [Part Size].[9], [Part Size].[14], [Part Size].[19], [Part Size].[23], [Part Size].[36], [Part Size].[45], [Part Size].[49]}))'
      SELECT [Measures].[Supplier Count] ON COLUMNS,
        nonemptyCrossjoin(nonemptyCrossjoin([Brands], [Types]), [Sizes]) ON ROWS
      FROM [Cube16]
  • 11. Query Languages: Data Flow, Pig Latin script for Q16 of TPC-H benchmark
  • 12. Decision Support Systems Benchmarks
      Non-TPC benchmarks
        » Real datasets: open data or proprietary data, of fixed size; devise a workload or trace the proprietary workload
        » APB-1: no scale factor
      TPC benchmarks
        » The Transaction Processing Performance Council (TPC) was founded in 1988 to define benchmarks
        » In 2009, TPC-TC was set up as an international technology conference series on performance evaluation and benchmarking
        » Examples of benchmarks relevant to decision support systems: TPC-H, TPC-DS and TPC-DI
        » Common characteristics of TPC benchmarks: synthetic data; a scale factor allowing generation of different volumes, from 1GB to 1PB
  • 13. Decision Support Systems Benchmarks: TPC-H Benchmark (1/2). TPC-H benchmark schema. 22 ad-hoc SQL statements (star queries, nested queries, …) + refresh functions
  • 14. Decision Support Systems Benchmarks: TPC-H Benchmark (2/2)
      TPC-H Benchmark: 2 metrics
        » QphH@Size is the number of queries processed per hour that the system under test (SUT) can handle for a fixed load
        » $/QphH@Size is the ratio of cost to performance, where the cost is the cost of ownership of the SUT (hardware, software, maintenance)
      Variants of the TPC-H Benchmark
        » TPC-H*d Benchmark [Cuzzocrea and Moussa, 2013]: turns the TPC-H benchmark into a multi-dimensional benchmark; few schema changes; same TPC-H workload; 2 MDX workloads (query workload, cube-then-query workload)
        » SSB: Star Schema Benchmark [O’Neil et al., 2012]: turns the TPC-H benchmark into a star schema; workload composed of 13 queries
        » TPC-H translated into Pig Latin (Apache Hadoop ecosystem) [Moussa, 2012]: 22 Pig Latin scripts which load and process TPC-H raw data files (.tbl files)
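The two TPC-H metrics can be made concrete with a short calculation. The slide summarizes QphH@Size informally; the TPC-H specification defines it as the geometric mean of the power and throughput results, which is what this sketch assumes. The function names and all input figures below are hypothetical, not measured results.

```python
import math


def qphh(power_at_size, throughput_at_size):
    """Composite queries-per-hour metric: geometric mean of power and throughput."""
    return math.sqrt(power_at_size * throughput_at_size)


def price_per_qphh(total_cost, qphh_value):
    """Price/performance: cost of ownership of the SUT divided by QphH@Size."""
    return total_cost / qphh_value


# Hypothetical figures, for illustration only.
composite = qphh(power_at_size=400.0, throughput_at_size=100.0)
ratio = price_per_qphh(total_cost=50_000.0, qphh_value=composite)
```

A lower $/QphH@Size means the same query rate is obtained at a lower cost of ownership.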
  • 15. Decision Support Systems Benchmarks: TPC-DS Benchmark (1/2). TPC-DS Benchmark: 7 data marts
  • 16. Decision Support Systems Benchmarks: TPC-DS Benchmark (2/2)
      TPC-DS Benchmark workload
        » Hundreds of queries (99 query templates)
        » OLAP, windowing functions, mining, and reporting queries
        » ACID and concurrent data maintenance (not ACID in TPC-DS 2.x)
      TPC-DS Benchmark metrics: 2 main metrics
        » QphDS@Size is the number of queries processed per hour that the system under test (SUT) can handle for a fixed load
        » Data maintenance and load time are calculated
        » $/QphDS@Size is the ratio of cost to performance, where the cost is a 3-year cost of ownership of the SUT (hardware, software, maintenance)
      TPC-DS implementations
        » TPC-DS v2.0: extension for non-relational systems such as Hadoop/Spark big data systems
  • 17. Outline: Introduction; Part I: Data Warehouses; Part II: Multi-dimensional Database Design (TPC-H*d, AutoMDB); Part III: Application Scenarios; Conclusion
  • 18. MDB Design Problem
      Given
        » A relational warehouse schema
        » A workload: a set of OLAP business queries, W = {Q1, Q2, …, Qn}, where Qi is a parameterized query
      How to design the multi-dimensional DB schema?
        » How to define cubes? Will there be a single cube or multiple cubes? Are there any rules for merging cubes?
      Which optimizations are suitable for performance tuning?
        » Derived data calculus & refresh (materialized views, derived attributes, indexes, …)
        » Data partitioning & parallel cube building
  • 19. Idea
      » Map each business query to an OLAP cube, to obtain a multi-dimensional DB schema
      » Recommend & test optimizations: derived data, data partitioning, cube merging
  • 20. TPC-H*d: Q8, from SQL statement to OLAP cube
  • 21. TPC-H*d: OLAP cube C8. Market share of each supplier nation within a region of customers, for each year and each part type
  • 22. TPC-H*d: OLAP query Q8. Market share of RUSSIAN suppliers within the AMERICA region, over the years 1995 and 1996, for part type ECO. ANODIZED STEEL
  • 23. AutoMDB
      » Open source software implemented in Java
      » Parses MDB schema files (.xml) using the SAX library
      » Compares the characteristics of OLAP cubes. For each pair of OLAP cubes:
          » shows whether or not they have the same fact table
          » computes the number of shared | different | coalescable dimensions (dimensions are coalescable if they are extracted from the same dimension table and their hierarchies are coalescable)
          » computes the number of shared | different measures
      » Merges OLAP cubes using different similarity functions
          » Simple distance function: whether or not the cubes have the same fact table
          » K-means clustering: the distance function is computed with weights assigned to cube characteristics
      » Proposes virtual cubes
      » Auto-generates a new MDB schema (.xml)
      » Creates an MDB schema from the TPC-DS SQL workload (ongoing)
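The pairwise cube comparison described for AutoMDB can be sketched as follows. This is not AutoMDB's code (the tool is written in Java and its internals are not shown on the slides); the `Cube` class, the Jaccard-based terms and the weights are assumptions chosen only to illustrate a weighted distance over cube characteristics. Coalescable dimensions are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """Hypothetical, simplified description of an OLAP cube."""
    name: str
    fact_table: str
    dimensions: frozenset
    measures: frozenset


def compare(a, b):
    """Count the characteristics the slide lists for a pair of cubes."""
    return {
        "same_fact": a.fact_table == b.fact_table,
        "shared_dims": len(a.dimensions & b.dimensions),
        "different_dims": len(a.dimensions ^ b.dimensions),
        "shared_measures": len(a.measures & b.measures),
        "different_measures": len(a.measures ^ b.measures),
    }


def jaccard_distance(x, y):
    """1 - |x ∩ y| / |x ∪ y|, defined as 0 for two empty sets."""
    union = x | y
    return 0.0 if not union else 1.0 - len(x & y) / len(union)


def distance(a, b, w_fact=0.5, w_dim=0.3, w_meas=0.2):
    """Weighted distance; the weights are illustrative assumptions."""
    fact_term = 0.0 if a.fact_table == b.fact_table else 1.0
    return (w_fact * fact_term
            + w_dim * jaccard_distance(a.dimensions, b.dimensions)
            + w_meas * jaccard_distance(a.measures, b.measures))


# Two hypothetical cubes over the same fact table.
cube_a = Cube("A", "lineitem", frozenset({"supplier", "order_date", "customer"}), frozenset({"revenue"}))
cube_b = Cube("B", "lineitem", frozenset({"supplier", "order_date", "part"}), frozenset({"revenue"}))
report = compare(cube_a, cube_b)
d = distance(cube_a, cube_b)
```

A distance of this kind could feed a clustering step such as k-means, with cubes in the same cluster being candidates for merging.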
  • 24. AutoMDB: load OLAP cubes defined in an XML file
  • 25. AutoMDB: compare OLAP cubes (same fact table or not)
  • 26. AutoMDB: compare cubes (group cubes which have the same fact table)
  • 27. AutoMDB: compare cubes (auto-generate a new MDB schema)
  • 28. Outline: Introduction; Part I: Data Warehousing; Part II: Multidimensional DB Design; Part III: Application Scenarios (Benchmarking Data Servers, Benchmarking Multidimensional DB Schemas, Benchmarking Parallel OLAP Servers); Conclusion and Research Agenda
  • 29. Benchmarking Data Servers: column-oriented storage systems vs row-oriented storage systems
      Columnar storage systems
        » High I/O performance: less data moves from hard drives to memory
        » Efficient memory management: only the required data is loaded into memory
        » Reduced storage: columns with low cardinality are compressed
        » Efficient schema modification: adding new columns does not induce a file storage re-organization
      Types
        » Binary Association Tables: each column is stored in a separate (surrogate key, value) table. RDBMS: MonetDB
        » Family of columns: design techniques measure the affinity between attributes through the count of their co-occurrences in the query workload, then cluster attributes; vertical partitioning for DB design
  • 30. Benchmarking Data Servers: column-storage systems vs row-based storage systems
      Cube | MySQL                 | MonetDB
      C1   | 2,778 sec             | 30 sec
      C10  | Java heap space error | 758 sec
      C11  | 2,558 sec             | 2,536 sec
      C3   | Mondrian error: size of cross join exceeded limit
  • 31. Benchmarking Middleware for Parallel Cube Processing: OLAP & High Performance Computing
      Systems which scale out through data fragmentation and load balancing achieve
        » Parallel I/O
        » Parallel processing
      Technologies
        » Parallel cube processing OLAP servers: distributed relational data warehouses + mid-tier for parallel cube processing
        » Hadoop systems
        » SQL-on-Hadoop systems, e.g. Hive, Spark SQL, Drill, Impala, IBM BigInsights, …
  • 32. Benchmarking Middleware for Parallel Cube Processing: OLAP* framework. Key considerations for data fragmentation
      Reduce the size of each cube to be built at each node
        » Partitioning of big-cardinality dimensions
      Simplify post-processing of OLAP cubes
        » Cubes which have disjoint dimension members have simple post-processing (a union all operation), while merging all dimension hierarchies is costly
      Enhance data maintenance
        » DW refresh processing
        » Distributed maintenance transaction processing
      Controlled replication
        » Replication has a refresh and storage cost
        » Replication optimizes join operations through dimension table replication
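The point about disjoint dimension members can be illustrated with a small sketch. It is not the OLAP* framework itself: the row layout, the round-robin placement in `partition_by_nation` and the `union_all` helper are assumptions. Because every nation lives on exactly one node, the partial cubes share no cell keys and merging them is a plain union, with no re-aggregation.

```python
from collections import defaultdict


def build_cube(rows):
    """Aggregate (nation, year, revenue) rows into cells keyed by (nation, year)."""
    cube = defaultdict(float)
    for nation, year, revenue in rows:
        cube[(nation, year)] += revenue
    return dict(cube)


def partition_by_nation(rows, n_nodes):
    """Assign each nation to exactly one node (deterministic round-robin)."""
    nations = sorted({row[0] for row in rows})
    node_of = {nation: i % n_nodes for i, nation in enumerate(nations)}
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[node_of[row[0]]].append(row)
    return parts


def union_all(partial_cubes):
    """Merge partial cubes whose cell keys are disjoint by construction."""
    merged = {}
    for cube in partial_cubes:
        if merged.keys() & cube.keys():
            raise ValueError("partitions overlap; re-aggregation would be required")
        merged.update(cube)
    return merged


# Hypothetical fact rows: (customer nation, order year, revenue).
ROWS = [
    ("FRANCE", 1995, 10.0), ("FRANCE", 1996, 5.0), ("RUSSIA", 1995, 7.0),
    ("BRAZIL", 1995, 3.0), ("RUSSIA", 1995, 1.5), ("BRAZIL", 1996, 2.0),
]

partials = [build_cube(part) for part in partition_by_nation(ROWS, n_nodes=2)]
merged = union_all(partials)
```

If the fact rows were instead partitioned on an attribute that is not a cube dimension, the same cell could appear on several nodes and the merge step would have to sum overlapping cells.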
  • 33. Benchmarking Middleware for Parallel Cube Processing: performance measurements with MySQL as DB backend
      Cube | MySQL                 | 4 MySQL instances
      C1   | 2,778 sec             | 862 sec
      C10  | Java heap space error | 13,774 sec
  • 34. Benchmarking MDB Schemas: MDB design
      » Simple approach: map each query to its required cube(s)
      » Sophisticated approach: analyze the OLAP workload; find shared facts, dimensions and measures; define new cubes based on cube clustering; re-write the workload
  • 35. Benchmarking MDB Schemas: TPC-H*d example
      » Same fact table
      » 2 shared dimension tables but different hierarchies
      » 1 different dimension
      » Same measure
  • 36. Benchmarking MDB Schemas: TPC-H*d example
      Cube  | Initial schema | Virtual cubes
      C_5_7 | -              | 3,457 sec
      C5    | 3,200 sec      | 0.7 sec
      C7    | 617 sec        | 0.2 sec
  • 37. Conclusion and Future Work
      Performance leaks
        » Mondrian cannot build an OLAP cube having more than 2,147,483,647 cells
        » OLAP cube 20 has 200,052,100,026 cells
      Experiments
        » TPC-H with SF=10
        » RDBMS: MonetDB and MySQL
        » Tuning: materialized views and derived attributes
        » Run on Suno nodes (Sophia site of the Grid5000 HPC platform); each node has 32GB of RAM
        » Mondrian requires more RAM
      The XML description of the TPC-H and TPC-DS cubes allows us to sketch, recommend and assess
        » vertical partitioning techniques for DB design (family of columns)
        » materialized views
        » indexes
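The cell limit quoted on this slide equals 2^31 - 1, the largest value of a 32-bit signed integer. The sketch below checks a cube's cell count against that limit, taking the cell count as the product of the dimension cardinalities. The function names and the example cardinalities are hypothetical; only the limit and the cell count of cube 20 come from the slide.

```python
import math

# Limit reported on the slide; it equals the maximum 32-bit signed integer.
MAX_CELLS = 2**31 - 1


def cell_count(cardinalities):
    """Number of cells of a dense cube: product of its dimension cardinalities."""
    return math.prod(cardinalities)


def fits(cells):
    """True if a cube with this many cells stays within the reported limit."""
    return cells <= MAX_CELLS


# Cell count of OLAP cube 20 as reported on the slide.
CUBE_20_CELLS = 200_052_100_026
cube_20_ok = fits(CUBE_20_CELLS)

# Hypothetical cube with three dimensions of 1000, 1000 and 3000 members.
example_cells = cell_count([1000, 1000, 3000])
example_ok = fits(example_cells)
```

Cube 20 exceeds the limit by roughly two orders of magnitude, which is why partitioning big-cardinality dimensions matters for cube size.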
  • 38. Future Work: intelligent recommenders for the selection of indexes and materialized views
      » The XML description of each cube allows us to recommend indexes and physical structures that can significantly accelerate performance
      Recommenders for performance tuning
        » AutoAdmin research project at Microsoft, which explores techniques to make databases self-tuning [Agrawal et al., 2000]
        » Alerter approach [Hose et al., 2008]: supports the aggregate configuration of an OLAP server by (1) continuously monitoring information about the workload and the benefit of aggregation tables and (2) alerting the DBA if changes to the current configuration would be beneficial
        » Semi-automatic index tuning, keeping DBAs in the loop [Schnaitter and Polyzotis, 2012]: online workload analysis with decisions delegated to the DBA; the solution takes index interactions into account
  • 39. Research in Data Warehouse Modeling? DOLAP Workshop 2006; IBM white paper 2015
  • 40. References (1/3)
      » M. Fricke. The Knowledge Pyramid: A Critique of the DIKW Hierarchy. Journal of Information Science, 2009.
      » E.F. Codd, S.B. Codd and C.T. Salley. Providing OLAP to User Analysts: An IT Mandate. 1993.
      » J. Widom. Integrating Heterogeneous Databases: Eager or Lazy? ACM Computing Surveys (CSUR), Vol. 4, 1996.
      » Y.R. Cho. Data Warehouse and OLAP Operations. www.ecs.baylor.edu/faculty/cho/4352
      » TPC homepage. http://www.tpc.org/
      » M. Poess, T. Rabl and B. Caufield. TPC-DI: The First Industry Benchmark for Data Integration. PVLDB 7(13): 1367-1378, 2014. http://www.vldb.org/pvldb/vol7/p1367-poess.pdf
      » X. Li, J. Han and H. Gonzalez. High-Dimensional OLAP: A Minimal Cubing Approach. VLDB 2004.
      » C. Imhoff, N. Galemmo and J. G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. 2003.
      » R. Kimball, M. Ross, W. Thornthwaite, J. Mundy and B. Becker. The Data Warehouse Lifecycle Toolkit. 2nd Edition.
      » R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. 2nd Edition.
      » H. G. Molia. Data Warehousing Overview: Issues, Terminology, Products. www.cs.uh.edu/~ceick/6340/dw-olap.ppt (slides)
  • 41. References (2/3): Modeling Multidimensional Databases (non-exhaustive list)
      » M. Gyssens and L. V.S. Lakshmanan. A Foundation for Multi-Dimensional Databases. VLDB 1997.
      » R. Agrawal, A. Gupta and S. Sarawagi. Modeling Multidimensional Databases. ICDE 1997.
      » J. Gray, A. Bosworth, A. Layman and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. ICDE 1996.
      » P. Vassiliadis. Modeling Multidimensional Databases, Cubes and Cube Operations. SSDBM 1998.
      » L. Cabibbo and R. Torlone. A Logical Approach to Multidimensional Databases. EDBT 1998.
      » D. Cheung, B. Zhou, B. Kao, H. Lu, T. Lam and H. Ting. Requirement-Based Data Cube Schema Design. CIKM 1999.
      » T. Niemi, J. Nummenmaa and P. Thanisch. Constructing OLAP Cubes Based on Queries. DOLAP 2001.
      » O. Teste. Towards Conceptual Multidimensional Design in Decision Support Systems. DEXA 2010.
      » A. Cuzzocrea and R. Moussa. Multidimensional Database Design via Schema Transformation: Turning TPC-H into the TPC-H*d Multidimensional Benchmark. COMAD 2013.
  • 42. References (3/3)
      » M. Fowler. Schemaless Data Structures. 2013. http://martinfowler.com/articles/schemaless/
      » N. Marz and J. Warren. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. 1st Edition.
      » S. Agrawal, S. Chaudhuri and V. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Databases. VLDB 2000. http://www.research.microsoft.com/dmx/AutoAdmin
      » K. Hose, D. Klan, M. Marx and K. Sattler. When Is It Time to Rethink the Aggregate Configuration of Your OLAP Server? VLDB 2008.
      » K. Schnaitter and N. Polyzotis. Semi-Automatic Index Tuning: Keeping DBAs in the Loop. VLDB 2012.
      » P. Zhao, X. Li, D. Xin and J. Han. Graph Cube: On Warehousing and OLAP Multidimensional Networks. SIGMOD 2011.
      » L. D. Lins, J. T. Klosowski and C. E. Scheidegger. Nanocubes for Real-Time Exploration of Spatiotemporal Datasets. IEEE Trans. Vis. Comput. Graph., 2013. https://github.com/laurolins/nanocube
  • 43. Thank you for your attention. Q & A. Taming Size and Cardinality of OLAP Data Cubes over Big Data. Alfredo Cuzzocrea, Rim Moussa and Achref Labidi, 12th of July, 2017
  • 44. Decision Support Systems Benchmarks: TPC-DI Benchmark (1/3) [Poess et al. 2014]
      For benchmarking data integration technologies
      Synthetic data of a fictitious retail brokerage firm
        » Internal trading system data, internal human resources data, internal CRM system data and external data
        » Different data scales
        » Data extracted from different sources: structured (csv), semi-structured (xml), multi-record (nested data), Change Data Capture (CDC)
      18 complex data integration tasks
        » Load large volumes of historical data
        » Load incremental updates
        » Execute complex transformations
        » Check and ensure consistency of data
  • 45. TPC-H*d
      » Truly OLAP variant of the TPC-H benchmark
      » TPC-H SQL workload translated into MDX (MultiDimensional eXpressions)
      » The workload is composed of 23 MDX statements for OLAP cubes and 23 MDX statements for OLAP business queries
      » Each business question of the TPC-H benchmark is mapped to an OLAP cube